Proj CDeepFuzz Paper Reading: NYX: Greybox Hypervisor Fuzzing using Fast Snapshots and Affine Types

Abstract

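Background: a hypervisor (virtual machine monitor, VMM) enforces the security boundaries that isolate virtual machines from one another.
Attack scenario: a user runs their own VM instances on a cloud service and tries to escalate privileges.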

This paper: Nyx
Goal: a coverage-guided hypervisor fuzzer
Method: 1. fast snapshot restoration mechanism 2. mutation based on DAGs 3. affine types to express the complex interactions

Results:

  1. On simple targets it takes longer (by a few minutes) than other hypervisor fuzzers
  2. On complex devices it outperforms them
  3. 44 bugs, 22 CVEs

1. Intro

P1: Challenges of fuzzing hypervisors

  1. low-level hardware details
  2. highly privileged settings

P2: related work: VDF, Hyper-Cube
VDF: isolating QEMU device drivers into harnesses
Hyper-Cube: blind fuzzer
Observation: because VDF is so slow, Hyper-Cube still wins on speed on the vast majority of machines, even though it uses no feedback.

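P3: Writing harnesses for different emulators still requires manual effort and increases the chance of false positives and false negatives; one has to be very careful to reproduce the original environment of the device emulator faithfully.
Introduces coverage-guided fuzzing (CGF).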

P4: This paper: Nyx
Result: outperforms Hyper-Cube on complex devices, even though test throughput drops sharply.

P5: Challenges for this approach: 1. no manually written harness 2. not all relevant components can be instrumented

P6: run the target component on our own hypervisor
Structure: host OS->host hypervisor->target OS->target hypervisor->agent OS
Three different OSes and two layers of hypervisors in total.

P7:

  1. Uses kAFL, with Intel PT to collect code coverage
  2. Uses a modified version of the Hyper-Cube custom OS as the agent OS
  3. Nyx Fuzzer:
  4. fast snapshot restoration mechanism, to handle the statefulness and non-determinism of this complex stack
  5. new mutation engine based on DAGs and a user-provided specification that describes the bytecode semantics and the shape of the graph produced
  6. affine types: ensure each value is used at most once; used to manage the proper release of resources
  7. Specifications for some complex devices were written by hand

P8: evaluation

2. Technical Background

2.1 x86 Hypervisors

Hypervisors(Virtual Machine Monitors)
Guest(Virtual Machine)
Dedicated CPU instructions or access protection schemes are used to isolate the VMs from each other.
"generally speaking, emulated hardware is provided by the hypervisor"
"real hardware that cannot be emulated easily can be passed through"

2.2 Trap-VM-Exit and Paravirtualization

The VM triggers a VM-Exit trap. Through the VM-Exit transition, the hypervisor can invoke the corresponding device emulator to emulate the privileged operation, and then returns control to the VM.

Devices fall into two classes:

  1. Memory-Mapped I/O (MMIO): the hypervisor sets a trap condition for the entire MMIO region
  2. Port I/O (PIO): accessed through the in and out instructions; the hypervisor configures the CPU to trap on in/out instructions

For speed, many modern hypervisors include paravirtualized interfaces to avoid expensive context switches (Q: emulate hardware that does not have physical pendants but reduce communication overhead)
Paravirtualization requires the guest operating system to be explicitly ported for the para-API
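
The trap-and-emulate flow described in this section can be sketched as a toy dispatcher. This is only an illustrative model, not code from Nyx or any real hypervisor: the names ToyHypervisor, RegisterDevice and handle_vm_exit, the MMIO base address and the port number are all assumptions made up for the example.

```python
class RegisterDevice:
    """Hypothetical device emulator: a bank of byte-wide registers."""

    def __init__(self, size):
        self.regs = [0] * size

    def read(self, offset):
        return self.regs[offset]

    def write(self, offset, value):
        self.regs[offset] = value & 0xFF  # registers are one byte wide


class ToyHypervisor:
    """Routes trapped guest accesses to the matching device emulator."""

    def __init__(self):
        self.mmio_regions = []  # list of (base, size, device)
        self.pio_ports = {}     # port -> (device, register offset)

    def map_mmio(self, base, size, device):
        # Models "set a trap condition for the entire MMIO region".
        self.mmio_regions.append((base, size, device))

    def map_pio(self, port, device, offset=0):
        # Models trapping in/out instructions for one port.
        self.pio_ports[port] = (device, offset)

    def handle_vm_exit(self, kind, address, value=None):
        """Emulate one trapped access; value=None means a read."""
        if kind == "mmio":
            for base, size, device in self.mmio_regions:
                if base <= address < base + size:
                    offset = address - base
                    break
            else:
                raise LookupError(f"no MMIO device at {address:#x}")
        elif kind == "pio":
            if address not in self.pio_ports:
                raise LookupError(f"no PIO device at port {address:#x}")
            device, offset = self.pio_ports[address]
        else:
            raise ValueError(f"unknown exit kind: {kind}")
        if value is None:
            return device.read(offset)
        device.write(offset, value)
        return None  # control would now return to the guest
```

In this model a guest write to address 0x1004 inside a region mapped at 0x1000 ends up in register 4 of the device, which is the step a real hypervisor performs after decoding the exit reason.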

2.3 Challenges for Fuzzing Hypervisors

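  1. Extracting a harness for an individual device emulator is hard, and for closed-source hypervisors it is very difficult to do at all
  2. Hypervisors are highly stateful, so it is hard to separate individual test cases
  3. A hypervisor exposes many different interfaces; many of them require complex, highly structured data, and some of that data has to be created in specific memory regions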

2.3.1 Code Coverage and Handling Crashes

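Virtualization is used to create an isolated, externally controlled environment.
kAFL-based tools such as Redqueen and GRIMOIRE all use KVM-PT to trace the code running inside the VM, and QEMU-PT to decode the traces and obtain coverage information.
Using a VM also makes it easier to handle crashes of complex components.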

Nested Virtualization: L0->L1 guest->L2 guests...
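Q: Existing x86 virtualization extensions do not support this kind of nesting, so it has to be implemented in software.

In hypervisors such as KVM, nested virtualization is implemented by emulation: L0 traps all VMX instructions and emulates them.
Example: L2 issues an I/O request and traps. L0 must handle the trap first, then forward the exit reason and its arguments to L1, trap the VM re-entry at L1, and finally hand control back to L2.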

2.3.2 Fuzzing Stateful Applications

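State: 1. disk files 2. time, e.g. timers and hash tables that use the time; time-dependent hashing also causes some code coverage collisions
Because the timers are part of the hypervisor's context, controlling the full state of the hypervisor makes test-case executions reproducible.
VDF and similar tools reset state by fully restarting the whole process every so often, which clearly costs precision.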

Listing 1: Example demonstrating lifetime constraints for interactive targets.

obj = malloc_obj();
//use only after it was created
use(&obj);
//obj must not be used after free
free(obj);

2.3.3 Fuzzing Interactive Interfaces

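Part I: Challenges: different formats are accepted depending on the interface and the resource state; accumulated state can change the available resources, e.g. a device gets shut down
Part II: Why a DAG: existing context-free grammar approaches mostly use tree-shaped representations, which makes it hard to capture operations such as create/read/update/delete and is a poor fit for time-related properties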

2.4 Affine Types

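P1: To make sure inputs respect the library's contract, the paper uses affine types (a class of type systems that ensures each value is used at most once, which prevents reuse)
P2: Affine types make it possible to express reuse constraints with a focus on versatility. The user specifies a set of opcodes; each opcode represents a function that can take any number of arguments and return any number of values
Arguments can either be consumed directly or merely borrowed
See Rust for comparison
Result: finds bugs more efficiently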
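
A minimal sketch of what "each value is used at most once, unless it is only borrowed" means for a sequence of opcodes. The tuple layout and the name check_affine are assumptions for illustration; they are not taken from the paper or from the Nyx code base.

```python
def check_affine(program):
    """Check a straight-line program against affine usage rules.

    program is a list of (opcode, produces, borrows, consumes) tuples,
    where the last three entries are lists of value names.
    Returns None if the program is valid, otherwise an error string.
    """
    live = set()  # values that exist and have not been consumed yet
    for index, (opcode, produces, borrows, consumes) in enumerate(program):
        # Borrowing needs a live value but leaves it live.
        for name in borrows:
            if name not in live:
                return f"step {index} ({opcode}): borrow of dead value '{name}'"
        # Consuming needs a live value and kills it (at most one use).
        for name in consumes:
            if name not in live:
                return f"step {index} ({opcode}): use of dead value '{name}'"
            live.remove(name)
        live.update(produces)
    # Affine, not linear: values that are never consumed are allowed.
    return None


valid = [
    ("open", ["f"], [], []),
    ("write", [], ["f"], []),
    ("write", [], ["f"], []),
    ("close", [], [], ["f"]),
]
use_after_close = valid + [("write", [], ["f"], [])]
```

Here check_affine(valid) accepts the program, while check_affine(use_after_close) reports the write that borrows f after close has consumed it. A program that opens a file and never closes it is also accepted, which is the difference between affine ("at most once") and linear ("exactly once") typing.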

3 Design

3.1 Threat Model

Attacker can run its own kernel

3.2 Architecture Overview

Challenges:

  1. explore complex interfaces with multiple back and forth interactions
  2. maintain a deterministic and controlled env

basic architecture: VMI(virtual machine introspection based fuzzer) + HyperCube based agent OS

3.3 High Performance, Coverage-Guided Fuzzing

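Nyx works binary-only instead of relying on compile-time instrumentation, in order to:

  1. avoid the complexity of different build systems, compilers, etc.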
  2. test the real software as it is delivered, with original compiler flags and patch sets

Stable and Deterministic Fuzzing:

  1. To recover gracefully from crashes: run the target software inside a KVM VM
  2. To deal with noisy coverage traces: extend QEMU-PT and KVM-PT to support very fast VM reload operations that restore the entire state, including timing interrupts and clocks
  • Using Page Modification Logging (PML), KVM can find the page frames that need to be reset (dirty frames?)
    • maintain a full copy of the original state and an additional dirty page tracker
  • Bypass QEMU-PT's device loading code
  • Use hypercalls for communication between the fuzzer and the host hypervisor (KVM-PT)
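
The idea of keeping a full copy of the original state plus a dirty page tracker can be sketched in a few lines. This is a software model under stated assumptions: in the notes above the dirty pages come from KVM via PML, whereas here the write method records them itself; SnapshotMemory and its methods are hypothetical names.

```python
PAGE_SIZE = 4096


class SnapshotMemory:
    """Toy guest memory with a snapshot and a dirty page tracker."""

    def __init__(self, num_pages):
        self.pages = [bytearray(PAGE_SIZE) for _ in range(num_pages)]
        self.snapshot = None  # full copy of the original state
        self.dirty = set()    # page numbers modified since the snapshot

    def take_snapshot(self):
        self.snapshot = [bytes(page) for page in self.pages]
        self.dirty.clear()

    def write(self, address, data):
        for i, byte in enumerate(data):
            page, offset = divmod(address + i, PAGE_SIZE)
            self.pages[page][offset] = byte
            self.dirty.add(page)  # stands in for hardware dirty logging

    def read(self, address, length):
        out = bytearray()
        for i in range(length):
            page, offset = divmod(address + i, PAGE_SIZE)
            out.append(self.pages[page][offset])
        return bytes(out)

    def restore(self):
        """Reset only the dirty pages; return how many were restored."""
        if self.snapshot is None:
            raise RuntimeError("no snapshot taken")
        restored = len(self.dirty)
        for page in self.dirty:
            self.pages[page][:] = self.snapshot[page]
        self.dirty.clear()
        return restored
```

The cost of restore is proportional to the number of pages a test case touched rather than to the size of the VM, which is why resetting after every input becomes affordable.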

Communication with Nested Virtualization:
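To let the agent in L2 communicate directly with KVM-PT, the paper implements special hypercalls and the corresponding handlers; these hypercalls are delivered to KVM-PT and passed on to L1 only when needed.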

Pass the input: Using a section of shared memory

3.4 Generic Fuzzing of Interactive Targets

bytecode-like specifications are much more useful, as they allow to properly refer to existing and initialized variables(FUZZILI, SYZKALLER)
The specification format used in this paper can express types and temporal usage patterns.

Affine Typed Specification Engine: supports temporal create/use/delete/do-not-reuse constraints
Methods:

  1. build a formalism that can be used to describe strongly typed bytecodes (Q: is "strongly typed bytecode" machine code, or some custom bytecode?)
  2. a custom compiler to generate C code from those specifications; the paper designs it so that this C code can be embedded into any target
  3. Each input is a DAG: each node is a function, and each edge is a typed value that is passed either as a value (must not be used again) or as a reference. Any node (function) can take any number of arguments and return any number of values. In addition, each function has an additional data argument that can contain arbitrary tree-shaped data structures (Q)

Example 1. In this case we consider 3 opcodes: open, write, and close. The first opcode open(data: Vec) -> File has no moved or ref arguments. It only consumes a path (data string) and produces a file object. The second opcode, write(file: &File, data: Vec) takes a reference to a file object and again some data that will be written and returns no value. Any number of such write opcodes can reuse the same File object. The last opcode close(file: File) consumes the File object, and no further operations are possible on the file.
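
To make Example 1 concrete, the following sketch generates random opcode sequences that respect the open/write/close specification. It is an assumption-laden illustration, not the Nyx mutation engine: the SPEC table layout and the functions pick_inputs and generate are invented for this example, and the data argument of each opcode is left out.

```python
import random

# Hypothetical spec: opcode -> (borrowed types, consumed types, produced types)
SPEC = {
    "open": ([], [], ["File"]),
    "write": (["File"], [], []),
    "close": ([], ["File"], []),
}


def pick_inputs(types, live, rng):
    """Pick one distinct live value per requested type, or None if impossible."""
    available = dict(live)  # value id -> type
    chosen = []
    for wanted in types:
        options = sorted(v for v, t in available.items() if t == wanted)
        if not options:
            return None
        value = rng.choice(options)
        chosen.append(value)
        del available[value]
    return chosen


def generate(spec, length, rng):
    """Build a program of (opcode, borrowed, consumed, produced) tuples."""
    live = {}  # value id -> type, for values not yet consumed
    program = []
    next_id = 0
    for _ in range(length):
        opcodes = list(spec)
        rng.shuffle(opcodes)
        for opcode in opcodes:
            borrowed_t, consumed_t, produced_t = spec[opcode]
            inputs = pick_inputs(borrowed_t + consumed_t, live, rng)
            if inputs is None:
                continue  # this opcode cannot run in the current state
            borrowed = inputs[:len(borrowed_t)]
            consumed = inputs[len(borrowed_t):]
            for value in consumed:
                del live[value]  # consumed values can never be used again
            produced = []
            for value_type in produced_t:
                live[next_id] = value_type
                produced.append(next_id)
                next_id += 1
            program.append((opcode, borrowed, consumed, produced))
            break
    return program
```

Because an opcode is only emitted when its inputs can be bound to live values, every generated sequence starts with open and never writes to or closes a file that has already been closed.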

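Q: Why is Listing 1 translated into Figure 2? What does it have to do with "foo"? Are some of the arguments fixed? Is the bytecode specification just this figure?

Q: The bytecode specification in Figure 2 is used to generate input graphs. These input graphs are stored in a very compact serialized format and placed in shared memory, which avoids needless copies and the cost of allocating space for newly generated graphs.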
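
The notes do not describe the actual serialized layout, so the following is only a guess at what a compact, flat encoding of an input graph could look like. The field widths, the node tuple layout and the function names serialize and deserialize are all assumptions.

```python
import struct


def serialize(nodes):
    """Encode nodes as a flat byte string.

    nodes is a list of (opcode, input_ids, data) with opcode < 256,
    at most 255 inputs per node, ids < 65536 and len(data) < 65536.
    Assumed layout: u16 node count, then per node
    u8 opcode, u8 input count, u16 data length, u16 ids, raw data.
    """
    out = bytearray(struct.pack("<H", len(nodes)))
    for opcode, input_ids, data in nodes:
        out += struct.pack("<BBH", opcode, len(input_ids), len(data))
        out += struct.pack(f"<{len(input_ids)}H", *input_ids)
        out += data
    return bytes(out)


def deserialize(buf):
    """Decode the layout written by serialize without intermediate copies."""
    view = memoryview(buf)
    (count,) = struct.unpack_from("<H", view, 0)
    pos = 2
    nodes = []
    for _ in range(count):
        opcode, n_inputs, data_len = struct.unpack_from("<BBH", view, pos)
        pos += 4
        input_ids = list(struct.unpack_from(f"<{n_inputs}H", view, pos))
        pos += 2 * n_inputs
        data = bytes(view[pos:pos + data_len])
        pos += data_len
        nodes.append((opcode, input_ids, data))
    return nodes
```

A flat layout like this can be written straight into a shared memory region and parsed in place by the consumer, which matches the stated motivation of avoiding extra copies and allocations.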

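The paper automatically compiles the user-provided specification into a C header file that contains a bytecode interpreter.
The user has to supply a C implementation for the behavior of every node.
The user can use arbitrary C types as edge types.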
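
The division of labor described here (a generated interpreter plus one user-written handler per node) can be modeled as follows. The real artifact is a generated C header; this Python sketch only mirrors the structure, and FakeFile, run and the handler names are hypothetical.

```python
class FakeFile:
    """Stand-in for an arbitrary user-chosen edge type."""

    def __init__(self, path):
        self.path = path
        self.content = bytearray()
        self.closed = False


# User-provided behavior, one handler per opcode.
# Each handler receives its input values followed by the data argument
# and returns the list of values it produces.
def op_open(data):
    return [FakeFile(data.decode())]


def op_write(file, data):
    if file.closed:
        raise ValueError("write after close")
    file.content += data
    return []


def op_close(file, data):
    file.closed = True
    return []


HANDLERS = {"open": op_open, "write": op_write, "close": op_close}


def run(program, handlers):
    """Interpret a list of (opcode, input_ids, data) nodes in order."""
    values = []  # every produced value gets the next free slot id
    for opcode, input_ids, data in program:
        args = [values[i] for i in input_ids]
        values.extend(handlers[opcode](*args, data))
    return values


program = [
    ("open", [], b"/tmp/foo"),
    ("write", [0], b"ab"),
    ("write", [0], b"cd"),
    ("close", [0], b""),
]
```

Calling run(program, HANDLERS) opens one FakeFile, appends both payloads and closes it. The interpreter itself never inspects the values it passes along the edges; only the handlers do.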

Q: The target component parses the graph (is the target component here the target hypervisor, the spec compiler, or something else?)
Q: As the tree-shaped data needs to be mutated, the fuzzer needs to be aware of the structure and thus, they need to be described in the specification.
Q: The fuzzer does not need to modify or use the values that are created in the edges.

3.5 Applications beyond Hypervisor Fuzzing

fuzzing hypervisors, operating system and ring-3 applications in a unified framework.
Reuses some SYZKALLER specifications.
build a harness that allows to explore the impact of fuzzing env variables, cmd args, STDIN, multiple files as inputs

4 Implementation Details

Main efforts:

  1. getting coverage information
  2. fast snapshot reloads
  3. facilitate communication between the agent and the fuzzer

4.1 Backend Implementation

Main functions:

  1. measure the coverage
  2. handle misbehaving targets
  3. provide communication channels

based on: QEMU-PT, KVM-PT

4.1.1 Fast Coverage

4.1.2 Fast snapshot Reloads

4.1.3 Nested Hypervisor Communication

4.2 Fuzzing Frontend for Affine Typed Bytecode Programs

4.2.1 Representation of Bytecode

4.2.2 Generating Bytecode Interpreters

posted @ 2023-10-04 22:57  雪溯