How QEMU external snapshots are implemented

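I. Basic concepts

1 External snapshots
When an external snapshot is created, the state at that moment stays in the disk file that was in use, which becomes a backing file. A new overlay is created at the same time to hold all data written from then on.
2 Backing file and overlay
Taking an external snapshot of a base image produces a new snapshot file called the overlay, and the base image becomes its backing file. The backing file is read-only. An example chain:
base-img <-- overlay-1 <-- overlay-2 <-- overlay-3
Only two kinds of chain are supported: base-img is a raw file and every other node is a qcow2 file, or every node is a qcow2 file.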
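3 More terminology

cluster: the smallest unit in which a qcow2 image stores data, 64 KiB by default.
L2 entry: a 64-bit entry that describes a cluster offset and its associated flag bits.
cluster offset: the start address of a normal cluster within the image, stored as a 64-bit value.
cluster type: the kind of cluster, classified by what it stores: normal, compressed, unallocated, zero plain, zero alloc.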
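
To make the relationship between clusters and L2 entries concrete, here is a minimal sketch of how a guest offset can be split into table indexes. It assumes the standard qcow2 layout in which an L2 table occupies exactly one cluster and each L2 entry is 8 bytes (no extended L2 entries). The names GuestAddr and split_guest_offset are hypothetical and do not come from QEMU.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: where a guest offset lands in the two-level table. */
typedef struct {
    uint64_t l1_index;          /* which L2 table to use */
    uint64_t l2_index;          /* which entry inside that L2 table */
    uint64_t offset_in_cluster; /* byte offset inside the cluster */
} GuestAddr;

/* Assumption: one L2 table fills one cluster and each L2 entry is 8 bytes. */
static GuestAddr split_guest_offset(uint64_t guest_offset, unsigned cluster_bits)
{
    /* qcow2 cluster sizes range from 512 bytes to 2 MiB. */
    assert(cluster_bits >= 9 && cluster_bits <= 21);

    uint64_t cluster_size = 1ULL << cluster_bits;
    uint64_t l2_entries = cluster_size / 8;
    uint64_t cluster_index = guest_offset >> cluster_bits;

    GuestAddr a;
    a.l1_index = cluster_index / l2_entries;
    a.l2_index = cluster_index % l2_entries;
    a.offset_in_cluster = guest_offset & (cluster_size - 1);
    return a;
}
```

With the default 64 KiB cluster (cluster_bits = 16), one L2 table holds 8192 entries and therefore maps 512 MiB of guest address space.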

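II. COW

COW is the most important part of the read/write path for external snapshots.
1 What is COW
COW (Copy-On-Write) is a snapshot technique; it is often also described as copying the "metadata (the pointer table to the source data)". As the name suggests, when someone tries to overwrite original data in a source data block, the original data is first copied to a new data block, and only then is the change applied.
2 Why COW is needed
When a write targets a cluster whose data still lives in the backing file, a new cluster must first be allocated in the overlay. Writing only the request's data into the new cluster would leave it incomplete, because a write is rarely an exact multiple of the cluster size; it may cover just one or a few sectors. The rest of the cluster therefore has to be copied from the old cluster into the new cluster, and finally the cluster is marked as allocated.
The next time that cluster is read, it can be served from the overlay, and the data is complete rather than fragmentary.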
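3 How COW is done
In qcow2, COW does not copy first and then write; it writes first and then copies. The steps are:

(1) Allocate a new cluster.
(2) Record the copy-on-write ranges cow_start and cow_end in the QCowL2Meta structure.
(3) Write the request's data into the new cluster.
(4) Copy the remaining parts from the old cluster into the new cluster.
(5) Update the L2 table with the new cluster offset, with the QCOW_OFLAG_COPIED flag set.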
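
The write-then-copy order can be sketched on in-memory buffers. This is only an illustration under simplifying assumptions: a tiny 16-byte cluster, a write that stays inside one cluster, and hypothetical names (CowRegions, cow_regions, cow_write) that merely stand in for the head and tail ranges the article says are kept in QCowL2Meta. It is not QEMU's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CLUSTER_SIZE 16 /* tiny on purpose; the real default is 64 KiB */

/* Hypothetical stand-in for the COW ranges recorded before the write. */
typedef struct {
    size_t head_len;   /* bytes to copy before the written range */
    size_t tail_start; /* first byte after the written range */
    size_t tail_len;   /* bytes to copy after the written range */
} CowRegions;

/* Work out which parts of the cluster the guest write does not cover. */
static CowRegions cow_regions(size_t write_off, size_t write_len)
{
    assert(write_off + write_len <= CLUSTER_SIZE);
    CowRegions r;
    r.head_len = write_off;
    r.tail_start = write_off + write_len;
    r.tail_len = CLUSTER_SIZE - r.tail_start;
    return r;
}

/* Write the guest data first, then copy the untouched head and tail. */
static void cow_write(const unsigned char *old_cluster,
                      unsigned char *new_cluster,
                      const unsigned char *data,
                      size_t write_off, size_t write_len)
{
    CowRegions r = cow_regions(write_off, write_len);

    memcpy(new_cluster + write_off, data, write_len);  /* guest data */
    memcpy(new_cluster, old_cluster, r.head_len);      /* head from old */
    memcpy(new_cluster + r.tail_start,
           old_cluster + r.tail_start, r.tail_len);    /* tail from old */
}
```

Because the head and tail ranges never overlap the written range, the order of the three copies does not change the final cluster contents; only after all of them finish would the L2 table be updated to point at the new cluster.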

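III. Cluster types

1 Cluster types

typedef enum QCow2ClusterType {
    QCOW2_CLUSTER_UNALLOCATED, // unallocated
    QCOW2_CLUSTER_ZERO_PLAIN,  // reads as zeros, no host cluster allocated
    QCOW2_CLUSTER_ZERO_ALLOC,  // reads as zeros, host cluster allocated
    QCOW2_CLUSTER_NORMAL,      // normal
    QCOW2_CLUSTER_COMPRESSED,  // compressed
} QCow2ClusterType;

A new cluster is not needed only when the type is QCOW2_CLUSTER_NORMAL and the QCOW_OFLAG_COPIED flag is set. Every other case requires allocating a cluster.
2 How to get the cluster type
qcow2_get_cluster_type returns the cluster type of an L2 entry.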
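
As a rough illustration of how a cluster type can be derived from an L2 entry, the sketch below classifies an entry from its flag bits. It is a simplified model, not QEMU's qcow2_get_cluster_type: the function name classify_l2_entry and the has_data_file parameter are hypothetical, and features such as subclusters are ignored. The flag and mask values follow the qcow2 format (bit 63 copied, bit 62 compressed, bit 0 zero, bits 9-55 offset).

```c
#include <stdbool.h>
#include <stdint.h>

#define QCOW_OFLAG_COPIED     (1ULL << 63)
#define QCOW_OFLAG_COMPRESSED (1ULL << 62)
#define QCOW_OFLAG_ZERO       (1ULL << 0)
#define L2E_OFFSET_MASK       0x00fffffffffffe00ULL /* bits 9-55 */

typedef enum QCow2ClusterType {
    QCOW2_CLUSTER_UNALLOCATED,
    QCOW2_CLUSTER_ZERO_PLAIN,
    QCOW2_CLUSTER_ZERO_ALLOC,
    QCOW2_CLUSTER_NORMAL,
    QCOW2_CLUSTER_COMPRESSED,
} QCow2ClusterType;

/* Hypothetical, simplified classifier (ignores subclusters). */
static QCow2ClusterType classify_l2_entry(uint64_t l2_entry, bool has_data_file)
{
    if (l2_entry & QCOW_OFLAG_COMPRESSED) {
        return QCOW2_CLUSTER_COMPRESSED;
    }
    if (l2_entry & QCOW_OFLAG_ZERO) {
        /* Zero cluster: distinguish by whether a host cluster is attached. */
        return (l2_entry & L2E_OFFSET_MASK) ? QCOW2_CLUSTER_ZERO_ALLOC
                                            : QCOW2_CLUSTER_ZERO_PLAIN;
    }
    if ((l2_entry & L2E_OFFSET_MASK) == 0) {
        /* Offset 0 is only a valid data location with an external data file. */
        if (has_data_file && (l2_entry & QCOW_OFLAG_COPIED)) {
            return QCOW2_CLUSTER_NORMAL;
        }
        return QCOW2_CLUSTER_UNALLOCATED;
    }
    return QCOW2_CLUSTER_NORMAL;
}
```

The compressed check comes first because a compressed entry uses the low bits differently, so the zero flag and offset mask are only meaningful for standard clusters.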

IV. Allocating clusters

What decides whether a cluster gets allocated? The L2 entry does, so its meaning needs to be clear.
1 Meaning of each bit of an L2 entry, using standard clusters as the example.

L2 entry
+-------+
| 0     | ->  If 1, the cluster reads as zeros.
|-------|
| 1-8   | ->  Reserved, 0 by default.
|-------|
| 9-55  | ->  Host cluster offset; must be aligned to a cluster boundary.
|       |     If 0 and bit 63 is 0: unallocated cluster; a new cluster must be
|       |     allocated and COW is required.
|       |     If 0 and bit 63 is 1, with an external data file in use: normal cluster.
|       |     If non-zero: normal cluster.
|-------|
| 56-61 | ->  Reserved, 0 by default.
|-------|
| 62    | ->  0 for standard clusters
|       |     1 for compressed clusters
|-------|
| 63    | ->  0 means unused, compressed or requires COW.
|       |     For standard clusters, 1 means refcount = 1.
|       |     Set to 1 when an external data file is used.
+-------+
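
The bit layout above can be unpacked with a few masks. The following is a minimal sketch for standard (uncompressed) clusters only; the struct L2Fields and the function decode_standard_l2_entry are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded view of a standard-cluster L2 entry. */
typedef struct {
    bool reads_as_zero;    /* bit 0 */
    uint64_t host_offset;  /* bits 9-55, kept in place as a byte offset */
    bool compressed;       /* bit 62 */
    bool copied;           /* bit 63: refcount is exactly one */
    bool offset_aligned;   /* host_offset sits on a cluster boundary */
} L2Fields;

static L2Fields decode_standard_l2_entry(uint64_t entry, unsigned cluster_bits)
{
    /* qcow2 cluster sizes range from 512 bytes to 2 MiB. */
    assert(cluster_bits >= 9 && cluster_bits <= 21);

    L2Fields f;
    f.reads_as_zero = (entry & (1ULL << 0)) != 0;
    f.host_offset = entry & 0x00fffffffffffe00ULL;
    f.compressed = (entry & (1ULL << 62)) != 0;
    f.copied = (entry & (1ULL << 63)) != 0;
    f.offset_aligned = (f.host_offset & ((1ULL << cluster_bits) - 1)) == 0;
    return f;
}
```

The offset bits are masked rather than shifted, so host_offset is already a byte address in the image file; an entry whose offset_aligned is false would violate the alignment rule stated in the table.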

2 Deciding whether to allocate a cluster from the L2 entry

If the L2 entry meets the conditions below, it is a normal cluster and no new cluster is allocated.

                 bit 63  bit 62    ...    bit 0
                +-------+-------+-------+-------+
L2 entry        |   1   |   0   |  !=0  |   0   |
(cluster offset)+-------+-------+-------+-------+
                    |       |               |
                    v       v               v
          QCOW_OFLAG_COPIED
                  QCOW_OFLAG_COMPRESSED
                                     QCOW_OFLAG_ZERO

If every bit of the L2 entry is 0, it is an unallocated cluster and a new cluster must be allocated.

                 bit 63  bit 62    ...    bit 0
                +-------+-------+-------+-------+
L2 entry        |   0   |   0   |  ==0  |   0   |
(cluster offset)+-------+-------+-------+-------+
                    |       |               |
                    v       v               v
          QCOW_OFLAG_COPIED
                  QCOW_OFLAG_COMPRESSED
                                     QCOW_OFLAG_ZERO

If bit 63 is 0 and (l2_entry & L2E_OFFSET_MASK) != 0, the cluster has been captured by an internal snapshot and a new cluster must be allocated.

                 bit 63  bit 62    ...    bit 0
                +-------+-------+-------+-------+
L2 entry        |   0   |   0   |  !=0  |   0   |
(cluster offset)+-------+-------+-------+-------+
                    |       |               |
                    v       v               v
          QCOW_OFLAG_COPIED
                  QCOW_OFLAG_COMPRESSED
                                     QCOW_OFLAG_ZERO
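
The three cases above can be condensed into a small decision function. This sketch follows the rule as stated in this article (only a normal cluster with QCOW_OFLAG_COPIED is written in place) and ignores external data files; the enum AllocDecision and the function decide_allocation are hypothetical names, not QEMU symbols.

```c
#include <stdbool.h>
#include <stdint.h>

#define QCOW_OFLAG_COPIED     (1ULL << 63)
#define QCOW_OFLAG_COMPRESSED (1ULL << 62)
#define QCOW_OFLAG_ZERO       (1ULL << 0)
#define L2E_OFFSET_MASK       0x00fffffffffffe00ULL /* bits 9-55 */

/* Hypothetical result type naming why a cluster is or is not allocated. */
typedef enum {
    WRITE_IN_PLACE,    /* normal cluster, refcount 1: reuse it */
    ALLOC_UNALLOCATED, /* all-zero entry: nothing allocated yet */
    ALLOC_SHARED,      /* offset set but COPIED clear: shared cluster */
    ALLOC_OTHER        /* compressed or zero cluster */
} AllocDecision;

static AllocDecision decide_allocation(uint64_t l2_entry)
{
    bool copied = (l2_entry & QCOW_OFLAG_COPIED) != 0;
    bool has_offset = (l2_entry & L2E_OFFSET_MASK) != 0;

    if (l2_entry & (QCOW_OFLAG_COMPRESSED | QCOW_OFLAG_ZERO)) {
        return ALLOC_OTHER;
    }
    if (!has_offset) {
        return ALLOC_UNALLOCATED;
    }
    return copied ? WRITE_IN_PLACE : ALLOC_SHARED;
}
```

Only WRITE_IN_PLACE avoids an allocation; the other three results all lead to a new cluster, and differ only in where the rest of the cluster's data would have to come from.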

V. Read and write paths

1 Read path
Reads happen in qcow2_co_preadv and fall into two cases:

The requested data is in the overlay.
The requested data is in the backing file.

qcow2_co_preadv
	ret = qcow2_get_cluster_offset // compute the cluster offset and return the cluster type
	switch (ret)
	case QCOW2_CLUSTER_UNALLOCATED:
		if (bs->backing) // read the data from the backing file
			bdrv_co_preadv
	case QCOW2_CLUSTER_NORMAL:
		bdrv_co_preadv // read the data from the overlay
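
The fall-through from overlay to backing file can be modelled with a toy image chain. This is a sketch under heavy simplification: each image is an in-memory array of 4-byte clusters with a per-cluster allocated flag, and the names Image and read_cluster are hypothetical. A cluster that no image in the chain has allocated is assumed to read as zeros.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define CLUSTER_SIZE 4
#define NUM_CLUSTERS 4

/* Hypothetical in-memory model of one node in a backing chain. */
typedef struct Image {
    const struct Image *backing;      /* NULL for the base image */
    bool allocated[NUM_CLUSTERS];     /* does this image hold the cluster? */
    unsigned char data[NUM_CLUSTERS][CLUSTER_SIZE];
} Image;

/* Read one cluster, walking from the top overlay down the chain. */
static void read_cluster(const Image *img, size_t idx, unsigned char *out)
{
    assert(idx < NUM_CLUSTERS);
    for (; img != NULL; img = img->backing) {
        if (img->allocated[idx]) {
            memcpy(out, img->data[idx], CLUSTER_SIZE);
            return;
        }
    }
    /* Nothing in the chain has this cluster: it reads as zeros. */
    memset(out, 0, CLUSTER_SIZE);
}
```

The loop stands in for the recursion in the real read path, where each layer that finds the cluster unallocated forwards the request to its own backing file.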

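2 Write path
Writes happen in qcow2_co_pwritev, as follows:

If no new cluster needs to be allocated, the data is written to the overlay.
If a new cluster needs to be allocated, but there is no backing file or no cluster offset can be found in the backing file, the data is written to the new cluster.
If a new cluster needs to be allocated and a cluster offset can be found in the backing file, COW is performed.

qcow2_co_pwritev
	qcow2_alloc_cluster_offset // look up the cluster offset; if none is found, allocate a new cluster
	handle_alloc_space // fill the new cluster with zeros
	bdrv_co_pwritev // write the data to the cluster
	qcow2_handle_l2meta // performs the COW when COW is needed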
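
The three write cases listed above amount to a small decision table. The sketch below only encodes that choice; the enum WritePath, the function choose_write_path, and its three boolean inputs are hypothetical and stand in for what the real code derives from the L2 entry and the backing chain.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the three write cases. */
typedef enum {
    WRITE_IN_PLACE,    /* cluster in the overlay can be reused */
    WRITE_NEW_CLUSTER, /* new cluster, nothing to copy from a backing file */
    WRITE_WITH_COW     /* new cluster, rest copied from the backing file */
} WritePath;

static WritePath choose_write_path(bool needs_new_cluster,
                                   bool has_backing,
                                   bool backing_has_cluster)
{
    /* A backing file that does not exist cannot hold the cluster. */
    assert(has_backing || !backing_has_cluster);

    if (!needs_new_cluster) {
        return WRITE_IN_PLACE;
    }
    if (!has_backing || !backing_has_cluster) {
        return WRITE_NEW_CLUSTER;
    }
    return WRITE_WITH_COW;
}
```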

posted @ 2025-09-12 17:22 dogonthemoon
