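FFmpeg Data Structures: AVFrame

This is an original article by the author; when reposting, please credit the source: https://www.cnblogs.com/leisure_chn/p/10404502.html

This article is based on FFmpeg 8.0.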

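1. Data Structure Definition

1.1 Basic Concepts

Before reading the AVFrame data structure, it helps to understand a few related basic concepts (only video is considered here):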

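pixel_format: the pixel format, i.e. how the pixels of an image are laid out in memory. A pixel format encodes the color space, the sampling scheme, the storage mode, the bit depth and so on. The most important of these is the storage mode (planar or packed), which is described under "plane" below.

bit_depth: the bit depth, i.e. the number of bits occupied by a single sample of one component (Y, U, V, R, G, B, etc.).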

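For example, in the yuv420p format (bit depth 8), every Y, U and V sample is 8 bits wide; the U and V planes simply have half as many samples as the Y plane in both the horizontal and the vertical direction. bpp (Bits Per Pixel), by contrast, spreads the total number of bits in the image over all pixels to give the average number of bits per pixel. yuv420p has a bpp of 12, meaning that each pixel takes 12 bits on average (8 for Y, 2 for U, 2 for V), even though each actual U and V sample is 8 bits wide, not 2.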
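The bpp arithmetic above can be sketched in a few lines of C. The helper name yuv_bits_per_pixel and its parameters are made up for this illustration and are not part of FFmpeg; the sketch assumes a three-component YUV format whose U and V planes share the same subsampling factors.

```c
/* Average bits per pixel of a three-component YUV format.
 * Hypothetical helper for illustration only, not an FFmpeg API.
 *   bit_depth: bits per sample (8 for yuv420p)
 *   hsub/vsub: horizontal/vertical chroma subsampling factors
 *              (2 and 2 for 4:2:0, 2 and 1 for 4:2:2, 1 and 1 for 4:4:4) */
static int yuv_bits_per_pixel(int bit_depth, int hsub, int vsub)
{
    /* One Y sample per pixel, plus one U and one V sample
     * shared by every block of hsub * vsub pixels. */
    return bit_depth + 2 * bit_depth / (hsub * vsub);
}
```

With bit depth 8 and factors 2 and 2 the function returns 8 + 16 / 4 = 12, the yuv420p value quoted above; each U and V sample is still 8 bits wide, it is just shared by four pixels.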

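plane: a region of memory that stores one or more components of an image. In planar storage mode, at least one component has a plane to itself: yuv420p has three planes (Y, U, V), nv12 has two (Y, UV), and gbrap has four (G, B, R, A). In packed storage mode the samples of all components are interleaved, so there is only one plane.

slice: a slice is an internal structure used in FFmpeg, often seen in codecs and filters. It usually refers to a run of consecutive rows of an image, so that one frame is divided into several slices. Note that a slice partitions the image, not a plane: a frame has several planes, and a slice likewise spans several planes.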

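stride/pitch: the width in bytes of one row of data in a plane. The actual value is subject to alignment requirements; before alignment it is computed as:

stride = image width / horizontal subsampling factor * number of components * bits per sample / 8

Here the image width is measured in pixels: a 1280x720 image is 1280 pixels wide and 720 pixels high. The horizontal subsampling factor is the number of pixels, in the horizontal direction, per sample of the component: 2 for the chroma planes of yuv420p, and always 1 for luma, which is never subsampled. The number of components is how many components the plane contains; for example, the single plane of rgb24 holds the three components R, G and B. The bits per sample is the number of bits one sample actually occupies in memory after alignment: a bit depth of 8 occupies 8 bits, while a bit depth of 10 actually occupies 16 bits. The alignment is platform dependent.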
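As a worked example of the stride formula, here is a minimal C sketch. The helpers min_stride and align_up are hypothetical names introduced for this illustration, not FFmpeg functions. Rounding the sample count up for widths that are not a multiple of the subsampling factor, and the alignment of 32 used in the usage note, are assumptions of the sketch.

```c
/* Round n up to the next multiple of align (align > 0). */
static int align_up(int n, int align)
{
    return (n + align - 1) / align * align;
}

/* Minimum number of bytes in one row of a plane, before alignment padding.
 * Hypothetical helper for illustration only, not an FFmpeg API.
 *   width:           image width in pixels
 *   hsub:            horizontal subsampling factor of this plane (1 for luma/RGB)
 *   ncomp:           number of components stored in this plane
 *   bits_per_sample: bits one sample occupies in memory (8, or 16 for bit depth 10) */
static int min_stride(int width, int hsub, int ncomp, int bits_per_sample)
{
    /* Assumption: round up so that odd widths still get a chroma sample. */
    int samples_per_row = (width + hsub - 1) / hsub;
    return samples_per_row * ncomp * bits_per_sample / 8;
}
```

For a 1280-pixel-wide image, min_stride gives 1280 bytes for the Y plane of yuv420p, 640 for its U plane, 1280 for the interleaved UV plane of nv12 and 3840 for rgb24. align_up(min_stride(1281, 1, 3, 8), 32) pads an awkward 3843-byte rgb24 row to 3872 bytes.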

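For a detailed explanation of these concepts, see the article "色彩空间与像素格式3-FFmpeg中的像素格式" (Color Spaces and Pixel Formats 3: Pixel Formats in FFmpeg), https://www.cnblogs.com/leisure_chn/p/18610070.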

1.2 struct AVFrame

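An AVFrame stores raw, decoded data. When decoding, an AVFrame is the output of the decoder; when encoding, it is the input of the encoder. The diagram below shows the FFmpeg transcoding pipeline, in which the "decoded frames" are of type AVFrame: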

 _______              ______________
|       |            |              |
| input |  demuxer   | encoded data |   decoder
| file  | ---------> | packets      | -----+
|_______|            |______________|      |
                                           v
                                       _________
                                      |         |
                                      | decoded |
                                      | frames  |
                                      |_________|
 ________             ______________       |
|        |           |              |      |
| output | <-------- | encoded data | <----+
| file   |   muxer   | packets      |   encoder
|________|           |______________|

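AVFrame is defined in the header libavutil/frame.h. It is a very important data structure with a great many members, so its definition is long. The definition quoted below omits the lengthy comments and most of the members. The general usage of AVFrame is described first, and then some important members are picked out and explained one by one: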

/**
 * This structure describes decoded (raw) audio or video data.
 *
 * AVFrame must be allocated using av_frame_alloc(). Note that this only
 * allocates the AVFrame itself, the buffers for the data must be managed
 * through other means (see below).
 * AVFrame must be freed with av_frame_free().
 *
 * AVFrame is typically allocated once and then reused multiple times to hold
 * different data (e.g. a single AVFrame to hold frames received from a
 * decoder). In such a case, av_frame_unref() will free any references held by
 * the frame and reset it to its original clean state before it
 * is reused again.
 *
 * The data described by an AVFrame is usually reference counted through the
 * AVBuffer API. The underlying buffer references are stored in AVFrame.buf /
 * AVFrame.extended_buf. An AVFrame is considered to be reference counted if at
 * least one reference is set, i.e. if AVFrame.buf[0] != NULL. In such a case,
 * every single data plane must be contained in one of the buffers in
 * AVFrame.buf or AVFrame.extended_buf.
 * There may be a single buffer for all the data, or one separate buffer for
 * each plane, or anything in between.
 *
 * sizeof(AVFrame) is not a part of the public ABI, so new fields may be added
 * to the end with a minor bump.
 *
 * Fields can be accessed through AVOptions, the name string used, matches the
 * C structure field name for fields accessible through AVOptions.
 */
typedef struct AVFrame {
#define AV_NUM_DATA_POINTERS 8
    uint8_t *data[AV_NUM_DATA_POINTERS];
    int linesize[AV_NUM_DATA_POINTERS];
    uint8_t **extended_data;
    int width, height;
    int nb_samples;
    int format;
    enum AVPictureType pict_type;
    AVRational sample_aspect_ratio;
    int64_t pts;
    ......
} AVFrame;

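How AVFrame is used:

An AVFrame object must be allocated on the heap with av_frame_alloc() and destroyed with av_frame_free(). Note that this refers only to the AVFrame object itself.
The data buffers referenced by an AVFrame are not allocated by av_frame_alloc(); they must be managed by other means and are usually reference counted through the AVBuffer API.
An AVFrame is typically allocated once and then reused many times. Before each reuse, call av_frame_unref() to reset the frame to its original clean state.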

Some important members are excerpted and explained below:

data

    /**
     * pointer to the picture/channel planes.
     * This might be different from the first allocated byte. For video,
     * it could even point to the end of the image data.
     *
     * All pointers in data and extended_data must point into one of the
     * AVBufferRef in buf or extended_buf.
     *
     * Some decoders access areas outside 0,0 - width,height, please
     * see avcodec_align_dimensions2(). Some filters and swscale can read
     * up to 16 bytes beyond the planes, if these filters are to be used,
     * then 16 extra bytes must be allocated.
     *
     * NOTE: Pointers not needed by the format MUST be set to NULL.
     *
     * @attention In case of video, the data[] pointers can point to the
     * end of image data in order to reverse line order, when used in
     * combination with negative values in the linesize[] array.
     */
    uint8_t *data[AV_NUM_DATA_POINTERS];

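Stores the raw frame data (unencoded image or audio data, as the output of a decoder or the input of an encoder).

data is an array of pointers. Each element points to one plane of a video image or to one plane of an audio frame (for planar audio, one channel per plane).

For packed formats: the Y, U and V components of a YUV image are interleaved in a single plane, as in YUVYUV..., and data[0] points to that plane. Likewise, the left (L) and right (R) channels of a stereo audio frame are interleaved in a single plane, as in LRLRLR..., and data[0] points to that plane.

For planar formats: a YUV image has three planes, Y, U and V, with data[0] pointing to the Y plane, data[1] to the U plane and data[2] to the V plane. A stereo audio frame has two planes, one per channel, with data[0] pointing to the L plane and data[1] to the R plane.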

linesize

    /**
     * For video, a positive or negative value, which is typically indicating
     * the size in bytes of each picture line, but it can also be:
     * - the negative byte size of lines for vertical flipping
     *   (with data[n] pointing to the end of the data
     * - a positive or negative multiple of the byte size as for accessing
     *   even and odd fields of a frame (possibly flipped)
     *
     * For audio, only linesize[0] may be set. For planar audio, each channel
     * plane must be the same size.
     *
     * For video the linesizes should be multiples of the CPUs alignment
     * preference, this is 16 or 32 for modern desktop CPUs.
     * Some code requires such alignment other code can be slower without
     * correct alignment, for yet other it makes no difference.
     *
     * @note The linesize may be larger than the size of usable data -- there
     * may be extra padding present for performance reasons.
     *
     * @attention In case of video, line size values can be negative to achieve
     * a vertically inverted iteration over image lines.
     */
    int linesize[AV_NUM_DATA_POINTERS];

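linesize is an array.

For video, each element of linesize is the size in bytes of one image row in the corresponding plane, subject to alignment requirements. This linesize is the stride/pitch described in section 1.1. Planar video has several planes, and the linesize of each plane is the storage occupied by one image row within that plane. Packed video has a single plane, and linesize[0] is the storage occupied by one image row.

For audio, linesize is the size in bytes of one audio plane. Packed multi-channel audio has a single plane; planar multi-channel audio has one plane per channel. Audio uses only linesize[0], even when there are several planes, so for planar audio every plane must be the same size.

linesize may include extra padding added for performance reasons, to satisfy particular alignment requirements, so it can be larger than the size of the usable data. As the comment above notes, for video it can also be negative, with data[n] pointing to the end of the image data, so that rows are visited in vertically flipped order.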
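To make the roles of data and linesize concrete, the following C sketch walks the rows of one 8-bit plane using plain pointers. plane_row and plane_sum are hypothetical helpers, not FFmpeg functions, and the sketch takes a raw pointer and an int rather than an AVFrame so that it stays self-contained. For the flipped case it assumes the convention that the data pointer addresses the first byte of the last row while linesize is negative.

```c
#include <stddef.h>
#include <stdint.h>

/* Address of row y in a plane. linesize may be negative, in which case
 * data is assumed to point at the last row in memory (bottom-up image). */
static const uint8_t *plane_row(const uint8_t *data, int linesize, int y)
{
    return data + (ptrdiff_t)y * linesize;
}

/* Sum the visible samples of an 8-bit plane.
 * Only the first width bytes of each row are image data; the remaining
 * bytes up to |linesize| are padding and must not be interpreted. */
static unsigned long plane_sum(const uint8_t *data, int linesize,
                               int width, int height)
{
    unsigned long sum = 0;
    for (int y = 0; y < height; y++) {
        const uint8_t *row = plane_row(data, linesize, y);
        for (int x = 0; x < width; x++)
            sum += row[x];
    }
    return sum;
}
```

With a real frame the equivalent call would pass frame->data[0], frame->linesize[0], frame->width and frame->height for the first plane. The same loop handles a vertically flipped image without any special case, which is the point of allowing negative linesize values.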

extended_data

    /**
     * pointers to the data planes/channels.
     *
     * For video, this should simply point to data[].
     *
     * For planar audio, each channel has a separate data pointer, and
     * linesize[0] contains the size of each channel buffer.
     * For packed audio, there is just one data pointer, and linesize[0]
     * contains the total size of the buffer for all channels.
     *
     * Note: Both data and extended_data should always be set in a valid frame,
     * but for planar audio with more channels that can fit in data,
     * extended_data must be used in order to access all channels.
     */
    uint8_t **extended_data;

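For video, extended_data simply points to the data[] member.

For packed audio, the samples of all channels are interleaved in a single plane. For planar audio, each channel has a plane of its own. The data[] pointer array has 8 elements, so it can address at most 8 channels. When there are more than 8 channels, extended_data must be used to reach all of them: extended_data holds the pointers to every plane, not only the ones that do not fit in data[].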
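The audio layout rules above (number of planes, size of each plane, and how many planes fall outside data[]) can be sketched as follows. struct audio_layout and audio_plane_layout are hypothetical names made up for this illustration; the sizes are minimum values before any alignment padding.

```c
#define NUM_DATA_POINTERS 8   /* mirrors AV_NUM_DATA_POINTERS */

/* Hypothetical summary of an audio frame layout, for illustration only. */
struct audio_layout {
    int planes;        /* number of data planes */
    int plane_bytes;   /* minimum bytes per plane, before padding */
    int extra_planes;  /* planes reachable only through extended_data */
};

/* channels:         number of audio channels
 * nb_samples:       samples per channel in the frame
 * bytes_per_sample: 2 for 16-bit formats, 4 for float, and so on
 * planar:           non-zero for planar sample formats */
static struct audio_layout audio_plane_layout(int channels, int nb_samples,
                                              int bytes_per_sample, int planar)
{
    struct audio_layout l;
    /* Planar: one plane per channel. Packed: one interleaved plane. */
    l.planes = planar ? channels : 1;
    l.plane_bytes = nb_samples * bytes_per_sample * (planar ? 1 : channels);
    l.extra_planes = l.planes > NUM_DATA_POINTERS
                   ? l.planes - NUM_DATA_POINTERS : 0;
    return l;
}
```

For example, a packed 16-bit stereo frame of 1024 samples has one plane of 4096 bytes, a planar float stereo frame has two planes of 4096 bytes each, and a planar frame with 10 channels has 2 planes that can only be reached through extended_data[8] and extended_data[9].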

width, height

    /**
     * @name Video dimensions
     * Video frames only. The coded dimensions (in pixels) of the video frame,
     * i.e. the size of the rectangle that contains some well-defined values.
     *
     * @note The part of the frame intended for display/presentation is further
     * restricted by the @ref cropping "Cropping rectangle".
     * @{
     */
    int width, height;

Width and height of the video frame, in pixels.

nb_samples

    /**
     * number of audio samples (per channel) described by this frame
     */
    int nb_samples;

Number of samples per channel in the audio frame.

format

    /**
     * format of the frame, -1 if unknown or unset
     * Values correspond to enum AVPixelFormat for video frames,
     * enum AVSampleFormat for audio)
     */
    int format;

Frame format. The value is -1 if the format is unknown or unset.

For video frames, format is the pixel format and takes values of type "enum AVPixelFormat":

enum AVPixelFormat {
    AV_PIX_FMT_NONE = -1,
    AV_PIX_FMT_YUV420P,   ///< planar YUV 4:2:0, 12bpp, (1 Cr & Cb sample per 2x2 Y samples)
    AV_PIX_FMT_YUYV422,   ///< packed YUV 4:2:2, 16bpp, Y0 Cb Y1 Cr
    AV_PIX_FMT_RGB24,     ///< packed RGB 8:8:8, 24bpp, RGBRGB...
    AV_PIX_FMT_BGR24,     ///< packed RGB 8:8:8, 24bpp, BGRBGR...
    ...
}

For audio frames, format is the sample format and takes values of type "enum AVSampleFormat":

enum AVSampleFormat {
    AV_SAMPLE_FMT_NONE = -1,
    AV_SAMPLE_FMT_U8,          ///< unsigned 8 bits
    AV_SAMPLE_FMT_S16,         ///< signed 16 bits
    AV_SAMPLE_FMT_S32,         ///< signed 32 bits
    AV_SAMPLE_FMT_FLT,         ///< float
    AV_SAMPLE_FMT_DBL,         ///< double

    AV_SAMPLE_FMT_U8P,         ///< unsigned 8 bits, planar
    AV_SAMPLE_FMT_S16P,        ///< signed 16 bits, planar
    AV_SAMPLE_FMT_S32P,        ///< signed 32 bits, planar
    AV_SAMPLE_FMT_FLTP,        ///< float, planar
    AV_SAMPLE_FMT_DBLP,        ///< double, planar
    AV_SAMPLE_FMT_S64,         ///< signed 64 bits
    AV_SAMPLE_FMT_S64P,        ///< signed 64 bits, planar

    AV_SAMPLE_FMT_NB           ///< Number of sample formats. DO NOT USE if linking dynamically
};

pict_type

    /**
     * Picture type of the frame.
     */
    enum AVPictureType pict_type;

Picture type of the video frame (I, B, P, etc.). AVPictureType is defined as follows:

enum AVPictureType {
    AV_PICTURE_TYPE_NONE = 0, ///< Undefined
    AV_PICTURE_TYPE_I,     ///< Intra
    AV_PICTURE_TYPE_P,     ///< Predicted
    AV_PICTURE_TYPE_B,     ///< Bi-dir predicted
    AV_PICTURE_TYPE_S,     ///< S(GMC)-VOP MPEG-4
    AV_PICTURE_TYPE_SI,    ///< Switching Intra
    AV_PICTURE_TYPE_SP,    ///< Switching Predicted
    AV_PICTURE_TYPE_BI,    ///< BI type
};

sample_aspect_ratio

    /**
     * Sample aspect ratio for the video frame, 0/1 if unknown/unspecified.
     */
    AVRational sample_aspect_ratio;

Sample aspect ratio. It describes the shape of a pixel; for example, 1:1 means square pixels.
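One common use of the sample aspect ratio is to derive the display aspect ratio of the picture from its pixel dimensions. The sketch below is a self-contained illustration: struct ratio stands in for AVRational, and display_aspect_ratio is a hypothetical helper, not an FFmpeg function. Treating an unknown ratio (0/1) as square pixels is an assumption of the sketch.

```c
/* Same shape as AVRational; defined here to keep the sketch self-contained. */
struct ratio { int num, den; };

/* Greatest common divisor of two non-negative values. */
static long long gcd_ll(long long a, long long b)
{
    while (b) {
        long long t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Display aspect ratio = (width * sar.num) : (height * sar.den), reduced.
 * Hypothetical helper for illustration only. */
static struct ratio display_aspect_ratio(int width, int height, struct ratio sar)
{
    struct ratio dar = { 0, 1 };
    long long num, den, g;

    /* Assumption: an unknown ratio (0/1) is treated as square pixels. */
    if (sar.num <= 0 || sar.den <= 0) {
        sar.num = 1;
        sar.den = 1;
    }
    num = (long long)width * sar.num;
    den = (long long)height * sar.den;
    g = gcd_ll(num, den);
    if (g > 0) {
        dar.num = (int)(num / g);
        dar.den = (int)(den / g);
    }
    return dar;
}
```

A 720x576 frame with a sample aspect ratio of 16:15 comes out as 4:3, and a 1920x1080 frame with square pixels as 16:9.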

pts

    /**
     * Presentation timestamp in time_base units (time when frame should be shown to user).
     */
    int64_t pts;

Presentation timestamp, in units of time_base.
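Because pts counts time_base units, converting it to seconds is a single multiplication. The sketch below uses a local struct rational with the same shape as AVRational, and pts_to_seconds is a hypothetical helper; where the time base comes from (for example the stream the frame belongs to) is left to the caller, and the special "no timestamp" value AV_NOPTS_VALUE is not handled.

```c
#include <stdint.h>

/* Same shape as AVRational; defined here to keep the sketch self-contained. */
struct rational { int num, den; };

/* Convert a timestamp in time_base units to seconds.
 * Hypothetical helper for illustration only; the caller must supply the
 * time base that applies to the frame and must check for unset timestamps. */
static double pts_to_seconds(int64_t pts, struct rational time_base)
{
    return (double)pts * time_base.num / time_base.den;
}
```

With a time base of 1/90000, a pts of 180000 corresponds to 2 seconds.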

pkt_dts

    /**
     * DTS copied from the AVPacket that triggered returning this frame. (if frame threading isn't used)
     * This is also the Presentation time of this AVFrame calculated from
     * only AVPacket.dts values without pts values.
     */
    int64_t pkt_dts;

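The decoding timestamp of the packet that corresponds to this frame, i.e. the DTS copied from the packet whose decoding produced this frame.
If that packet has only a dts and no pts, this value also serves as the pts of the frame.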

sample_rate

    /**
     * Sample rate of the audio data.
     */
    int sample_rate;

Audio sample rate.

buf

    /**
     * AVBuffer references backing the data for this frame. If all elements of
     * this array are NULL, then this frame is not reference counted. This array
     * must be filled contiguously -- if buf[i] is non-NULL then buf[j] must
     * also be non-NULL for all j < i.
     *
     * There may be at most one AVBuffer per data plane, so for video this array
     * always contains all the references. For planar audio with more than
     * AV_NUM_DATA_POINTERS channels, there may be more buffers than can fit in
     * this array. Then the extra AVBufferRef pointers are stored in the
     * extended_buf array.
     */
    AVBufferRef *buf[AV_NUM_DATA_POINTERS];

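The data of this frame can be managed through AVBufferRef, which provides references to an AVBuffer.

This involves the concept of buffer reference counting. AVBuffer is a buffer type used widely in FFmpeg, and it is reference counted. AVBufferRef is a thin wrapper around an AVBuffer whose main job is to handle the reference counting, which provides a safety mechanism. Users should not access an AVBuffer directly; they should go through an AVBufferRef. Many basic FFmpeg data structures contain AVBufferRef members and use AVBuffer buffers indirectly through them. For details, see the article "FFmpeg 数据结构 AVBuffer" (FFmpeg Data Structures: AVBuffer).

If every element of buf[] is NULL, the frame is not reference counted. buf[] must be filled contiguously: if buf[i] is non-NULL, then buf[j] must also be non-NULL for all j < i. There may be at most one AVBuffer per plane. An AVBufferRef pointer points to one AVBuffer, and "an AVBuffer reference" means exactly one such AVBufferRef pointer. For video, buf[] holds all the AVBufferRef pointers. For planar audio with more than AV_NUM_DATA_POINTERS channels, buf[] cannot hold all the AVBufferRef pointers, and the extra ones are stored in the extended_buf array.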
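The relationship between a buffer and the references to it can be illustrated with a toy model. This is not FFmpeg's implementation: the types toy_buffer and toy_ref and the functions toy_alloc, toy_ref_new and toy_unref are invented for this sketch, and the counter is a plain int with no thread safety. It only shows the idea that several references share one block of data, which is freed when the last reference is dropped.

```c
#include <stdint.h>
#include <stdlib.h>

/* Toy stand-in for AVBuffer: the shared data plus its reference count. */
struct toy_buffer {
    uint8_t *data;
    size_t size;
    int refcount;              /* plain int: this toy is not thread safe */
};

/* Toy stand-in for AVBufferRef: one reference to a toy_buffer. */
struct toy_ref {
    struct toy_buffer *buffer;
    uint8_t *data;
    size_t size;
};

/* Allocate a zeroed buffer of size bytes and return its first reference. */
static struct toy_ref *toy_alloc(size_t size)
{
    struct toy_buffer *b = malloc(sizeof(*b));
    struct toy_ref *r = malloc(sizeof(*r));
    uint8_t *data = calloc(1, size);
    if (!b || !r || !data) {
        free(b);
        free(r);
        free(data);
        return NULL;
    }
    b->data = data;
    b->size = size;
    b->refcount = 1;
    r->buffer = b;
    r->data = data;
    r->size = size;
    return r;
}

/* Create another reference to the same buffer; no data is copied. */
static struct toy_ref *toy_ref_new(const struct toy_ref *src)
{
    struct toy_ref *r = malloc(sizeof(*r));
    if (!r)
        return NULL;
    *r = *src;
    r->buffer->refcount++;
    return r;
}

/* Drop one reference; the data is freed when the last one goes away. */
static void toy_unref(struct toy_ref **ref)
{
    struct toy_buffer *b;
    if (!ref || !*ref)
        return;
    b = (*ref)->buffer;
    if (--b->refcount == 0) {
        free(b->data);
        free(b);
    }
    free(*ref);
    *ref = NULL;
}
```

In these terms, each non-NULL entry of buf[] plays the role of one toy_ref, and taking a new reference to a frame corresponds to calling toy_ref_new on each of them rather than copying the pixel or sample data.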

extended_buf&nb_extended_buf

    /**
     * For planar audio which requires more than AV_NUM_DATA_POINTERS
     * AVBufferRef pointers, this array will hold all the references which
     * cannot fit into AVFrame.buf.
     *
     * Note that this is different from AVFrame.extended_data, which always
     * contains all the pointers. This array only contains the extra pointers,
     * which cannot fit into AVFrame.buf.
     *
     * This array is always allocated using av_malloc() by whoever constructs
     * the frame. It is freed in av_frame_unref().
     */
    AVBufferRef **extended_buf;
    /**
     * Number of elements in extended_buf.
     */
    int        nb_extended_buf;

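For planar audio with more than AV_NUM_DATA_POINTERS channels, buf[] cannot hold all the AVBufferRef pointers, and the extra ones are stored in the extended_buf array.
Note the difference between extended_buf and AVFrame.extended_data: AVFrame.extended_data contains the pointers to all planes, whereas extended_buf contains only the AVBufferRef pointers that do not fit in AVFrame.buf. extended_buf is allocated with av_malloc() by whoever constructs the frame's buffers (for example av_frame_get_buffer(), when needed); av_frame_alloc() itself does not allocate it. Calling av_frame_unref() frees extended_buf. nb_extended_buf is the number of elements in extended_buf.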

crop_

    /**
     * @anchor cropping
     * @name Cropping
     * Video frames only. The number of pixels to discard from the the
     * top/bottom/left/right border of the frame to obtain the sub-rectangle of
     * the frame intended for presentation.
     * @{
     */
    size_t crop_top;
    size_t crop_bottom;
    size_t crop_left;
    size_t crop_right;
    /**
     * @}
     */

Used for cropping the video frame. The four values are the number of pixels to discard from the top/bottom/left/right border of the frame.
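The rectangle intended for presentation follows directly from width, height and the four crop values. The sketch below computes it with plain parameters; struct rect and visible_rect are hypothetical names, not FFmpeg APIs, and rejecting crop values that leave no visible pixels is a choice made for this sketch.

```c
#include <stddef.h>

/* Hypothetical result type: top-left corner and size of the visible area. */
struct rect { size_t x, y, w, h; };

/* Compute the sub-rectangle of a width x height frame that remains after
 * discarding the crop_* borders. Returns 0 on success, or -1 if the crop
 * values leave no visible pixels. Illustration only, not an FFmpeg API. */
static int visible_rect(int width, int height,
                        size_t crop_top, size_t crop_bottom,
                        size_t crop_left, size_t crop_right,
                        struct rect *out)
{
    if (width <= 0 || height <= 0)
        return -1;
    /* Check before subtracting: size_t arithmetic would wrap around. */
    if (crop_left >= (size_t)width ||
        crop_right >= (size_t)width - crop_left)
        return -1;
    if (crop_top >= (size_t)height ||
        crop_bottom >= (size_t)height - crop_top)
        return -1;

    out->x = crop_left;
    out->y = crop_top;
    out->w = (size_t)width - crop_left - crop_right;
    out->h = (size_t)height - crop_top - crop_bottom;
    return 0;
}
```

A typical case is a frame coded as 1920x1088 with crop_bottom set to 8, which yields a 1920x1080 visible area starting at (0, 0).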

2. Usage of Related Functions

2.1 av_frame_alloc()

/**
 * Allocate an AVFrame and set its fields to default values.  The resulting
 * struct must be freed using av_frame_free().
 *
 * @return An AVFrame filled with default values or NULL on failure.
 *
 * @note this only allocates the AVFrame itself, not the data buffers. Those
 * must be allocated through other means, e.g. with av_frame_get_buffer() or
 * manually.
 */
AVFrame *av_frame_alloc(void);

Allocates a frame and sets its members to default values.
This function allocates only the AVFrame object itself, not the data buffers of the AVFrame.

2.2 av_frame_free()

/**
 * Free the frame and any dynamically allocated objects in it,
 * e.g. extended_data. If the frame is reference counted, it will be
 * unreferenced first.
 *
 * @param frame frame to be freed. The pointer will be set to NULL.
 */
void av_frame_free(AVFrame **frame);

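Frees a frame. If the frame is reference counted, it is unreferenced first, and the caller's pointer is set to NULL.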

2.3 av_frame_ref()

/**
 * Set up a new reference to the data described by the source frame.
 *
 * Copy frame properties from src to dst and create a new reference for each
 * AVBufferRef from src.
 *
 * If src is not reference counted, new buffers are allocated and the data is
 * copied.
 *
 * @warning: dst MUST have been either unreferenced with av_frame_unref(dst),
 *           or newly allocated with av_frame_alloc() before calling this
 *           function, or undefined behavior will occur.
 *
 * @return 0 on success, a negative AVERROR on error
 */
int av_frame_ref(AVFrame *dst, const AVFrame *src);

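Sets up a new reference to the data described by src. The frame properties of src are copied to dst, and a new reference is created for each AVBufferRef in src. If src is not reference counted, new data buffers are allocated in dst and the data in the buffers of src is copied into them.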

2.4 av_frame_clone()

/**
 * Create a new frame that references the same data as src.
 *
 * This is a shortcut for av_frame_alloc()+av_frame_ref().
 *
 * @return newly created AVFrame on success, NULL on error.
 */
AVFrame *av_frame_clone(const AVFrame *src);

Creates a new frame that shares its data buffers with src; the buffers are managed by reference counting. This function is equivalent to av_frame_alloc() + av_frame_ref().

2.5 av_frame_unref()

/**
 * Unreference all the buffers referenced by frame and reset the frame fields.
 */
void av_frame_unref(AVFrame *frame);

Drops this frame's references to all of its buffers and resets the members of the frame.

2.6 av_frame_move_ref()

/**
 * Move everything contained in src to dst and reset src.
 *
 * @warning: dst is not unreferenced, but directly overwritten without reading
 *           or deallocating its contents. Call av_frame_unref(dst) manually
 *           before calling this function to ensure that no memory is leaked.
 */
void av_frame_move_ref(AVFrame *dst, AVFrame *src);

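Moves everything contained in src to dst and resets src. The references are transferred rather than duplicated, so no frame data is copied. dst is overwritten without being unreferenced, so to avoid a memory leak, call av_frame_unref(dst) before calling av_frame_move_ref(dst, src).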

2.7 av_frame_get_buffer()

/**
 * Allocate new buffer(s) for audio or video data.
 *
 * The following fields must be set on frame before calling this function:
 * - format (pixel format for video, sample format for audio)
 * - width and height for video
 * - nb_samples and ch_layout for audio
 *
 * This function will fill AVFrame.data and AVFrame.buf arrays and, if
 * necessary, allocate and fill AVFrame.extended_data and AVFrame.extended_buf.
 * For planar formats, one buffer will be allocated for each plane.
 *
 * @warning: if frame already has been allocated, calling this function will
 *           leak memory. In addition, undefined behavior can occur in certain
 *           cases.
 *
 * @param frame frame in which to store the new buffers.
 * @param align Required buffer size alignment. If equal to 0, alignment will be
 *              chosen automatically for the current CPU. It is highly
 *              recommended to pass 0 here unless you know what you are doing.
 *
 * @return 0 on success, a negative AVERROR on error.
 */
int av_frame_get_buffer(AVFrame *frame, int align);

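Allocates new buffers for audio or video data. The following members of the frame must be set before calling this function:

format (pixel format for video, sample format for audio)
width, height (width and height of the video picture)
nb_samples, ch_layout (number of samples per channel and the channel layout, for audio)

This function fills the AVFrame.data and AVFrame.buf arrays and, if necessary, allocates and fills AVFrame.extended_data and AVFrame.extended_buf. For planar formats, one buffer is allocated for each plane.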

2.8 av_frame_copy()

/**
 * Copy the frame data from src to dst.
 *
 * This function does not allocate anything, dst must be already initialized and
 * allocated with the same parameters as src.
 *
 * This function only copies the frame data (i.e. the contents of the data /
 * extended data arrays), not any other properties.
 *
 * @return >= 0 on success, a negative AVERROR on error.
 */
int av_frame_copy(AVFrame *dst, const AVFrame *src);

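Copies the frame data from src to dst.
This function does not allocate any buffers; before calling it, dst must already have been initialized and allocated with the same parameters as src.
This function copies only the contents of the data buffers (the contents of the data/extended_data arrays), not any other properties of the frame.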

3. References

[1] FFMPEG 结构体分析：AVFrame (FFmpeg structure analysis: AVFrame), https://blog.csdn.net/leixiaohua1020/article/details/14214577
[2] 色彩空间与像素格式 (Color Spaces and Pixel Formats), https://www.cnblogs.com/leisure_chn/p/10290575.html

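4. Revision History

2019-01-13 V1.0 Initial draft
2021-01-06 V1.1 Added section 1.1; fixed the unclear description of linesize
2021-01-16 V1.1 Updated section 1.1; moved the detailed explanation to the "Color Spaces and Pixel Formats" article
2025-09-13 V1.2 Updated to FFmpeg 8.0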

posted @ 2019-02-20 08:49 叶余
