Common FFmpeg data structures (repost)

AVCodecContext
This structure describes a codec context and holds the many parameters a codec needs. Some of the more important fields are listed below:
    typedef struct AVCodecContext {
        ......
        /**
         * some codecs need / can use extradata like Huffman tables.
         * mjpeg: Huffman tables
         * rv10: additional flags
         * mpeg4: global headers (they can be in the bitstream or here)
         * The allocated memory should be FF_INPUT_BUFFER_PADDING_SIZE bytes larger
         * than extradata_size to avoid prolems if it is read with the bitstream reader.
         * The bytewise contents of extradata must not depend on the architecture or CPU endianness.
         * - encoding: Set/allocated/freed by libavcodec.
         * - decoding: Set/allocated/freed by user.
         */
        uint8_t *extradata;
        int extradata_size;

        /**
         * This is the fundamental unit of time (in seconds) in terms
         * of which frame timestamps are represented. For fixed-fps content,
         * timebase should be 1/framerate and timestamp increments should be
         * identically 1.
         * - encoding: MUST be set by user.
         * - decoding: Set by libavcodec.
         */
        AVRational time_base;

        /* video only */
        /**
         * picture width / height.
         * - encoding: MUST be set by user.
         * - decoding: Set by libavcodec.
         * Note: For compatibility it is possible to set this instead of
         * coded_width/height before decoding.
         */
        int width, height;
        ......
        /* audio only */
        int sample_rate; ///< samples per second
        int channels;    ///< number of audio channels

        /**
         * audio sample format
         * - encoding: Set by user.
         * - decoding: Set by libavcodec.
         */
        enum SampleFormat sample_fmt; ///< sample format

        /* The following data should not be initialized. */
        /**
         * Samples per packet, initialized when calling 'init'.
         */
        int frame_size;
        int frame_number; ///< audio or video frame number
        ......
        char codec_name[32];
        enum AVMediaType codec_type; /* see AVMEDIA_TYPE_xxx */
        enum CodecID codec_id;       /* see CODEC_ID_xxx */

        /**
         * fourcc (LSB first, so "ABCD" -> ('D'<<24) + ('C'<<16) + ('B'<<8) + 'A').
         * This is used to work around some encoder bugs.
         * A demuxer should set this to what is stored in the field used to identify the codec.
         * If there are multiple such fields in a container then the demuxer should choose the one
         * which maximizes the information about the used codec.
         * If the codec tag field in a container is larger then 32 bits then the demuxer should
         * remap the longer ID to 32 bits with a table or other structure. Alternatively a new
         * extra_codec_tag + size could be added but for this a clear advantage must be demonstrated
         * first.
         * - encoding: Set by user, if not then the default based on codec_id will be used.
         * - decoding: Set by user, will be converted to uppercase by libavcodec during init.
         */
        unsigned int codec_tag;
        ......
        /**
         * Size of the frame reordering buffer in the decoder.
         * For MPEG-2 it is 1 IPB or 0 low delay IP.
         * - encoding: Set by libavcodec.
         * - decoding: Set by libavcodec.
         */
        int has_b_frames;

        /**
         * number of bytes per packet if constant and known or 0
         * Used by some WAV based audio codecs.
         */
        int block_align;
        ......
        /**
         * bits per sample/pixel from the demuxer (needed for huffyuv).
         * - encoding: Set by libavcodec.
         * - decoding: Set by user.
         */
        int bits_per_coded_sample;
        ......
    } AVCodecContext;
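If libavcodec is used on its own, the caller must initialize this information. If the full set of FFmpeg libraries is used, it is filled in during avformat_open_input and avformat_find_stream_info, based on the file header and on the headers found inside the media streams. The main fields are:
- extradata/extradata_size: this buffer holds extra information that the decoder may need. It is filled in while reading the input (av_read_frame). Typically the demuxer for a given format fills extradata when it reads the format header. If the demuxer does not do so, for example because the header carries no codec information at all, the corresponding parser keeps looking for it in the media stream that has already been demultiplexed. If no extra information is found, the buffer pointer is NULL.
- time_base: the unit of time, in seconds, in which frame timestamps are expressed. For fixed-frame-rate content it should be 1/framerate.
- width/height: the width and height of the video.
- sample_rate/channels: the audio sample rate and the number of audio channels.
- sample_fmt: the raw audio sample format.
- codec_name/codec_type/codec_id/codec_tag: information identifying the codec.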
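The codec_tag comment in the structure states the fourcc byte order ("ABCD" -> ('D'<<24) + ('C'<<16) + ('B'<<8) + 'A'). The sketch below works through that arithmetic in plain C. The names pack_fourcc and unpack_fourcc are hypothetical helpers written for this illustration and are not FFmpeg functions; libavutil has an MKTAG macro that builds tags in the same byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not an FFmpeg API): pack four ASCII characters into a
 * 32-bit tag, least significant byte first, as the codec_tag comment
 * describes: "ABCD" -> ('D'<<24) + ('C'<<16) + ('B'<<8) + 'A'. */
static uint32_t pack_fourcc(const char *s)
{
    assert(s != NULL && strlen(s) == 4);
    return  (uint32_t)(unsigned char)s[0]
         | ((uint32_t)(unsigned char)s[1] << 8)
         | ((uint32_t)(unsigned char)s[2] << 16)
         | ((uint32_t)(unsigned char)s[3] << 24);
}

/* Hypothetical inverse: write the four characters and a terminating NUL
 * into out, which must have room for 5 bytes. */
static void unpack_fourcc(uint32_t tag, char out[5])
{
    for (int i = 0; i < 4; i++)
        out[i] = (char)((tag >> (8 * i)) & 0xFF);
    out[4] = '\0';
}
```

Packing "ABCD" this way gives 0x44434241, so the first character of the tag ends up in the lowest byte.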
AVStream
This structure describes one media stream. It is defined as follows:

    typedef struct AVStream {
        int index;             /**< stream index in AVFormatContext */
        int id;                /**< format-specific stream ID */
        AVCodecContext *codec; /**< codec context */
        /**
         * Real base framerate of the stream.
         * This is the lowest framerate with which all timestamps can be
         * represented accurately (it is the least common multiple of all
         * framerates in the stream). Note, this value is just a guess!
         * For example, if the time base is 1/90000 and all frames have either
         * approximately 3600 or 1800 timer ticks, then r_frame_rate will be 50/1.
         */
        AVRational r_frame_rate;
        ......
        /**
         * This is the fundamental unit of time (in seconds) in terms
         * of which frame timestamps are represented. For fixed-fps content,
         * time base should be 1/framerate and timestamp increments should be 1.
         */
        AVRational time_base;
        ......
        /**
         * Decoding: pts of the first frame of the stream, in stream time base.
         * Only set this if you are absolutely 100% sure that the value you set
         * it to really is the pts of the first frame.
         * This may be undefined (AV_NOPTS_VALUE).
         * @note The ASF header does NOT contain a correct start_time the ASF
         * demuxer must NOT set this.
         */
        int64_t start_time;
        /**
         * Decoding: duration of the stream, in stream time base.
         * If a source file does not specify a duration, but does specify
         * a bitrate, this value will be estimated from bitrate and file size.
         */
        int64_t duration;

    #if LIBAVFORMAT_VERSION_INT < (53<<16)
        char language[4]; /** ISO 639-2/B 3-letter language code (empty string if undefined) */
    #endif

        /* av_read_frame() support */
        enum AVStreamParseType need_parsing;
        struct AVCodecParserContext *parser;
        ......
        /* av_seek_frame() support */
        AVIndexEntry *index_entries; /**< Only used if the format does not
                                          support seeking natively. */
        int nb_index_entries;
        unsigned int index_entries_allocated_size;

        int64_t nb_frames; ///< number of frames in this stream if known or 0
        ......
        /**
         * Average framerate
         */
        AVRational avg_frame_rate;
        ......
    } AVStream;

The main fields are explained below. Most of them can be determined by avformat_open_input from the file header. Anything still missing has to be obtained by calling avformat_find_stream_info, which reads frames and decodes them in software:
- index/id: index is the index of the stream. It is generated automatically and can be used to look the stream up in the AVFormatContext::streams table. id is the identifier of the stream and depends on the container format. For MPEG TS, for example, id is the PID.
- time_base: the time base of the stream, a rational number (AVRational). The pts and dts of the media data in this stream are expressed with this granularity. av_rescale/av_rescale_q are normally used to convert between time bases.
- start_time: the start time of the stream in units of the stream time base, usually the pts of the first frame in the stream.
- duration: the total duration of the stream in units of the stream time base.
- need_parsing: controls the parsing process for this stream.
- nb_frames: the number of frames in the stream.
- r_frame_rate/framerate/avg_frame_rate: frame-rate information.
- codec: points to the AVCodecContext of this stream, created when avformat_open_input is called.
- parser: points to the AVCodecParserContext of this stream, created when avformat_find_stream_info is called.

AVFormatContext
This structure describes the composition and basic information of a media file or media stream. It is defined as follows:

    typedef struct AVFormatContext {
        const AVClass *av_class; /**< Set by avformat_alloc_context. */
        /* Can only be iformat or oformat, not both at the same time. */
        struct AVInputFormat *iformat;
        struct AVOutputFormat *oformat;
        void *priv_data;
        ByteIOContext *pb;
        unsigned int nb_streams;
        AVStream *streams[MAX_STREAMS];
        char filename[1024]; /**< input or output filename */
        /* stream info */
        int64_t timestamp;
    #if LIBAVFORMAT_VERSION_INT < (53<<16)
        char title[512];
        char author[512];
        char copyright[512];
        char comment[512];
        char album[512];
        int year;       /**< ID3 year, 0 if none */
        int track;      /**< track number, 0 if none */
        char genre[32]; /**< ID3 genre */
    #endif

        int ctx_flags; /**< Format-specific flags, see AVFMTCTX_xx */
        /* private data for pts handling (do not modify directly). */
        /** This buffer is only needed when packets were already buffered but
            not decoded, for example to get the codec parameters in MPEG
            streams. */
        struct AVPacketList *packet_buffer;

        /** Decoding: position of the first frame of the component, in
            AV_TIME_BASE fractional seconds. NEVER set this value directly:
            It is deduced from the AVStream values. */
        int64_t start_time;
        /** Decoding: duration of the stream, in AV_TIME_BASE fractional
            seconds. Only set this value if you know none of the individual stream
            durations and also dont set any of them. This is deduced from the
            AVStream values if not set. */
        int64_t duration;
        /** decoding: total file size, 0 if unknown */
        int64_t file_size;
        /** Decoding: total stream bitrate in bit/s, 0 if not
            available. Never set it directly if the file_size and the
            duration are known as FFmpeg can compute it automatically. */
        int bit_rate;

        /* av_read_frame() support */
        AVStream *cur_st;
    #if LIBAVFORMAT_VERSION_INT < (53<<16)
        const uint8_t *cur_ptr_deprecated;
        int cur_len_deprecated;
        AVPacket cur_pkt_deprecated;
    #endif

        /* av_seek_frame() support */
        int64_t data_offset; /** offset of the first packet */
        int index_built;

        int mux_rate;
        unsigned int packet_size;
        int preload;
        int max_delay;

    #define AVFMT_NOOUTPUTLOOP -1
    #define AVFMT_INFINITEOUTPUTLOOP 0
        /** number of times to loop output in formats that support it */
        int loop_output;

        int flags;
    #define AVFMT_FLAG_GENPTS   0x0001 ///< Generate missing pts even if it requires parsing future frames.
    #define AVFMT_FLAG_IGNIDX   0x0002 ///< Ignore index.
    #define AVFMT_FLAG_NONBLOCK 0x0004 ///< Do not block when reading packets from input.
    #define AVFMT_FLAG_IGNDTS   0x0008 ///< Ignore DTS on frames that contain both DTS & PTS
    #define AVFMT_FLAG_NOFILLIN 0x0010 ///< Do not infer any values from other values, just return what is stored in the container
    #define AVFMT_FLAG_NOPARSE  0x0020 ///< Do not use AVParsers, you also must set AVFMT_FLAG_NOFILLIN as the fillin code works on frames and no parsing -> no frames. Also seeking to frames can not work if parsing to find frame boundaries has been disabled
    #define AVFMT_FLAG_RTP_HINT 0x0040 ///< Add RTP hinting to the output file

        int loop_input;
        /** decoding: size of data to probe; encoding: unused. */
        unsigned int probesize;
        /**
         * Maximum time (in AV_TIME_BASE units) during which the input should
         * be analyzed in avformat_find_stream_info().
         */
        int max_analyze_duration;

        const uint8_t *key;
        int keylen;

        unsigned int nb_programs;
        AVProgram **programs;

        /**
         * Forced video codec_id.
         * Demuxing: Set by user.
         */
        enum CodecID video_codec_id;
        /**
         * Forced audio codec_id.
         * Demuxing: Set by user.
         */
        enum CodecID audio_codec_id;
        /**
         * Forced subtitle codec_id.
         * Demuxing: Set by user.
         */
        enum CodecID subtitle_codec_id;

        /**
         * Maximum amount of memory in bytes to use for the index of each stream.
         * If the index exceeds this size, entries will be discarded as
         * needed to maintain a smaller size. This can lead to slower or less
         * accurate seeking (depends on demuxer).
         * Demuxers for which a full in-memory index is mandatory will ignore
         * this.
         * muxing  : unused
         * demuxing: set by user
         */
        unsigned int max_index_size;

        /**
         * Maximum amount of memory in bytes to use for buffering frames
         * obtained from realtime capture devices.
         */
        unsigned int max_picture_buffer;

        unsigned int nb_chapters;
        AVChapter **chapters;

        /**
         * Flags to enable debugging.
         */
        int debug;
    #define FF_FDEBUG_TS 0x0001

        /**
         * Raw packets from the demuxer, prior to parsing and decoding.
         * This buffer is used for buffering packets until the codec can
         * be identified, as parsing cannot be done without knowing the
         * codec.
         */
        struct AVPacketList *raw_packet_buffer;
        struct AVPacketList *raw_packet_buffer_end;
        struct AVPacketList *packet_buffer_end;

        AVMetadata *metadata;

        /**
         * Remaining size available for raw_packet_buffer, in bytes.
         * NOT PART OF PUBLIC API
         */
    #define RAW_PACKET_BUFFER_SIZE 2500000
        int raw_packet_buffer_remaining_size;

        /**
         * Start time of the stream in real world time, in microseconds
         * since the unix epoch (00:00 1st January 1970). That is, pts=0
         * in the stream was captured at this real world time.
         * - encoding: Set by user.
         * - decoding: Unused.
         */
        int64_t start_time_realtime;
    } AVFormatContext;

This is the most basic structure in FFmpeg. It is the root of all the other structures and the fundamental abstraction of a multimedia file or stream. Of its members:
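- nb_streams and streams: the array of AVStream pointers describes all the media streams embedded in the file;
- iformat and oformat point to the corresponding demuxer and muxer;
- pb points to a ByteIOContext structure that controls low-level reading and writing of data;
- start_time and duration are the start time and length of the multimedia file, inferred from the AVStream entries in the streams array and expressed in microseconds (AV_TIME_BASE units).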
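Because start_time and duration are counted in microseconds, turning them into a readable clock value is plain integer arithmetic. The following is a minimal sketch, assuming a known, non-negative value (a real caller would first check for an unset value such as AV_NOPTS_VALUE). DEMO_TIME_BASE, DemoClock and split_duration are hypothetical names for this illustration; DEMO_TIME_BASE stands in for AV_TIME_BASE.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for FFmpeg's AV_TIME_BASE: ticks per second (microseconds). */
#define DEMO_TIME_BASE 1000000

/* Hypothetical result type for this example. */
typedef struct DemoClock {
    int64_t hours;
    int     minutes;
    int     seconds;
    int     micros;
} DemoClock;

/* Split a non-negative duration given in DEMO_TIME_BASE units into
 * hours, minutes, seconds and leftover microseconds. */
static DemoClock split_duration(int64_t duration)
{
    assert(duration >= 0);

    DemoClock c;
    int64_t total_secs = duration / DEMO_TIME_BASE;

    c.micros  = (int)(duration % DEMO_TIME_BASE);
    c.seconds = (int)(total_secs % 60);
    c.minutes = (int)((total_secs / 60) % 60);
    c.hours   = total_secs / 3600;
    return c;
}
```

For instance, a duration of 3723500000 splits into 1 hour, 2 minutes, 3 seconds and 500000 microseconds.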
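Normally this structure is created internally by avformat_open_input, which initializes some of its members with default values. A caller that wants to create the structure itself must explicitly set default values for some members; without them, later operations may misbehave. The following members need attention:
- probesize
- mux_rate
- packet_size
- flags
- max_analyze_duration
- key
- max_index_size
- max_picture_buffer
- max_delay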
AVPacket
AVPacket is defined in avcodec.h as follows:

    typedef struct AVPacket {
        /**
         * Presentation timestamp in AVStream->time_base units; the time at which
         * the decompressed packet will be presented to the user.
         * Can be AV_NOPTS_VALUE if it is not stored in the file.
         * pts MUST be larger or equal to dts as presentation cannot happen before
         * decompression, unless one wants to view hex dumps. Some formats misuse
         * the terms dts and pts/cts to mean something different. Such timestamps
         * must be converted to true pts/dts before they are stored in AVPacket.
         */
        int64_t pts;
        /**
         * Decompression timestamp in AVStream->time_base units; the time at which
         * the packet is decompressed.
         * Can be AV_NOPTS_VALUE if it is not stored in the file.
         */
        int64_t dts;
        uint8_t *data;
        int size;
        int stream_index;
        int flags;
        /**
         * Duration of this packet in AVStream->time_base units, 0 if unknown.
         * Equals next_pts - this_pts in presentation order.
         */
        int duration;
        void (*destruct)(struct AVPacket *);
        void *priv;
        int64_t pos; ///< byte position in stream, -1 if unknown

        /**
         * Time difference in AVStream->time_base units from the pts of this
         * packet to the point at which the output from the decoder has converged
         * independent from the availability of previous frames. That is, the
         * frames are virtually identical no matter if decoding started from
         * the very first frame or from this keyframe.
         * Is AV_NOPTS_VALUE if unknown.
         * This field is not the display duration of the current packet.
         *
         * The purpose of this field is to allow seeking in streams that have no
         * keyframes in the conventional sense. It corresponds to the
         * recovery point SEI in H.264 and match_time_delta in NUT. It is also
         * essential for some types of subtitle streams to ensure that all
         * subtitles are correctly displayed after seeking.
         */
        int64_t convergence_duration;
    } AVPacket;

FFmpeg uses AVPacket to hold media data after demuxing and before decoding (one audio or video frame, one subtitle packet, and so on) together with its side information (decoding timestamp, presentation timestamp, duration, and so on). Of its members:
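- dts is the decoding timestamp and pts is the presentation timestamp; both are expressed in the time base of the stream the packet belongs to;
- stream_index is the index of the stream the packet belongs to;
- data points to the data buffer and size is its length;
- duration is the duration of the data, also expressed in the time base of the owning stream;
- pos is the byte offset of the data within the media stream;
- destruct is the function pointer used to release the data buffer;
- flags is a flag field; when its lowest bit is set to 1, the data is a keyframe.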
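Since pts, dts and duration are counted in ticks of the owning stream's time base, they have to be converted before they can be compared with wall-clock seconds or with timestamps from another stream. The sketch below shows the arithmetic behind such a conversion. DemoRational, ts_to_seconds and rescale_ts are hypothetical stand-ins written for this illustration; in real code the AVRational type and av_rescale_q would be used, and they also handle overflow and negative values, which this simplified version does not.

```c
#include <assert.h>
#include <stdint.h>

/* Minimal stand-in for FFmpeg's AVRational (numerator / denominator). */
typedef struct DemoRational {
    int num;
    int den;
} DemoRational;

/* Convert a timestamp counted in tb units into seconds. */
static double ts_to_seconds(int64_t ts, DemoRational tb)
{
    return (double)ts * tb.num / tb.den;
}

/* Rescale ts from time base `from` to time base `to`, rounding to nearest.
 * Simplified on purpose: assumes ts >= 0, positive time bases, and that
 * the intermediate products fit in 64 bits. */
static int64_t rescale_ts(int64_t ts, DemoRational from, DemoRational to)
{
    assert(ts >= 0 && from.num > 0 && from.den > 0 && to.num > 0 && to.den > 0);

    int64_t num = (int64_t)from.num * to.den; /* ts * from / to ...        */
    int64_t den = (int64_t)from.den * to.num; /* ... written as one ratio  */
    return (ts * num + den / 2) / den;
}
```

With a 1/90000 time base, a pts of 180000 is 2 seconds, or 2000 when rescaled to a 1/1000 (millisecond) time base.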
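The AVPacket structure itself is only a container; its data member references the actual data buffer. This buffer is usually created by av_new_packet, but it may also be created by other FFmpeg APIs (such as av_read_frame). When the data buffer of an AVPacket is no longer in use, it must be released by calling av_free_packet. av_free_packet calls the structure's own destruct function, whose value falls into one of two cases: 1) av_destruct_packet_nofree or 0; 2) av_destruct_packet. In case 1) the data and size fields are merely reset to 0; only in case 2) is the buffer actually freed.
Internally, FFmpeg uses the AVPacket structure to set up buffers that carry data, and supplies the destruct function along with them. If FFmpeg intends to manage the buffer itself, it sets destruct to av_destruct_packet_nofree, so a call to av_free_packet by the user does not free the buffer. If FFmpeg intends to hand the buffer over to the caller entirely, it sets destruct to av_destruct_packet, meaning the buffer can be freed. To be safe, a user who wants to use an AVPacket created inside FFmpeg freely should call av_dup_packet to clone the buffer, which turns the packet into one whose buffer can be freed and avoids errors caused by holding on to the buffer improperly. For an AVPacket whose destruct pointer is av_destruct_packet_nofree, av_dup_packet allocates a new buffer, copies the data from the original buffer into it, sets data to the address of the new buffer, and sets destruct to av_destruct_packet.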
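The ownership rules described above can be modelled in a few lines of plain C. This is a simplified sketch of the mechanism and not FFmpeg's implementation: DemoPacket and the demo_* functions are hypothetical names that mirror AVPacket, av_destruct_packet_nofree, av_destruct_packet, av_free_packet and av_dup_packet, and DEMO_PADDING is an assumed value standing in for the padding FFmpeg adds to its buffers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_PADDING 8 /* assumed padding size, for illustration only */

/* Simplified model of the packet described above; not the FFmpeg struct. */
typedef struct DemoPacket {
    uint8_t *data;
    int      size;
    void   (*destruct)(struct DemoPacket *);
} DemoPacket;

/* Models av_destruct_packet_nofree: the buffer belongs to someone else,
 * so only the fields are cleared. */
static void demo_destruct_nofree(DemoPacket *pkt)
{
    pkt->data = NULL;
    pkt->size = 0;
}

/* Models av_destruct_packet: the packet owns its buffer and frees it. */
static void demo_destruct_free(DemoPacket *pkt)
{
    free(pkt->data);
    pkt->data = NULL;
    pkt->size = 0;
}

/* Models av_free_packet: run destruct if set, then clear the fields. */
static void demo_free_packet(DemoPacket *pkt)
{
    if (!pkt)
        return;
    if (pkt->destruct)
        pkt->destruct(pkt);
    pkt->data = NULL;
    pkt->size = 0;
}

/* Models av_dup_packet: if the packet does not own its buffer, copy the
 * data into a newly allocated buffer and take ownership of the copy.
 * Returns 0 on success, -1 if the allocation fails. */
static int demo_dup_packet(DemoPacket *pkt)
{
    int borrowed = pkt->destruct == demo_destruct_nofree || pkt->destruct == NULL;

    if (borrowed && pkt->data) {
        uint8_t *copy = malloc((size_t)pkt->size + DEMO_PADDING);
        if (!copy)
            return -1;
        memcpy(copy, pkt->data, (size_t)pkt->size);
        memset(copy + pkt->size, 0, DEMO_PADDING); /* zero the padding */
        pkt->data     = copy;
        pkt->destruct = demo_destruct_free;
    }
    return 0;
}
```

After demo_dup_packet succeeds on a borrowed packet, the packet points at its own copy, so demo_free_packet releases that copy and leaves the original buffer untouched.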