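Learning the Python pycparser Library
ReadMe
Introduction
GitHub: [eliben/pycparser: Complete C99 parser in pure Python](https://github.com/eliben/pycparser)
pycparser is a parser for the C language, written in pure Python. It is a module designed to be easily integrated into applications that need to parse C source code.
pycparser is useful anywhere C code has to be parsed. Some example uses:
• C code obfuscators
• Front ends for various specialized C compilers
• Static code checkers
• Automatic unit-test discovery
• Adding specialized extensions to the C language
One of the most popular uses of pycparser is in the [CFFI](https://cffi.readthedocs.io/en/latest/index.html) library, which uses it to parse the declarations of C functions and types in order to generate FFIs (foreign function interfaces) automatically.
pycparser is unusual in that it is written in pure Python. Anyone familiar with Lex and Yacc will find its code easy to follow. It also has no external dependencies (only a Python interpreter is required), which makes it very simple to install and deploy.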
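Installation
Prerequisites
• pycparser has been tested with Python 3.8+ on Linux, macOS and Windows.
• pycparser has no external dependencies. The only non-standard-library code it uses is PLY, which is bundled in pycparser/ply.
Note: pycparser (and PLY) use docstrings to specify the grammar. If the Python installation strips docstrings (for example when run with the -OO option), pycparser cannot be instantiated or used. You can try to work around this by pre-generating the PLY parsing tables in normal mode, but this is not an officially supported mode of operation.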
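A script that depends on pycparser can check for the -OO condition up front. The sketch below is only an illustration: the helper name docstrings_available is made up for this note, and it relies on the standard sys.flags.optimize value, which is 2 when Python runs with -OO.

```python
import sys


def docstrings_available():
    """Return True when the interpreter keeps docstrings (not started with -OO)."""
    # sys.flags.optimize is 0 by default, 1 with -O and 2 with -OO.
    return sys.flags.optimize < 2


if not docstrings_available():
    # Warn instead of failing later with a less obvious error inside PLY.
    print("Warning: docstrings are stripped (-OO); pycparser may not work.")
```

Running such a check before importing pycparser makes the cause of a failure easier to spot.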
Installation steps
The recommended way to install pycparser is with pip:
pip install pycparser
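Usage
Interaction with the C preprocessor
C code must be run through the C preprocessor (cpp) before it can be compiled. cpp handles preprocessing directives such as #include and #define, removes comments, and performs other preparatory tasks.
For all but the most trivial snippets of C code, pycparser needs to receive preprocessed C code in order to work correctly. If you import the top-level parse_file function from the pycparser package and call it with use_cpp=True, it invokes cpp for you, provided cpp is on your PATH or you pass its path.
Note: gcc -E or clang -E can be used in place of cpp. See the using_gcc_E_libc.py example.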
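As a rough sketch of how this looks in code, the helper below passes gcc -E to parse_file through its use_cpp, cpp_path and cpp_args parameters. The function names build_cpp_args and parse_with_cpp, and the include and define options, are assumptions made for illustration; they are not part of pycparser.

```python
from pycparser import parse_file


def build_cpp_args(include_dirs=(), defines=()):
    """Build the argument list for `gcc -E` (or `clang -E`)."""
    args = ["-E"]
    args += [f"-I{d}" for d in include_dirs]   # extra include directories
    args += [f"-D{d}" for d in defines]        # extra macro definitions
    return args


def parse_with_cpp(c_path, cpp_path="gcc", include_dirs=(), defines=()):
    """Preprocess c_path with cpp_path, then parse the result into an AST."""
    return parse_file(
        c_path,
        use_cpp=True,                  # the default is False: no preprocessing
        cpp_path=cpp_path,
        cpp_args=build_cpp_args(include_dirs, defines),
    )
```

From the pycparser root directory, a call such as parse_with_cpp("some_file.c", include_dirs=["utils/fake_libc_include"]) would preprocess the (hypothetical) some_file.c against the fake headers and return a FileAST.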
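A note on standard C library headers
C code almost always pulls in standard library headers such as stdio.h through #include. pycparser can parse the standard headers of any C compiler with some extra work, but the recommended approach is to use the C11 "fake" standard headers provided in utils/fake_libc_include. They contain only the bare minimum needed to parse files that depend on them and, because they are so small, they noticeably speed up the parsing of large files.
The key point is that pycparser does not care about the semantics of types. It only needs to know whether an identifier encountered in the source is a previously defined type, which is essential for parsing C correctly.
See this blog post for details: <https://eli.thegreenplace.net/2015/on-parsing-c-type-declarations-and-fake-headers>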
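The small experiment below illustrates why the parser has to know which identifiers are types. The same statement, T * x;, is parsed as a pointer declaration when T is a typedef and as a multiplication when T is a variable. The helper first_statement is a name introduced only for this sketch.

```python
from pycparser import c_ast, c_parser


def first_statement(source):
    """Parse source and return the first statement of its last function body."""
    ast = c_parser.CParser().parse(source)
    return ast.ext[-1].body.block_items[0]


# T is a type name here, so `T * x;` declares x as a pointer to T.
with_typedef = first_statement("typedef int T; void f(void) { T * x; }")

# T is an ordinary variable here, so `T * x;` is the expression T times x.
without_typedef = first_statement("int T, x; void f(void) { T * x; }")

print(type(with_typedef).__name__)
print(type(without_typedef).__name__)
```

This is the reason a fake header only has to declare type names such as size_t or FILE; their real definitions do not matter to the parser.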
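Note: the fake headers are not included in the pip package and are not installed by setup.py (see [#224](https://github.com/eliben/pycparser/issues/224)).
Basic usage
See the examples directory of the distribution for usage examples. Most real-world C code has to be run through the preprocessor before it is passed to pycparser, as described above.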
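Advanced usage
The public interface of pycparser is documented in detail in the comments of pycparser/c_parser.py. For the AST nodes produced by the parser, see pycparser/_c_ast.cfg.
Modifying pycparser
Keep the following in mind when making changes:
• The AST node code in pycparser is generated automatically from the configuration file _c_ast.cfg by _ast_gen.py. If you change the AST configuration, regenerate the code by running the _build_tables.py script in the pycparser directory.
• Make sure you understand pycparser's optimized mode, which is described in the docstring of the CParser class. During development, turn optimization off so that the Yacc and Lex tables are regenerated whenever the grammar changes.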
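Package contents
After unpacking the pycparser package you will see the following files and directories:
- README.rst: this document
- LICENSE: the license file
- setup.py: the installation script
- examples/: directory of usage examples
- pycparser/: the module source
- tests/: unit tests
- utils/fake_libc_include: minimal standard C library headers that allow arbitrary C code to be parsed. Note that these headers contain C11 code, so they may not be compatible when the preprocessor is configured for an earlier standard (for example -std=c99).
- utils/internal/: internal utilities, which you normally do not need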
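Official examples
Path: examples
How to run: from the pycparser root directory
Prerequisite: most real-world C code has to be run through the C preprocessor before it is passed to pycparser
Test environment: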
$ gcc -v
gcc version 7.3.1 20180303 (Red Hat 7.3.1-5) (GCC)
$ python --version
Python 2.7.5
using_gcc_E_libc.py
Command:
python examples/using_gcc_E_libc.py
C file being parsed:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

void convert(int thousands, int hundreds, int tens, int ones)
{
    char *num[] = {"", "One", "Two", "Three", "Four", "Five", "Six",
                   "Seven", "Eight", "Nine"};
    char *for_ten[] = {"", "", "Twenty", "Thirty", "Forty", "Fifty", "Sixty",
                       "Seventy", "Eighty", "Ninety"};
    char *af_ten[] = {"Ten", "Eleven", "Twelve", "Thirteen", "Fourteen",
                      "Fifteen", "Sixteen", "Seventeen", "Eighteen", "Ninteen"};

    printf("\nThe year in words is:\n");
    printf("%s thousand", num[thousands]);
    if (hundreds != 0)
        printf(" %s hundred", num[hundreds]);
    if (tens != 1)
        printf(" %s %s", for_ten[tens], num[ones]);
    else
        printf(" %s", af_ten[ones]);
}

int main()
{
    int year;
    int n1000, n100, n10, n1;

    printf("\nEnter the year (4 digits): ");
    scanf("%d", &year);
    if (year > 9999 || year < 1000)
    {
        printf("\nError !! The year must contain 4 digits.");
        exit(EXIT_FAILURE);
    }
    n1000 = year/1000;
    n100 = ((year)%1000)/100;
    n10 = (year%100)/10;
    n1 = ((year%10)%10);
    convert(n1000, n100, n10, n1);
    return 0;
}
The resulting AST, annotated with notes on some of the node types:
- The convert function
FuncDef:  # Function definition: the declaration plus the function body
  Decl: convert, [], [], [], []  # Function name
    FuncDecl:  # Details of the function declaration
      ParamList:  # Parameter list
        Decl: thousands, [], [], [], []  # Each Decl is one parameter; this one is thousands
          TypeDecl: thousands, [], None  # Type declaration of the parameter; [] is the list of type qualifiers (const, volatile, ...; empty here) and None is the alignment
            IdentifierType: ['int']  # Concrete type of the parameter
        Decl: hundreds, [], [], [], []
          TypeDecl: hundreds, [], None
            IdentifierType: ['int']
        Decl: tens, [], [], [], []
          TypeDecl: tens, [], None
            IdentifierType: ['int']
        Decl: ones, [], [], [], []
          TypeDecl: ones, [], None
            IdentifierType: ['int']
      TypeDecl: convert, [], None  # Declaration of the return type
        IdentifierType: ['void']  # Concrete return type
  Compound:  # Compound statement, usually a function body or a code block
    Decl: num, [], [], [], []  # Variable declaration; the variable is named num
      ArrayDecl: []  # Array declaration
        PtrDecl: []  # Pointer declaration
          TypeDecl: num, [], None  # Type declaration
            IdentifierType: ['char']  # Concrete type of the variable
      InitList:  # Initializer list, used here to initialize the array
        Constant: string, ""  # Each Constant is one initializer value
        Constant: string, "One"
        Constant: string, "Two"
        Constant: string, "Three"
        Constant: string, "Four"
        Constant: string, "Five"
        Constant: string, "Six"
        Constant: string, "Seven"
        Constant: string, "Eight"
        Constant: string, "Nine"
    Decl: for_ten, [], [], [], []
      ArrayDecl: []
        PtrDecl: []
          TypeDecl: for_ten, [], None
            IdentifierType: ['char']
      InitList:
        Constant: string, ""
        Constant: string, ""
        Constant: string, "Twenty"
        Constant: string, "Thirty"
        Constant: string, "Forty"
        Constant: string, "Fifty"
        Constant: string, "Sixty"
        Constant: string, "Seventy"
        Constant: string, "Eighty"
        Constant: string, "Ninety"
    Decl: af_ten, [], [], [], []
      ArrayDecl: []
        PtrDecl: []
          TypeDecl: af_ten, [], None
            IdentifierType: ['char']
      InitList:
        Constant: string, "Ten"
        Constant: string, "Eleven"
        Constant: string, "Twelve"
        Constant: string, "Thirteen"
        Constant: string, "Fourteen"
        Constant: string, "Fifteen"
        Constant: string, "Sixteen"
        Constant: string, "Seventeen"
        Constant: string, "Eighteen"
        Constant: string, "Ninteen"
    FuncCall:  # Function call
      ID: printf  # Name of the called function
      ExprList:  # Argument list of the call
        Constant: string, "\nThe year in words is:\n"  # string is the constant's type, followed by its value
    FuncCall:
      ID: printf
      ExprList:
        Constant: string, "%s thousand"
        ArrayRef:  # Array reference
          ID: num  # Array name
          ID: thousands  # Array index
    If:  # if statement
      BinaryOp: !=  # Condition: a binary operation with the operator !=
        ID: hundreds  # Left operand
        Constant: int, 0  # Right operand, the constant 0
      FuncCall:  # Statement executed when the condition is true
        ID: printf
        ExprList:
          Constant: string, " %s hundred"
          ArrayRef:
            ID: num
            ID: hundreds
    If:
      BinaryOp: !=
        ID: tens
        Constant: int, 1
      FuncCall:  # Statement executed when the condition is true
        ID: printf
        ExprList:
          Constant: string, " %s %s"
          ArrayRef:
            ID: for_ten
            ID: tens
          ArrayRef:
            ID: num
            ID: ones
      FuncCall:  # Statement executed when the condition is false (the else branch)
        ID: printf
        ExprList:
          Constant: string, " %s"
          ArrayRef:
            ID: af_ten
            ID: ones
serialize_ast.py
Purpose: serializes the AST. The parsed AST is dumped with the pickle module and then loaded back.
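A minimal sketch of the same idea, done in memory rather than through a file, could look like the following. The C snippet and the variable names are made up for illustration; the round trip is checked by regenerating C code from both trees with c_generator.

```python
import pickle

from pycparser import c_generator, c_parser

source = "int add(int a, int b) { return a + b; }"
ast = c_parser.CParser().parse(source)

# Serialize the whole tree to bytes, then rebuild it from those bytes.
blob = pickle.dumps(ast, protocol=pickle.HIGHEST_PROTOCOL)
restored = pickle.loads(blob)

# Regenerate C code from both trees to confirm that nothing was lost.
original_c = c_generator.CGenerator().visit(ast)
restored_c = c_generator.CGenerator().visit(restored)
print(restored_c)
```

Caching a pickled AST can save time when the same large file would otherwise be preprocessed and parsed repeatedly.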
rewrite_ast.py
Purpose: changes the value of a node in the AST
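To give a feel for what such a rewrite involves, here is a small sketch (not the code of the official example): it parses a one-line declaration, replaces the value of the initializer node, and turns the modified tree back into C with c_generator. The variable name limit and the values are arbitrary.

```python
from pycparser import c_ast, c_generator, c_parser

ast = c_parser.CParser().parse("int limit = 1;")

decl = ast.ext[0]                 # the Decl node for `limit`
if isinstance(decl.init, c_ast.Constant):
    decl.init.value = "42"        # constant values are stored as strings

new_source = c_generator.CGenerator().visit(ast)
print(new_source)
```

Because AST nodes are plain Python objects, assigning to their attributes is all a rewrite needs.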
func_defs.py
Purpose: uses pycparser to print every function defined in a C file
Output:
$ python examples/func_defs.py
memmgr_init at examples/c_files/memmgr.c:46:6
get_mem_from_pool at examples/c_files/memmgr.c:55:22
memmgr_alloc at examples/c_files/memmgr.c:90:7
memmgr_free at examples/c_files/memmgr.c:159:6
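The pattern behind this kind of output can be sketched with a small NodeVisitor subclass. The class name FuncDefCollector and the inline C snippet are assumptions for this illustration; the official script works on a file instead of a string.

```python
from pycparser import c_ast, c_parser


class FuncDefCollector(c_ast.NodeVisitor):
    """Collect the name and location of every function definition."""

    def __init__(self):
        self.found = []

    def visit_FuncDef(self, node):
        # node.decl holds the declaration; coord is its file:line:column.
        self.found.append((node.decl.name, str(node.decl.coord)))


source = """
int add(int a, int b) { return a + b; }
int main(void) { return add(1, 2); }
"""

collector = FuncDefCollector()
collector.visit(c_parser.CParser().parse(source, filename="demo.c"))
for name, coord in collector.found:
    print(f"{name} at {coord}")
```

Only FuncDef nodes are handled explicitly; every other node type falls through to the default traversal.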
func_defs_add_param.py
Purpose: adds an extra parameter, int _hidden, to every function
func_calls.py
Purpose: prints every function call
Output:
$ python examples/func_calls.py
foo called at examples/c_files/basic.c:4:3
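explore_ast.py
Purpose: shows how to explore the AST returned by pycparser, for example how to reach function declarations and function bodies
construct_ast_from_scratch.py
Purpose: builds an AST by hand and converts it to C code
c-to-c.py
Purpose: parses a C file into an AST and converts the AST back to C code
Extensions
Example 1: parsing a header file
Purpose: parses a header file and prints the definitions found in it: typedefs, structs, enums, unions, function declarations and function pointers.
Prerequisite: pycparser expects preprocessed input, but the header in this example is not run through the preprocessor, so it must not contain macros or comments.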
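A quick check before parsing can catch headers that break this prerequisite. The sketch below is a rough heuristic written for this note (the function name needs_preprocessing is hypothetical): it looks for lines starting with # and for comment markers, and it can give false positives, for example when // appears inside a string literal.

```python
import re

# A line whose first non-blank character is '#': a preprocessor directive.
_DIRECTIVE = re.compile(r"^[ \t]*#", re.MULTILINE)
# The start of a block comment or of a line comment.
_COMMENT = re.compile(r"/\*|//")


def needs_preprocessing(header_text):
    """Return True if the text contains directives or comments."""
    return bool(_DIRECTIVE.search(header_text) or _COMMENT.search(header_text))


print(needs_preprocessing("typedef char uint8_t;\n"))
print(needs_preprocessing("#define N 8\ntypedef char buf[N];\n"))
```

When the check returns True, the header should go through cpp (or gcc -E) first instead of being parsed directly.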
For example, take the following header:
typedef char uint8_t;
typedef unsigned int uint32_t;
typedef uint8_t uint8_arr8[8];
typedef uint8_t uint8_arr128[128];
typedef uint8_t* uint8_p;
typedef uint8_t** uint8_pp;
typedef unsigned char byte;
typedef unsigned long ulong;

typedef union union_a0_u
{
    uint32_t a;
    uint32_t b;
} union_a0;

typedef union union_a1_u
{
    uint8_p c;
    uint8_arr8 d;
} union_a1;

typedef enum enum_b0_e {
    NONE = 0,
    NUM0 = 100,
} enum_b0;

typedef enum enum_b1_e {
    NUM2 = -10,
    NUM3 = 12,
} enum_b1;

typedef struct struct_c_s {
    char type;
    union u_member_u{
        uint32_t u0;
        uint32_t u1;
    } u_member;
    struct s_member0_s{
        int sa;
        int sb;
        struct {
            int se;
            int sf;
        }sub1;
    } s_member0;
    struct {
        int *c;
        int **d;
    } s_member1;
} struct_c;

int func_a(void);
void* func_b(char a, char *b, int a, int *b);
int func_c(struct_c c, enum_b0 b, union_a0 a);

typedef uint32_t (*func_p_0)(uint32_t a, uint8_arr128 b);
typedef void (*func_p_1)(const char *file, uint32_t line, const char *fmt, ...);
The output is shown below; EllipsisParam marks a variadic parameter (the ... in a parameter list):
typedef information:
------------------------------------
uint8_t :: char
uint32_t :: unsigned int
uint8_arr8 :: uint8_t[8]
uint8_arr128 :: uint8_t[128]
uint8_p :: uint8_t*
uint8_pp :: uint8_t**
byte :: unsigned char
ulong :: unsigned long
------------------------------------
enum information:
------------------------------
name: enum_b0_e:enum_b0
['NONE', '0']
['NUM0', '100']
++++++++++++++++++++++++++++++
name: enum_b1_e:enum_b1
['NUM2', '-10']
['NUM3', '12']
++++++++++++++++++++++++++++++
------------------------------
union information:
------------------------------
name: union_a0:union_a0_u
['a', 'uint32_t']
['b', 'uint32_t']
++++++++++++++++++++++++++++++
name: union_a1:union_a1_u
['c', 'uint8_p']
['d', 'uint8_arr8']
++++++++++++++++++++++++++++++
------------------------------
function pointer information:
------------------------------
name: func_p_0:uint32_t
['uint32_t', 'a']
['uint8_arr128', 'b']
++++++++++++++++++++++++++++++
name: func_p_1:void
['const char*', 'file']
['uint32_t', 'line']
['const char*', 'fmt']
['EllipsisParam', 'EllipsisParam']
++++++++++++++++++++++++++++++
------------------------------
function declare information:
------------------------------
name: func_a:int
['void', None]
++++++++++++++++++++++++++++++
name: func_b:void*
['char', 'a']
['char*', 'b']
['int', 'a']
['int*', 'b']
++++++++++++++++++++++++++++++
name: func_c:int
['struct_c', 'c']
['enum_b0', 'b']
['union_a0', 'a']
++++++++++++++++++++++++++++++
------------------------------
structure information:
------------------------------
name: struct_c
['char', 'type']
union u_member_u, u_member
['uint32_t', 'u0']
['uint32_t', 'u1']
struct s_member0_s, s_member0
['int', 'sa']
['int', 'sb']
anonymous_struct, sub1
['int', 'se']
['int', 'sf']
anonymous_struct, s_member1
['int*', 'c']
['int**', 'd']
++++++++++++++++++++++++++++++
------------------------------
Implementation source:
import re
import logging

from pycparser import c_ast, parse_file

logging.basicConfig(level=logging.NOTSET, format='[%(filename)s:%(lineno)d]-%(levelname)s %(message)s')


class c_visitor(object):
    def __init__(self, file_path):
        self.typedef_dict = {}
        self.struct_dict = {}
        self.union_dict = {}
        self.func_p_dict = {}
        self.enum_dict = {}
        self.func_declare_dict = {}
        self.ast = parse_file(file_path)
        with open('pycparser.txt', 'w+') as f:  # Save the AST to a file for easy inspection
            f.write(str(self.ast))
        self.visit_struct()
        self.visit_typedef()
        self.visit_enum()
        self.visit_union()
        self.visit_func_pointer()
        self.visit_func_declare()

    def _get_type_name(self, node):
        """Recursively build the type name, handling array and pointer types."""
        if isinstance(node, c_ast.TypeDecl):
            quals_str = ''
            if len(node.quals) != 0:
                for qual_str in node.quals:
                    quals_str = quals_str + ' ' + qual_str
                quals_str = quals_str.strip() + ' '
            return quals_str + self._get_type_name(node.type)
        elif isinstance(node, c_ast.IdentifierType):
            return ' '.join(node.names)
        elif isinstance(node, c_ast.ArrayDecl):
            if node.dim:
                if isinstance(node.dim, c_ast.ID):
                    dim = node.dim.name
                elif isinstance(node.dim, c_ast.FuncCall):
                    tmp_type_str = ''
                    for item in node.type.type.names:
                        tmp_type_str = tmp_type_str + ' ' + item
                    tmp_type_str = tmp_type_str.strip()
                    return f"{tmp_type_str}[{node.dim.name.name}({node.dim.args.exprs[0].name})]"
                else:
                    dim = node.dim.value
            else:
                dim = ''
            return f"{self._get_type_name(node.type)}[{dim}]"
        elif isinstance(node, c_ast.PtrDecl):
            return f"{self._get_type_name(node.type)}*"
        elif isinstance(node, c_ast.Constant):
            return node.value
        elif isinstance(node, c_ast.Struct):
            return 'struct ' + node.name
        elif isinstance(node, c_ast.UnaryOp):  # Signed value such as -10
            return node.op + f"{self._get_type_name(node.expr)}"
        return ''

    def resolve_type(self, node):
        """Strip TypeDecl, PtrDecl and ArrayDecl wrappers and return the inner node."""
        while isinstance(node, (c_ast.TypeDecl, c_ast.PtrDecl, c_ast.ArrayDecl)):
            node = node.type
        return node

    def get_type_definition(self, decl):
        """Return the base type of a declaration (pointers are unwrapped, arrays are not)."""
        base_type = decl.type
        while isinstance(base_type, (c_ast.TypeDecl, c_ast.PtrDecl)):
            base_type = base_type.type
        return base_type

    def process_member(self, decl):
        """Process a single struct member."""
        # Resolve the base type
        base_type = self.get_type_definition(decl)
        # Nested or anonymous struct/union
        if isinstance(base_type, (c_ast.Struct, c_ast.Union)):
            members = []
            if base_type.decls is not None:
                for child in base_type.decls:
                    members.append(self.process_member(child))
                if isinstance(base_type, c_ast.Struct):
                    # logging.debug(base_type)
                    if base_type.name is None:
                        tag_type = 'anonymous_struct'
                    else:
                        tag_type = 'struct ' + str(base_type.name)
                elif isinstance(base_type, c_ast.Union):
                    if base_type.name is None:
                        tag_type = 'anonymous_union'
                    else:
                        tag_type = 'union ' + str(base_type.name)
                return [tag_type, decl.name, members]
            else:
                # The member refers to a struct defined elsewhere, e.g. struct s *a;
                if isinstance(base_type, c_ast.Struct):
                    # _get_type_name already yields e.g. 'struct s*'; the original
                    # concatenation produced a doubled 'struct struct s*s'.
                    return [self._get_type_name(decl.type), decl.name, '']
                elif isinstance(base_type, c_ast.Union):
                    return ['union ' + base_type.name, decl.name, '']
        # Ordinary types
        type_str = []
        node = decl.type
        while isinstance(node, (c_ast.TypeDecl, c_ast.PtrDecl, c_ast.ArrayDecl)):
            if isinstance(node, c_ast.ArrayDecl):
                if node.dim:
                    if isinstance(node.dim, c_ast.ID):
                        type_str.append(f"[{node.dim.name}]")
                    elif isinstance(node.dim, c_ast.FuncCall):
                        tmp_type_str = ''
                        for item in node.type.type.names:
                            tmp_type_str = tmp_type_str + ' ' + item
                        logging.debug(f"{tmp_type_str}[{node.dim.name.name}({node.dim.args.exprs[0].name})]")
                        type_str.append(f"[{node.dim.name.name}({node.dim.args.exprs[0].name})]")
                    else:
                        type_str.append(f"[{node.dim.value}]")
                else:
                    type_str.append('[]')  # was append([]), which makes the join below fail
                node = node.type
            elif isinstance(node, c_ast.PtrDecl):
                type_str.append('*')
                node = node.type
            else:
                node = node.type
        type_name = self._get_type_name(decl.type)
        type_str = type_name + ''.join(reversed(type_str))
        return [type_name, decl.name]

    def visit_struct(self):
        for item in self.ast.ext:
            if isinstance(item, c_ast.Typedef) and isinstance(self.resolve_type(item.type), c_ast.Struct):
                struct_name = item.name
                struct_def = self.resolve_type(item.type)
                members_list = []
                if struct_def.decls is None:
                    continue
                for decl in struct_def.decls:
                    members_list.append(self.process_member(decl))
                self.struct_dict[struct_name] = members_list

    def visit_typedef(self):
        for decl in self.ast.ext:
            if isinstance(decl, c_ast.Typedef):
                if isinstance(decl.type.type, (c_ast.Struct, c_ast.Enum, c_ast.Union, c_ast.FuncDecl)):
                    continue
                type_name = decl.name
                base_type = self._get_type_name(decl.type)
                self.typedef_dict[type_name] = base_type

    def visit_enum(self):
        for item in self.ast.ext:
            if isinstance(item, c_ast.Typedef) and isinstance(self.resolve_type(item.type), c_ast.Enum):
                enum_declname = item.type.declname  # Name introduced by the typedef
                enum_name = item.type.type.name  # Tag name of the enum
                members_list = []
                for enum_member in item.type.type.values.enumerators:
                    member_name = enum_member.name
                    member_value = enum_member.value
                    if member_value is not None:
                        member_value = self._get_type_name(member_value)
                    members_list.append([member_name, member_value])
                self.enum_dict[f"{enum_name}:{enum_declname}"] = members_list

    def visit_union(self):
        for item in self.ast.ext:
            if isinstance(item, c_ast.Typedef) and isinstance(self.resolve_type(item.type), c_ast.Union):
                union_declname = item.type.declname
                union_name = item.type.type.name
                members_list = []
                for union_member in item.type.type.decls:
                    member_name = union_member.name
                    member_value = self._get_type_name(union_member.type)
                    members_list.append([member_name, member_value])
                self.union_dict[f"{union_declname}:{union_name}"] = members_list

    def visit_func_pointer(self):
        for item in self.ast.ext:
            if isinstance(item, c_ast.Typedef) and isinstance(self.resolve_type(item.type), c_ast.FuncDecl):
                funcP_declname = item.name
                return_type = self._get_type_name(item.type.type.type)
                members_list = []
                # Iterate over .params explicitly, as visit_func_declare does
                for funcP_member in item.type.type.args.params:
                    if isinstance(funcP_member, c_ast.EllipsisParam):  # Variadic parameter
                        param_type = 'EllipsisParam'
                        param_name = 'EllipsisParam'
                    else:
                        param_type = self._get_type_name(funcP_member.type)
                        param_name = funcP_member.name
                    members_list.append([param_type, param_name])
                self.func_p_dict[f"{funcP_declname}:{return_type}"] = members_list

    def visit_func_declare(self):
        for item in self.ast.ext:
            if isinstance(item, c_ast.Decl) and isinstance(self.resolve_type(item.type), c_ast.FuncDecl):
                func_declname = item.name
                return_type = self._get_type_name(item.type.type)
                # logging.debug([func_declname, return_type])
                members_list = []
                for param_member in item.type.args.params:
                    if isinstance(param_member, c_ast.EllipsisParam):  # Variadic parameter
                        param_type = 'EllipsisParam'
                        param_name = 'EllipsisParam'
                    else:
                        param_type = self._get_type_name(param_member.type)
                        param_name = param_member.name
                    members_list.append([param_type, param_name])
                self.func_declare_dict[f"{func_declname}:{return_type}"] = members_list

    def print_visit(self):
        typedef_dict = self.typedef_dict
        struct_dict = self.struct_dict
        enum_dict = self.enum_dict
        union_dict = self.union_dict
        func_p_dict = self.func_p_dict
        func_declare_dict = self.func_declare_dict
        print('typedef information:')
        print('------------------------------------')
        for typedef_key in typedef_dict:
            print(" %15s :: %s" % (typedef_key, typedef_dict[typedef_key]))
        print('------------------------------------\n')
        print('enum information:')
        print('------------------------------')
        for enum_key in enum_dict:
            print(f'name: {enum_key}')
            for item in enum_dict[enum_key]:
                print(f" {item}")
            print('++++++++++++++++++++++++++++++')
        print('------------------------------\n')
        print('union information:')
        print('------------------------------')
        for union_key in union_dict:
            print(f'name: {union_key}')
            for item in union_dict[union_key]:
                print(f" {item}")
            print('++++++++++++++++++++++++++++++')
        print('------------------------------\n')
        print('function pointer information:')
        print('------------------------------')
        for funcP_key in func_p_dict:
            print(f'name: {funcP_key}')
            for item in func_p_dict[funcP_key]:
                print(f" {item}")
            print('++++++++++++++++++++++++++++++')
        print('------------------------------\n')
        print('function declare information:')
        print('------------------------------')
        for funcD_key in func_declare_dict:
            print(f'name: {funcD_key}')
            for item in func_declare_dict[funcD_key]:
                print(f" {item}")
            print('++++++++++++++++++++++++++++++')
        print('------------------------------\n')
        print('structure information:')
        print('------------------------------')
        for struct_key in struct_dict:
            print(f'name: {struct_key}')
            for struct_data in struct_dict[struct_key]:
                if len(struct_data) > 2:
                    # Three-element entries describe a nested struct or union
                    struct_union_type = struct_data[0]
                    struct_union_name = struct_data[1]
                    print(f' {struct_union_type}, {struct_union_name}')
                    for item in struct_data[2:][0]:
                        if len(item) > 2:
                            struct_union_type = item[0]
                            struct_union_name = item[1]
                            print(f' {struct_union_type}, {struct_union_name}')
                            for item0 in item[2:][0]:
                                print(f' {item0}')
                        else:
                            print(f' {item}')
                else:
                    print(f' {struct_data}')
            print('++++++++++++++++++++++++++++++')
        print('------------------------------\n')


if __name__ == '__main__':
    file_path = "test_code.h"
    header_visitor = c_visitor(file_path)
    header_visitor.print_visit()
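Example 2: traversing the AST automatically
How it works
In pycparser, the NodeVisitor class walks the AST and dispatches each node to the visit_* method that matches its type. You never call visit_Typedef yourself; it is invoked through the following flow:
- Structure of the AST:
When C code is parsed, the resulting AST contains many kinds of nodes, for example:
• Typedef (a typedef statement)
• Struct (a struct definition)
• Union (a union definition)
• Decl (a variable or member declaration)
- Recursive traversal by NodeVisitor:
When you call visitor.visit(ast), traversal starts at the root node of the AST. For each node it reaches:
• The type of the node (for example Typedef) is checked.
• If the visitor class defines a visit_Typedef method, that method is called. The children of that node are then visited only if the method calls generic_visit itself.
• If visit_Typedef is not defined, the generic generic_visit method is called, which by default continues into the child nodes.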
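The dispatch rule can be reproduced in a few lines of plain Python. The sketch below is a simplified stand-in, not pycparser's actual source: the Node classes are toy versions that merely borrow the names FileAST, Typedef and TypeDecl, and Visitor mimics the lookup of visit_<ClassName> with a fallback to generic_visit.

```python
class Node:
    """Toy AST node that only stores its children."""

    def __init__(self, *children):
        self.children = list(children)


class FileAST(Node):
    pass


class Typedef(Node):
    pass


class TypeDecl(Node):
    pass


class Visitor:
    def visit(self, node):
        # Look for visit_<ClassName>; fall back to generic_visit.
        method = getattr(self, "visit_" + type(node).__name__, self.generic_visit)
        return method(node)

    def generic_visit(self, node):
        for child in node.children:
            self.visit(child)


class TypedefCounter(Visitor):
    def __init__(self):
        self.typedefs = 0
        self.type_decls = 0

    def visit_Typedef(self, node):
        self.typedefs += 1
        # generic_visit is not called, so the TypeDecl children are skipped.

    def visit_TypeDecl(self, node):
        self.type_decls += 1


tree = FileAST(Typedef(TypeDecl()), Typedef(TypeDecl()))
counter = TypedefCounter()
counter.visit(tree)
print(counter.typedefs, counter.type_decls)  # → 2 0
```

The count of 0 for TypeDecl shows why a visit_* method has to call generic_visit when nodes further down the tree still matter.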
If this is still unclear, add some debugging code that prints the type of every node:
from pycparser import c_ast, c_parser


class DebugVisitor(c_ast.NodeVisitor):
    def visit(self, node):
        print(f"Visiting node type: {node.__class__.__name__}")
        return super().visit(node)


if __name__ == '__main__':
    text = '''
typedef unsigned char uint8_t;
'''
    parser = c_parser.CParser()
    ast = parser.parse(text)
    # Walk the tree with the debugging visitor
    debug_visitor = DebugVisitor()
    debug_visitor.visit(ast)
    print(ast)
The output lists the type of every node visited, for example:
Visiting node type: FileAST
Visiting node type: Typedef
Visiting node type: TypeDecl
Visiting node type: IdentifierType
The AST at this point looks like this:
FileAST(ext=[Typedef(name='uint8_t',
quals=[
],
storage=['typedef'
],
type=TypeDecl(declname='uint8_t',
quals=[
],
align=None,
type=IdentifierType(names=['unsigned',
'char'
]
)
)
)
]
)
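Implementation
Based on the mechanism described above, the following test code collects all typedef definitions
Prerequisite: pycparser expects preprocessed input, but the header in this example is not run through the preprocessor, so it must not contain macros or comments.
Header file to parse: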
typedef char uint8_t;
typedef unsigned int uint32_t;
typedef uint8_t uint8_arr8[8];
typedef uint8_t uint8_arr128[128];
typedef uint8_t* uint8_p;
typedef uint8_t** uint8_pp;
typedef unsigned char byte;
typedef unsigned long ulong;
Source:
import logging

from pycparser import c_ast, parse_file

logging.basicConfig(level=logging.NOTSET, format='[%(filename)s:%(lineno)d]-%(levelname)s %(message)s')


class c_visitor(c_ast.NodeVisitor):
    def __init__(self):
        self.typedef_dict = {}
        self.typedef_struct_map = {}  # Maps struct tag names to their typedef names

    def _get_type_name(self, type_node):
        if isinstance(type_node, c_ast.IdentifierType):
            return ' '.join(type_node.names)
        elif isinstance(type_node, c_ast.TypeDecl):
            return self._get_type_name(type_node.type)
        elif isinstance(type_node, c_ast.ArrayDecl):
            dim = self._get_dim_value(type_node.dim)
            return f'{self._get_type_name(type_node.type)}[{dim}]'
        elif isinstance(type_node, c_ast.PtrDecl):
            if isinstance(type_node.type, c_ast.FuncDecl):
                return 'function pointer'
            return f'{self._get_type_name(type_node.type)}*'
        elif isinstance(type_node, c_ast.Struct):
            return f'struct {type_node.name}' if type_node.name else 'anonymous_struct'
        elif isinstance(type_node, c_ast.Union):
            return f'union {type_node.name}' if type_node.name else 'anonymous_union'
        elif isinstance(type_node, c_ast.Enum):
            return f'enum {type_node.name}' if type_node.name else 'anonymous_enum'
        else:
            return type_node.__class__.__name__

    def _get_dim_value(self, dim_node):
        if isinstance(dim_node, c_ast.Constant):
            return dim_node.value
        elif isinstance(dim_node, c_ast.UnaryOp):
            return self._eval_unaryop(dim_node)
        return '?'

    def _eval_unaryop(self, node):
        return f"{node.op}{node.expr.value}" if node.op == '-' else '?'

    # Called automatically for every Typedef node
    def visit_Typedef(self, node):
        if isinstance(node.type.type, c_ast.Struct):
            struct_def = node.type.type
            typedef_name = node.name
            self.typedef_struct_map[struct_def.name] = typedef_name
        elif isinstance(node.type.type, c_ast.Enum):  # Check for enums first and skip them
            pass
        elif (isinstance(node.type, c_ast.PtrDecl) and  # Skip function pointers
              isinstance(node.type.type, c_ast.FuncDecl)):
            pass
        else:
            original_type = self._get_type_name(node.type)
            self.typedef_dict[node.name] = original_type
        self.generic_visit(node)

    def print_visit(self):
        typedef_dict = self.typedef_dict
        print('typedef information:')
        print('------------------------------------')
        for typedef_key in typedef_dict:
            print(" %15s :: %s" % (typedef_key, typedef_dict[typedef_key]))
        print('------------------------------------\n')


if __name__ == '__main__':
    file_path = "test_code.h"
    ast = parse_file(file_path)
    with open('./pycparser.txt', 'w+') as f:
        f.write(str(ast))
    visitor = c_visitor()
    visitor.visit(ast)
    visitor.print_visit()
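Output (the last two entries, union_a0 and union_a1, only appear when the parsed header also contains the union typedefs from Example 1):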
typedef information:
------------------------------------
uint8_t :: char
uint32_t :: unsigned int
uint8_arr8 :: uint8_t[8]
uint8_arr128 :: uint8_t[128]
uint8_p :: uint8_t*
uint8_pp :: uint8_t**
byte :: unsigned char
ulong :: unsigned long
union_a0 :: union union_a0_u
union_a1 :: union union_a1_u
------------------------------------
