Pointers in Python/C Interop: Understand Kernel Programming in One Article

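I have recently been researching antivirus evasion. I write the code in Python first and then port it to C, and I hit quite a few pitfalls along the way, some of them involving kernel programming. I came across a very good article on the topic, so I am recording it here.
Original article: https://blog.csdn.net/Kelvin_Yan/article/details/86546784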

Pointer types

Define a pointer type with POINTER(ctypes_type):

T_int_ptr = POINTER(c_int)

This is equivalent to the C declaration

typedef int *T_int_ptr;

ctypes ships with the following built-in pointer types:

ctypes type	C type	Python type
c_char_p	char * (NUL terminated)	bytes object or None
c_wchar_p	wchar_t * (NUL terminated)	string or None
c_void_p	void *	int or None

Pointers to any other type, including our own types such as structures, can only be defined with POINTER.
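As a minimal sketch of POINTER applied to a user-defined type, the block below defines a hypothetical Point structure (not from the original article) and a pointer type to it, then writes through the pointer. It uses only the standard ctypes module.

```python
from ctypes import POINTER, Structure, c_int

class Point(Structure):
    # C equivalent: struct Point { int x; int y; };
    _fields_ = [("x", c_int), ("y", c_int)]

PointPtr = POINTER(Point)          # C: typedef struct Point *PointPtr;

pt = Point(1, 2)
pp = PointPtr(pt)                  # same as pointer(pt): pp = &pt;
pp.contents.x = 10                 # C: pp->x = 10;
print(pt.x, pp[0].y)               # 10 2
print(POINTER(Point) is PointPtr)  # True: ctypes caches pointer types
```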

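In some cases ctypes converts between Python types and C types automatically:
(1) If a function parameter is declared as POINTER(type), you can pass an instance of type directly when calling the function, and ctypes applies byref automatically.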

libc.myfunc.argtypes = [POINTER(c_int)]
i = c_int(32)
libc.myfunc(i)     # form 1
libc.myfunc(byref(i))  # form 2

Form 1 is equivalent to form 2. It works like a C++ reference parameter, where the caller passes the variable itself:

void myfunc(int &i)
{
  i = 0;
}
int main()
{
  int i = 32;
  myfunc(i);
}

From the Type conversions section of the official ctypes documentation:

In addition, if a function argument is explicitly declared to be a pointer type (such as POINTER(c_int)) in argtypes, 
an object of the pointed type (c_int in this case) can be passed to the function. ctypes will apply the required byref() 
conversion in this case automatically.
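libc.myfunc above is a placeholder for a function in your own shared library. To make the automatic byref visible without compiling anything, this minimal sketch substitutes a Python function wrapped in a CFUNCTYPE prototype (the name set_zero is hypothetical). ctypes runs the same argtypes conversion when the wrapper is called, so both calling forms modify the caller's variable.

```python
from ctypes import CFUNCTYPE, POINTER, byref, c_int

# Prototype of a hypothetical C function: void set_zero(int *p)
SET_ZERO = CFUNCTYPE(None, POINTER(c_int))

def _set_zero(p):
    # p arrives as a POINTER(c_int) instance; write through it like *p = 0
    p[0] = 0

set_zero = SET_ZERO(_set_zero)

i = c_int(32)
set_zero(i)           # form 1: ctypes applies byref(i) automatically
print(i.value)        # 0

j = c_int(32)
set_zero(byref(j))    # form 2: explicit byref
print(j.value)        # 0
```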

(2) Several native Python types are converted to C types automatically, namely None, integers, bytes objects and (unicode) strings, so they do not need to be wrapped in ctypes types first.

From the Calling functions section of the official ctypes documentation:

None, integers, bytes objects and (unicode) strings are the only native Python objects that can directly be used as 
parameters in these function calls. None is passed as a C NULL pointer, bytes objects and strings are passed as 
pointer to the memory block that contains their data (char * or wchar_t *). Python integers are passed as the 
platforms default C int type, their value is masked to fit into the C type.
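The ctypes helpers memmove, memset and string_at are themselves foreign functions, so they can illustrate passing native Python objects without loading any external library. This is a sketch under one caveat: these helpers declare their pointer parameters as c_void_p, so the conversion here goes through c_void_p.from_param rather than the no-argtypes default the quote describes, although the same kinds of objects (bytes, int, None) are accepted.

```python
import ctypes

buf = ctypes.create_string_buffer(8)   # 8 zero-filled bytes

# A bytes object is passed as a pointer to its data (char *)
ctypes.memmove(buf, b"hello", 5)
print(buf.value)                        # b'hello'

# Python ints are passed as C integers (the fill byte and the count)
ctypes.memset(buf, ord("x"), 2)
print(buf.value)                        # b'xxllo'

# bytes passed where a void * is expected; string_at reads up to the NUL
print(ctypes.string_at(b"abc"))         # b'abc'
```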

Pointer objects

Get a pointer to an object with pointer(object):

i = c_int(42)
pi = pointer(i)

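pi is called a pointer object, and it is itself a Python object! The address shown when you display it belongs to the pointer object, not to the object it points to. In C, by contrast, the value of a pointer variable is exactly the address it points to.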

pi = pointer(i)
pi    # <ctypes.wintypes.LP_c_long at 0x8b6bb48>  this is the address of the object pi, not of i
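To see the address the pointer actually holds (the value a C pointer variable would have), one option is to cast the pointer to c_void_p and compare it with ctypes.addressof. A minimal sketch using only the standard library:

```python
from ctypes import addressof, c_int, c_void_p, cast, pointer

i = c_int(42)
pi = pointer(i)

# The address stored inside the pointer object: what a C int * would hold
target = cast(pi, c_void_p).value
print(target == addressof(i))   # True

# id(pi) is where the Python wrapper object lives, unrelated to i's buffer
print(id(pi) == addressof(i))   # False
```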

Accessing the n-th element through a pointer

val = pi[0]   # read by index
pi[0] = c_int(0)    # write by index

Negative indices are allowed and a pointer object has no length limit, so take great care never to index out of bounds!
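A short sketch of pointer indexing over an array, mirroring C's int *p = arr; p[k]. The array contents are arbitrary; the point is that p has no idea where arr ends, so bounds checking is entirely up to you.

```python
from ctypes import POINTER, c_int, cast

arr = (c_int * 4)(10, 20, 30, 40)
p = cast(arr, POINTER(c_int))   # C: int *p = arr;

print(p[0], p[3])               # 10 40
p[1] = 99                       # writes into arr's memory
print(arr[1])                   # 99
# p[4] or p[-1] would also run, but read memory outside arr (undefined behaviour)
```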

The contents attribute

pi1 = pi.contents    # contents gives the object the pointer points to

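Note that contents returns a new object each time, not the original object itself (the new object does share the original's memory):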

pi.contents is i   # False
pi.contents is pi.contents   # False

Consequently, assigning to contents does not modify the original object; it makes the pointer point to the newly assigned object instead.
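The difference between writing through contents and assigning to contents can be checked directly. A minimal sketch with arbitrary values:

```python
from ctypes import c_int, pointer

i = c_int(42)
pi = pointer(i)

print(pi.contents is i)    # False: a fresh wrapper object every time
pi.contents.value = 7      # writing through the wrapper changes i's memory
print(i.value)             # 7

j = c_int(99)
pi.contents = j            # re-points pi at j; i is left untouched
print(i.value, pi[0])      # 7 99
```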

References

Use byref to get a reference to an object. The object must be a ctypes instance.

i = c_int(42)
ri = byref(i)

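In C this corresponds to

(((char *)&obj) + offset)

where offset is byref's optional second argument (default 0), so byref(i) is simply &i.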

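Like pointer, byref returns a Python object, but a lightweight one that can only be used as a foreign-function argument (the ctypes docs note it is much faster to construct than pointer). Unlike the repr of a pointer object, the number in its repr is the address of i's data buffer:

ri = byref(i)
ri    # <cparam 'P' (0000000008B6BB10)>  the address of i's buffer, i.e. what addressof(i) returns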
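The offset argument is what makes byref more than plain &obj. A minimal sketch, with arbitrary values, that uses byref(arr, offset) as a destination pointer into the middle of an array, like &arr[2] in C:

```python
import ctypes
from ctypes import byref, c_int, sizeof

arr = (c_int * 3)(1, 2, 3)
val = c_int(42)

# byref(arr, 2 * sizeof(c_int)) ~ ((char *)&arr) + 8, i.e. &arr[2]
ctypes.memmove(byref(arr, 2 * sizeof(c_int)), byref(val), sizeof(c_int))
print(list(arr))   # [1, 2, 42]
```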

Arrays

The ctypes Array type (official documentation):

The recommended way to create concrete array types is by multiplying any ctypes data type with a positive 
integer. Alternatively, you can subclass this type and define _length_ and _type_ class variables. Array elements 
can be read and written using standard subscript and slice accesses; for slice reads, the resulting object is not 
itself an Array.

There are two ways to define an array:
(1) Define an array type

from ctypes import *
import ctypes

# integer array
TenIntegers = c_int * 10    # TenIntegers is a type: an array of 10 ints
iarr = TenIntegers(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

# character (string) array
T_char_arr = c_char * 12   # ctypes.c_char_Array_12
carr = T_char_arr(0x48, 0x65, 0x6C, 0x6C, 0x6F, 0x20, 0x77, 0x6F, 0x72, 0x6C, 0x64, 0x00)   # "Hello world\0"
ctypes.string_at(byref(carr))   # b'Hello world'

(2) Build from a Python list

pyarray = [1,2,3,4,5,6,7,8,9,10]
carray = (ctypes.c_int*len(pyarray))(*pyarray)

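Under the hood this still defines an array type; the line above can be split into two steps:

arrtype = ctypes.c_int*len(pyarray)     #1
arr = arrtype(*pyarray)  #2

Note: the asterisk in #1 is multiplication, while the asterisk in #2 unpacks pyarray into separate arguments, one element at a time.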
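A runnable check of the two asterisks and of a few array properties, as a minimal sketch with an arbitrary list:

```python
import ctypes

pyarray = [1, 2, 3, 4, 5]
arrtype = ctypes.c_int * len(pyarray)   # #1: "*" multiplies a ctypes type into an array type
carray = arrtype(*pyarray)              # #2: "*" unpacks the list into constructor arguments

print(len(carray))                                             # 5
print(ctypes.sizeof(carray) == 5 * ctypes.sizeof(ctypes.c_int))  # True: contiguous ints
print(carray[:])            # [1, 2, 3, 4, 5]: a slice is a plain list, not an Array

short = (ctypes.c_int * 5)(7, 8)   # fewer initializers: the rest are zero
print(list(short))                 # [7, 8, 0, 0, 0]
```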

Array elements are accessed by index or by iteration:

carray[0]      # read by index
carray[0] = 10    # write by index
for i in carray: print(i, end=" ")    # iterate

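In C an array name decays to a pointer to its first element, and ctypes.Array behaves the same way: passing an array object passes a pointer, so the callee can modify the array in place.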

libc.myfunc.argtypes = [POINTER(c_int), c_int]   # C shared-library function myfunc(int* arr, int len), which modifies the array passed in
libc.myfunc(carray, 10)     # after the call returns, carray holds the modified values
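To watch the in-place update without a compiled library, this sketch replaces the hypothetical libc.myfunc with a Python stand-in wrapped in a CFUNCTYPE prototype of the same signature (its doubling behaviour is invented for illustration). The array is converted to int * exactly as it would be for a real C function.

```python
from ctypes import CFUNCTYPE, POINTER, c_int

# Prototype matching void myfunc(int *arr, int len)
MYFUNC = CFUNCTYPE(None, POINTER(c_int), c_int)

def _double_all(arr, n):
    # stand-in for the C body: for (k = 0; k < n; k++) arr[k] *= 2;
    for k in range(n):
        arr[k] *= 2

myfunc = MYFUNC(_double_all)

pyarray = [1, 2, 3, 4, 5]
carray = (c_int * len(pyarray))(*pyarray)
myfunc(carray, len(carray))   # the array object is passed as int *
print(list(carray))           # [2, 4, 6, 8, 10]
```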

Null pointers

Instantiating a pointer type with no arguments gives a null pointer:

null_ptr = POINTER(c_int)()
null_ptr   # <ctypes.wintypes.LP_c_long at 0x8b6bdc8>  a null pointer is still a pointer object with its own address
null_ptr[0]  # ValueError: NULL pointer access; ctypes detects the NULL target and raises
null_ptr[0] = c_int(1)    # ValueError: NULL pointer access
null_ptr.contents    # ValueError: NULL pointer access
null_ptr.contents = c_int(1)   # works: assigning to contents re-points the pointer at a valid object
null_ptr[0] = c_int(2)  # the 1 above is changed to 2

In addition, passing None where a pointer is expected is automatically converted to a null pointer.
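A minimal sketch of both behaviours: null pointers are falsy, and None is passed as NULL. The function is_null is a hypothetical stand-in, written in Python behind a CFUNCTYPE prototype, for a C function int is_null(int *p).

```python
from ctypes import CFUNCTYPE, POINTER, c_int

null_ptr = POINTER(c_int)()
print(bool(null_ptr))        # False: a NULL pointer is falsy

# Prototype of a hypothetical C function: int is_null(int *p)
IS_NULL = CFUNCTYPE(c_int, POINTER(c_int))
is_null = IS_NULL(lambda p: int(not p))   # the stand-in body: return p == NULL;

print(is_null(None))         # 1: None arrives as a NULL pointer
print(is_null(c_int(5)))     # 0: a c_int is passed by reference (non-NULL)
```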

Allocating memory

Python has garbage collection and nothing like C++ new/delete. After some searching, I found ctypes.create_string_buffer.
The function is really intended for byte strings (there is also a Unicode version, create_unicode_buffer):

mstr = 'Hello world'
buf = ctypes.create_string_buffer(mstr.encode('ascii'))   # <ctypes.c_char_Array_12 at 0x8b6bc48>  a c_char array of length 12 (11 chars + NUL)
ctypes.string_at(byref(buf))    # b'Hello world'

It can also be used purely as a raw buffer:

mytype = c_int
pyarray = [1,2,3,4,5,6,7,8,9,10]
carray = (mytype*len(pyarray))(*pyarray)    # source data
count = len(pyarray)
bufsz = count*sizeof(mytype)
buf = ctypes.create_string_buffer(bufsz)   # create the buffer
ctypes.memmove(byref(buf), carray, bufsz)  # copy the data into the buffer
res = ctypes.cast(buf, POINTER(mytype))   # cast to the pointer type you need

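Note the function ctypes.memmove used here: it is the standard C memmove (not memcpy; memmove also handles overlapping regions). Similar helpers include
ctypes.memset and ctypes.sizeof (I hope more C functions will be exposed officially to make this easier).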
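A runnable version of the buffer workflow above, shortened to five ints, that also checks the copy and shows memset clearing the same memory the cast pointer sees. It is a sketch using only the standard ctypes module:

```python
import ctypes
from ctypes import POINTER, c_int, sizeof

pyarray = [1, 2, 3, 4, 5]
carray = (c_int * len(pyarray))(*pyarray)      # source data
bufsz = len(pyarray) * sizeof(c_int)

buf = ctypes.create_string_buffer(bufsz)       # zero-filled raw buffer
ctypes.memmove(buf, carray, bufsz)             # copy the ints into it
res = ctypes.cast(buf, POINTER(c_int))         # view the bytes as int *
copied = [res[k] for k in range(len(pyarray))]
print(copied)                                  # [1, 2, 3, 4, 5]

ctypes.memset(buf, 0, bufsz)                   # clear the buffer
print(res[0])                                  # 0: res still points into buf
```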

Type casting

This function is clearly designed with C in mind:

ctypes.cast(obj, type)
This function is similar to the cast operator in C. It returns a new instance of type which points to the same memory block as obj. type must be a pointer type, and obj must be an object that can be interpreted as a pointer.

Note that cast only converts between pointer-like objects.
With cast, a pointer of any type can be passed as void *:

libc.myfunc.argtypes = [c_void_p, c_int]    # C shared-library function: myfunc(void* str, int len)
buf = ctypes.create_string_buffer(256)   # string buffer
void_ptr = ctypes.cast(buf, c_void_p)
libc.myfunc(void_ptr, 256)   # myfunc fills the string buffer
char_ptr = ctypes.cast(void_ptr, POINTER(c_char))
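The round trip through void * can be run end to end if the hypothetical libc.myfunc is replaced by a Python stand-in behind a CFUNCTYPE prototype with the same signature. The message it writes is invented for illustration; the casts are the same as above.

```python
import ctypes
from ctypes import CFUNCTYPE, POINTER, c_char, c_int, c_void_p

# Prototype matching void myfunc(void *str, int len)
MYFUNC = CFUNCTYPE(None, c_void_p, c_int)

def _fill(addr, n):
    # addr arrives as a plain int (the c_void_p value); copy a C string there
    msg = b"filled by callee\0"
    ctypes.memmove(addr, msg, min(len(msg), n))

myfunc = MYFUNC(_fill)

buf = ctypes.create_string_buffer(256)
void_ptr = ctypes.cast(buf, c_void_p)
myfunc(void_ptr, 256)
char_ptr = ctypes.cast(void_ptr, POINTER(c_char))
print(buf.value)      # b'filled by callee'
print(char_ptr[0])    # b'f'
```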

Function pointers

ctypes offers three ways to define a function type:

ctypes.CFUNCTYPE(restype, *argtypes, use_errno=False, use_last_error=False)
The returned function prototype creates functions that use the standard C calling convention. The function will release the GIL during the call. If use_errno is set to true, the ctypes private copy of the system errno variable is exchanged with the real errno value before and after the call; use_last_error does the same for the Windows error code.

ctypes.WINFUNCTYPE(restype, *argtypes, use_errno=False, use_last_error=False)
Windows only: The returned function prototype creates functions that use the stdcall calling convention, except on Windows CE where WINFUNCTYPE() is the same as CFUNCTYPE(). The function will release the GIL during the call. use_errno and use_last_error have the same meaning as above.

ctypes.PYFUNCTYPE(restype, *argtypes)
The returned function prototype creates functions that use the Python calling convention. The function will not release the GIL during the call.

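The first argument, restype, is the return type; the remaining arguments are the parameter types, in order.
Calling convention: WINFUNCTYPE means stdcall, CFUNCTYPE means cdecl.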

They are mainly used to define C callback functions:

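# callback implemented in Python
def py_callback_func(data):   # the C side hands a float to this callback
    print('callback : ' + str(data))
    return

PyCallbackFunc = WINFUNCTYPE(None, c_float)      # define the function type (stdcall, Windows only)
cb = PyCallbackFunc(py_callback_func)    # keep this reference alive for as long as C may call it
libc.funcWithCallback(cb)      # C library function: void funcWithCallback(callback func)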
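WINFUNCTYPE exists only on Windows, so this sketch uses the portable CFUNCTYPE instead. Since funcWithCallback is a placeholder C function, the sketch does not call it; it turns the callback into the raw function-pointer address a C function would receive and calls back through that address, which runs py_callback_func via the C calling convention.

```python
import ctypes
from ctypes import CFUNCTYPE, c_float, c_void_p

CALLBACK = CFUNCTYPE(None, c_float)   # C: typedef void (*callback)(float);

received = []

def py_callback_func(data):
    received.append(data)             # data arrives as a Python float

cb = CALLBACK(py_callback_func)       # keep this reference alive while C may call it

addr = ctypes.cast(cb, c_void_p).value    # the function pointer value C would get
as_c_sees_it = CALLBACK(addr)             # a callable built from the raw address
as_c_sees_it(1.5)
print(received)                           # [1.5]
```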

NumPy

The ctypes section of the official NumPy documentation has some notes on this.

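For large data and multi-dimensional matrices, the create_string_buffer approach is a poor fit. Instead, NumPy's own interfaces can provide the pointers, in one of two ways: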

(1) The numpy.ndarray.ctypes.data_as method

import numpy as np
x = np.zeros((10,10), np.float32)   # a 10x10 2-D matrix of float
cptr = x.ctypes.data_as(POINTER(ctypes.c_float))    # convert to a C pointer
libc.myfunc(cptr, 10, 10)   # C library function: void myfunc(float* matrix, int rows, int cols)
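Assuming NumPy is installed, this sketch shows that the pointer from data_as addresses the array's own buffer (no copy), so writes through it appear in the array. The 3x4 shape and written value are arbitrary. The flat index r * cols + c is only valid for a C-contiguous array; np.ascontiguousarray is one way to guarantee that before taking the pointer.

```python
import ctypes
import numpy as np

x = np.zeros((3, 4), dtype=np.float32)   # C-contiguous by default
cptr = x.ctypes.data_as(ctypes.POINTER(ctypes.c_float))

rows, cols = x.shape
cptr[1 * cols + 2] = 5.0                 # C: matrix[1 * cols + 2] = 5.0f;
print(x[1, 2])                           # 5.0
print(x.ctypes.data == ctypes.addressof(cptr.contents))   # True: same buffer
```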

(2) The numpy.ctypeslib.ndpointer method

numpy.ctypeslib.ndpointer(dtype=None, ndim=None, shape=None, flags=None)

Array-checking restype/argtypes.
An ndpointer instance is used to describe an ndarray in restypes and argtypes specifications. This approach is 
more flexible than using, for example, POINTER(c_double), since several restrictions can be specified, which are 
verified upon calling the ctypes function. These include data type, number of dimensions, shape and flags. If a 
given array does not satisfy the specified restrictions, a TypeError is raised.

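The benefit of this approach is that ndpointer checks whether the input matches the specified data type, number of dimensions, shape and flags (each of which is optional).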

Parameters: 
dtype : data-type, optional
Array data-type.

ndim : int, optional
Number of array dimensions.

shape : tuple of ints, optional
Array shape.

flags : str or tuple of str
Array flags; may be one or more of:
C_CONTIGUOUS / C / CONTIGUOUS
F_CONTIGUOUS / F / FORTRAN
OWNDATA / O
WRITEABLE / W
ALIGNED / A
WRITEBACKIFCOPY / X
UPDATEIFCOPY / U

Example

import ctypes
import numpy as np
from numpy.ctypeslib import ndpointer
x = np.zeros((10,10), np.float32)   # a 10x10 2-D matrix of float
libc.myfunc.argtypes = [ndpointer(ctypes.c_float, flags="C_CONTIGUOUS"), ctypes.c_int, ctypes.c_int]
libc.myfunc.restype = None
libc.myfunc(x, 10, 10)   # C library function: void myfunc(float* matrix, int rows, int cols)
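Assuming NumPy is installed, the checks can be seen without a compiled library by calling the ndpointer class's from_param directly, which is the hook ctypes invokes for each argument listed in argtypes. The three rejected arrays are illustrative choices.

```python
import numpy as np
from numpy.ctypeslib import ndpointer

# Argument type that only accepts 2-D, C-contiguous float32 arrays
Float2D = ndpointer(dtype=np.float32, ndim=2, flags="C_CONTIGUOUS")

ok = np.zeros((10, 10), dtype=np.float32)
Float2D.from_param(ok)   # passes every check

candidates = {
    "wrong dtype": np.zeros((10, 10), dtype=np.float64),
    "wrong ndim": np.zeros(100, dtype=np.float32),
    "not C-contiguous": np.zeros((10, 10), dtype=np.float32).T,  # transpose is F-ordered
}
rejected = []
for label, arr in candidates.items():
    try:
        Float2D.from_param(arr)
    except TypeError as exc:
        rejected.append(label)
        print(f"{label}: {exc}")
```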

Efficiency

Considering the overhead

Overhead of create_string_buffer
(To be continued)

posted @ 2022-03-17 21:37 komomon

