Testing Cython + NumPy computation speedups (the official demo)

http://docs.cython.org/en/latest/src/tutorial/numpy.html

Working with NumPy in Cython

Note

Cython 0.16 introduced typed memoryviews as a successor to the NumPy integration described here. They are easier to use than the buffer syntax below, have less overhead, and can be passed around without requiring the GIL. They should be preferred over the syntax presented on this page. See also Cython for NumPy users.
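For orientation, a typed-memoryview version of the function signature used later in this tutorial might look like the following (a sketch only, not compiled or benchmarked here; `DTYPE_t` is the ctypedef defined below):

```
# Typed memoryview sketch: DTYPE_t[:, :] declares a 2D view over any
# object exposing the buffer protocol, with no np.ndarray type check.
def naive_convolve(DTYPE_t[:, :] f, DTYPE_t[:, :] g):
    ...
```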

You can use NumPy from Cython exactly as you would in regular Python, but by doing so you give up potentially large speedups, because Cython has support for fast access to NumPy arrays. Let's look at how this works with a simple example.

The code below performs a 2D discrete convolution of an image with a filter kernel (I'm sure you can do better; it serves for demonstration purposes). It is both valid Python and valid Cython code. I'll refer to it as convolve_py.py for the Python version and convolve1.pyx for the Cython version; Cython uses ".pyx" as its file suffix.

import numpy as np


def naive_convolve(f, g):
    # f is an image and is indexed by (v, w)
    # g is a filter kernel and is indexed by (s, t),
    #   it needs odd dimensions
    # h is the output image and is indexed by (x, y),
    #   it is not cropped
    if g.shape[0] % 2 != 1 or g.shape[1] % 2 != 1:
        raise ValueError("Only odd dimensions on filter supported")
    # smid and tmid are number of pixels between the center pixel
    # and the edge, ie for a 5x5 filter they will be 2.
    #
    # The output size is calculated by adding smid, tmid to each
    # side of the dimensions of the input image.
    vmax = f.shape[0]
    wmax = f.shape[1]
    smax = g.shape[0]
    tmax = g.shape[1]
    smid = smax // 2
    tmid = tmax // 2
    xmax = vmax + 2 * smid
    ymax = wmax + 2 * tmid
    # Allocate result image.
    h = np.zeros([xmax, ymax], dtype=f.dtype)
    # Do convolution
    for x in range(xmax):
        for y in range(ymax):
            # Calculate pixel value for h at (x,y). Sum one component
            # for each pixel (s, t) of the filter g.
            s_from = max(smid - x, -smid)
            s_to = min((xmax - x) - smid, smid + 1)
            t_from = max(tmid - y, -tmid)
            t_to = min((ymax - y) - tmid, tmid + 1)
            value = 0
            for s in range(s_from, s_to):
                for t in range(t_from, t_to):
                    v = x - smid + s
                    w = y - tmid + t
                    value += g[smid - s, tmid - t] * f[v, w]
            h[x, y] = value
    return h

This should be compiled to produce yourmod.so (on Linux; on Windows systems it will be yourmod.pyd). We then run a Python session to test both the Python version (imported from the .py file) and the compiled Cython module. (Note: the `np.int` alias used in the sessions below was deprecated in NumPy 1.20 and removed in 1.24; with a recent NumPy, use plain `int` or an explicit type such as `np.int64` instead.)
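A minimal build script for the compiled module might look like this (a sketch; the module name `convolve1` is assumed to match the .pyx filename, and Cython and NumPy must be installed):

```python
# setup.py -- minimal build script for the Cython extension (sketch).
from setuptools import setup
from Cython.Build import cythonize
import numpy

setup(
    ext_modules=cythonize("convolve1.pyx"),
    # Header path is needed once the .pyx does "cimport numpy".
    include_dirs=[numpy.get_include()],
)
```

Build it in place with `python setup.py build_ext --inplace`; running `cython -a convolve1.pyx` additionally produces the annotated HTML file referred to below.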

In [1]: import numpy as np
In [2]: import convolve_py
In [3]: convolve_py.naive_convolve(np.array([[1, 1, 1]], dtype=np.int),
...     np.array([[1],[2],[1]], dtype=np.int))
Out[3]:
array([[1, 1, 1],
       [2, 2, 2],
       [1, 1, 1]])
In [4]: import convolve1
In [5]: convolve1.naive_convolve(np.array([[1, 1, 1]], dtype=np.int),
...     np.array([[1],[2],[1]], dtype=np.int))
Out[5]:
array([[1, 1, 1],
       [2, 2, 2],
       [1, 1, 1]])
In [11]: N = 100
In [12]: f = np.arange(N*N, dtype=np.int).reshape((N,N))
In [13]: g = np.arange(81, dtype=np.int).reshape((9, 9))
In [19]: %timeit -n2 -r3 convolve_py.naive_convolve(f, g)
2 loops, best of 3: 1.86 s per loop
In [20]: %timeit -n2 -r3 convolve1.naive_convolve(f, g)
2 loops, best of 3: 1.41 s per loop

There's not such a huge difference yet, because the C code still does exactly what the Python interpreter does (meaning, for instance, that a new object is allocated for each number used). Look at the generated html file and see what is needed for even the simplest statements; you get the point quickly. We need to give Cython more information; we need to add types.

Adding types

To add types we use custom Cython syntax, so we are now breaking Python source compatibility. Consider the following code (read the comments!):

# tag: numpy
# You can ignore the previous line.
# It's for internal testing of the cython documentation.

import numpy as np

# "cimport" is used to import special compile-time information
# about the numpy module (this is stored in a file numpy.pxd which is
# currently part of the Cython distribution).
cimport numpy as np

# We now need to fix a datatype for our arrays. I've used the variable
# DTYPE for this, which is assigned to the usual NumPy runtime
# type info object.
DTYPE = np.int

# "ctypedef" assigns a corresponding compile-time type to DTYPE_t. For
# every type in the numpy module there's a corresponding compile-time
# type with a _t-suffix.
ctypedef np.int_t DTYPE_t

# "def" can type its arguments but not have a return type. The type of the
# arguments for a "def" function is checked at run-time when entering the
# function.
#
# The arrays f, g and h are typed as "np.ndarray" instances. The only effect
# this has is to a) insert checks that the function arguments really are
# NumPy arrays, and b) make some attribute access like f.shape[0] much
# more efficient. (In this example this doesn't matter though.)
def naive_convolve(np.ndarray f, np.ndarray g):
    if g.shape[0] % 2 != 1 or g.shape[1] % 2 != 1:
        raise ValueError("Only odd dimensions on filter supported")
    assert f.dtype == DTYPE and g.dtype == DTYPE

    # The "cdef" keyword is also used within functions to type variables. It
    # can only be used at the top indentation level (there are non-trivial
    # problems with allowing them in other places, though we'd love to see
    # good and thought out proposals for it).
    #
    # For the indices, the "int" type is used. This corresponds to a C int,
    # other C types (like "unsigned int") could have been used instead.
    # Purists could use "Py_ssize_t" which is the proper Python type for
    # array indices.
    cdef int vmax = f.shape[0]
    cdef int wmax = f.shape[1]
    cdef int smax = g.shape[0]
    cdef int tmax = g.shape[1]
    cdef int smid = smax // 2
    cdef int tmid = tmax // 2
    cdef int xmax = vmax + 2 * smid
    cdef int ymax = wmax + 2 * tmid
    cdef np.ndarray h = np.zeros([xmax, ymax], dtype=DTYPE)
    cdef int x, y, s, t, v, w

    # It is very important to type ALL your variables. You do not get any
    # warnings if not, only much slower code (they are implicitly typed as
    # Python objects).
    cdef int s_from, s_to, t_from, t_to

    # For the value variable, we want to use the same data type as is
    # stored in the array, so we use "DTYPE_t" as defined above.
    # NB! An important side-effect of this is that if "value" overflows its
    # datatype size, it will simply wrap around like in C, rather than raise
    # an error like in Python.
    cdef DTYPE_t value
    for x in range(xmax):
        for y in range(ymax):
            s_from = max(smid - x, -smid)
            s_to = min((xmax - x) - smid, smid + 1)
            t_from = max(tmid - y, -tmid)
            t_to = min((ymax - y) - tmid, tmid + 1)
            value = 0
            for s in range(s_from, s_to):
                for t in range(t_from, t_to):
                    v = x - smid + s
                    w = y - tmid + t
                    value += g[smid - s, tmid - t] * f[v, w]
            h[x, y] = value
    return h

After building this and continuing my (very informal) benchmarks, I get:

In [21]: import convolve2
In [22]: %timeit -n2 -r3 convolve2.naive_convolve(f, g)
2 loops, best of 3: 828 ms per loop

Efficient indexing

There are still two bottlenecks that degrade performance: array lookups and assignments. The []-operator still uses full Python operations; what we would like instead is to access the data buffer directly at C speed.

What we need to do then is to type the contents of the ndarray objects. We do this with the special "buffer" syntax, which must be told the datatype (first argument) and number of dimensions ("ndim" keyword-only argument; if not provided then one-dimensional is assumed).

These are the needed changes:

...
def naive_convolve(np.ndarray[DTYPE_t, ndim=2] f, np.ndarray[DTYPE_t, ndim=2] g):
...
cdef np.ndarray[DTYPE_t, ndim=2] h = ...

Usage:

In [18]: import convolve3
In [19]: %timeit -n3 -r100 convolve3.naive_convolve(f, g)
3 loops, best of 100: 11.6 ms per loop

Note the importance of this change.

Gotcha: this efficient indexing only affects certain index operations, namely those with exactly ndim number of typed integer indices. So if, for example, v is not typed, then the lookup f[v, w] isn't optimized. On the other hand, this means that you can continue using Python objects for sophisticated dynamic slicing etc., just as when the array is not typed.

Tuning indexing further

The array lookups are still slowed down by two factors:

  1. Bounds checking is performed.

  2. Negative indices are checked for and handled correctly. The code above is explicitly coded so that it doesn't use negative indices, and it (hopefully) always accesses within bounds. We can add a decorator to disable bounds checking:

    ...
    cimport cython
    @cython.boundscheck(False) # turn off bounds-checking for entire function
    @cython.wraparound(False)  # turn off negative index wrapping for entire function
    def naive_convolve(np.ndarray[DTYPE_t, ndim=2] f, np.ndarray[DTYPE_t, ndim=2] g):
    ...
    

Now bounds checking is not performed (and, as a side effect, if you "do" happen to access out of bounds you will in the best case crash your program and in the worst case corrupt data). It is possible to switch bounds-checking mode in many ways; for more information see Compiler directives.

In addition, we have disabled the check for wrapping negative indices (e.g. g[-1] giving the last value). As with disabling bounds checking, bad things will happen if we try to actually use negative indices with this disabled.
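As a reminder of the semantics that `@cython.wraparound(False)` turns off, in ordinary Python (and NumPy) a negative index counts from the end of the sequence:

```python
# Standard Python indexing semantics that wraparound(False) disables:
g = [10, 20, 30]
print(g[-1])  # negative index wraps around to the last element -> 30
```

Inside a `wraparound(False)` Cython function, `g[-1]` would instead be translated to a raw out-of-bounds buffer access, which is undefined behavior.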

The function call overhead now starts to play a role, so we compare the latter two examples with a larger N:

In [11]: %timeit -n3 -r100 convolve4.naive_convolve(f, g)
3 loops, best of 100: 5.97 ms per loop
In [12]: N = 1000
In [13]: f = np.arange(N*N, dtype=np.int).reshape((N,N))
In [14]: g = np.arange(81, dtype=np.int).reshape((9, 9))
In [17]: %timeit -n1 -r10 convolve3.naive_convolve(f, g)
1 loops, best of 10: 1.16 s per loop
In [18]: %timeit -n1 -r10 convolve4.naive_convolve(f, g)
1 loops, best of 10: 597 ms per loop

(This is also a mixed benchmark, as the result array is allocated within the function call.)

Warning

Speed comes with some cost. Especially it can be dangerous to set typed objects (like f, g and h in our sample code) to None. Setting such objects to None is entirely legal, but all you can do with them is check whether they are None. All other use (attribute lookup or indexing) can potentially segfault or corrupt data (rather than raising exceptions as they would in Python).

The actual rules are a bit more complicated, but the main message is clear: do not use typed objects without knowing that they are not set to None.
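A common defensive idiom for this, shown here as a plain-Python sketch (the function name is made up; in Cython the same guard would sit at the top of a function with typed ndarray arguments):

```python
def check_not_none(f, g):
    # An "is None" test is the only safe operation on a typed object
    # that may have been set to None; anything else (attribute lookup,
    # indexing) can segfault in Cython rather than raise an exception.
    if f is None or g is None:
        raise ValueError("input arrays must not be None")
    return f, g
```

Cython can also generate this check automatically via the `not None` argument clause, e.g. `def naive_convolve(np.ndarray f not None, np.ndarray g not None)`.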

More generic code

It would be possible to do:

def naive_convolve(object[DTYPE_t, ndim=2] f, ...):

i.e. use object rather than np.ndarray. Under Python 3.0 this can allow your algorithm to work with any libraries supporting the buffer interface, and support for e.g. the Python Imaging Library may easily be added if someone is also interested in Python 2.x.

There is some speed penalty to this though (as one makes more assumptions compile-time if the type is set to np.ndarray; specifically it is assumed that the data is stored in pure strided mode and not in indirect mode).
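The buffer interface referred to here is the same protocol that plain Python exposes through `memoryview`; any object implementing it (e.g. `array.array` or `bytearray`, just like `np.ndarray`) can be viewed without copying:

```python
import array

# array.array implements the buffer protocol, like np.ndarray does.
a = array.array('i', [1, 2, 3, 4])
mv = memoryview(a)         # zero-copy view over the same buffer
mv[0] = 99                 # writes through to the underlying array
print(a[0])                # -> 99
print(mv.format, mv.ndim)  # -> i 1  (C int elements, 1 dimension)
```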

----

However, in my own experiments, when each individual operation was small and data had to be repeatedly allocated and freed, the speedup did not materialize:

- Operating on a list was faster than on an np.array.

- With identical code, the Cython version ran no faster than the Python version.

- In short, after going through all of the above, the performance gain in my case was zero. (Cython is not a good fit here; left for future study.)
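A sketch of the kind of micro-benchmark behind these observations (the sizes and repeat counts are made up for illustration): when the input is tiny, constructing a fresh array on every call dominates the cost, so the plain-list version tends to win.

```python
import timeit

# Hypothetical micro-benchmark: sum 10 numbers, either directly from a
# list or by building a new np.array first on every call.
setup = "import numpy as np; data = list(range(10))"
t_list = timeit.timeit("sum(data)", setup=setup, number=1000)
t_np = timeit.timeit("np.array(data).sum()", setup=setup, number=1000)

# For inputs this small, array construction and call dispatch dominate,
# so t_np is usually the larger of the two.
print(t_list, t_np)
```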

posted @ 2020-07-29 21:47  戒骄戒躁-沉淀积蓄