python 进阶篇迭代器和生成器深入理解

列表/元组/字典/集合都是容器。对于容器，可以很直观地想象成多个元素在一起的单元；而不同容器的区别，正是在于内部数据结构的实现方法。

所有的容器都是可迭代的（iterable）。另外字符串也可以被迭代。

迭代器类比

迭代可以想象成是你去买苹果，卖家并不告诉你他有多少库存。这样，每次你都需要告诉卖家，你要一个苹果，然后卖家采取行为：要么给你拿一个苹果；要么告诉你，苹果已经卖完了。你并不需要知道，卖家在仓库是怎么摆放苹果的。

严谨地说，迭代器（iterator）提供了一个 next（可以不重复不遗漏地一个一个拿到所有元素）的方法。调用这个方法后，你要么得到这个容器的下一个对象，要么得到一个 StopIteration 的错误（苹果卖完了）。

示例，判断是否可迭代

from collections.abc import Iterable

params = [
    1234,
    '1234',
    [1, 2, 3, 4],
    set([1, 2, 3, 4]),
    {1:1, 2:2, 3:3, 4:4},
    (1, 2, 3, 4)
]
    
for param in params:
    print('{} is iterable? {}'.format(param, isinstance(param, Iterable)))
    
# 输出
# 1234 is iterable? False
# 1234 is iterable? True
# [1, 2, 3, 4] is iterable? True
# {1, 2, 3, 4} is iterable? True
# {1: 1, 2: 2, 3: 3, 4: 4} is iterable? True
# (1, 2, 3, 4) is iterable? True

生成器类比

生成器可以想象成是你去买苹果，卖家并没有库存。这样，每次你都需要告诉卖家，你要一个苹果，然后卖家采取行为，立马生成 1 个苹果（生成速度极快）：要么给你拿一个苹果；要么告诉你，苹果已经卖完了。

生成器是懒人版本的迭代器

示例，迭代器与生成器的对比

import os
import psutil
import time
import functools

def log_execution_time(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        res = func(*args, **kwargs)
        end = time.perf_counter()
        print('{} took {} ms'.format(func.__name__, (end - start) * 1000))
        return res
    return wrapper
    

# 显示当前 python 程序占用的内存大小
def show_memory_info(hint):
    pid = os.getpid()
    p = psutil.Process(pid)
    
    info = p.memory_full_info()
    memory = info.uss / 1024. / 1024
    print('{} memory used: {} MB'.format(hint, memory))

@log_execution_time
def test_iterator():
    show_memory_info('initing iterator')
    list_1 = [i for i in range(100000000)]
    show_memory_info('after iterator initiated')
    print(sum(list_1))
    show_memory_info('after sum called')
    
@log_execution_time
def test_generator():
    show_memory_info('initing generator')
    list_2 = (i for i in range(100000000))
    show_memory_info('after generator initiated')
    print(sum(list_2))
    show_memory_info('after sum called')

test_iterator()
print()
test_generator()

########## 输出 ##########
# initing iterator memory used: 10.16796875 MB
# after iterator initiated memory used: 3664.34765625 MB
# 4999999950000000
# after sum called memory used: 3664.34765625 MB
# test_iterator took 6179.794754018076 ms

# initing generator memory used: 19.140625 MB
# after generator initiated memory used: 19.14453125 MB
# 4999999950000000
# after sum called memory used: 19.171875 MB
# test_generator took 4912.561981996987 ms

迭代器是一个有限集合，生成器则可以成为一个无限集。

我们并不需要在内存中同时保存这么多东西，比如对元素求和，我们只需要知道每个元素在相加的那一刻是多少就行了，用完就可以扔掉了。

于是，生成器的概念应运而生，在你调用 next() 函数的时候，才会生成下一个变量。生成器在 Python 的写法是用小括号括起来，(i for i in range(100000000))，即初始化了一个生成器。

这样一来，你可以清晰地看到，生成器并不会像迭代器一样占用大量内存，只有在被使用的时候才会调用。而且生成器在初始化的时候，并不需要运行一次生成操作，相比于 test_iterator() ，test_generator() 函数节省了一次生成一亿个元素的过程，因此耗时明显比迭代器短。

示例，数学中有一个恒等式，(1 + 2 + 3 + ... + n)^2 = 1^3 + 2^3 + 3^3 + ... + n^3 的证明

def generator(k):
    i = 1
    while True:
        yield i ** k
        i += 1

gen_1 = generator(1)
gen_3 = generator(3)
print(gen_1)
print(gen_3)

def get_sum(n):
    sum_1, sum_3 = 0, 0
    for i in range(n):
        next_1 = next(gen_1)
        next_3 = next(gen_3)
        print('next_1 = {}, next_3 = {}'.format(next_1, next_3))
        sum_1 += next_1
        sum_3 += next_3
    print(sum_1 * sum_1, sum_3)

get_sum(8)

########## 输出 ##########
# <generator object generator at 0x10c30d3d0>
# <generator object generator at 0x10c6d61d0>
# next_1 = 1, next_3 = 1
# next_1 = 2, next_3 = 8
# next_1 = 3, next_3 = 27
# next_1 = 4, next_3 = 64
# next_1 = 5, next_3 = 125
# next_1 = 6, next_3 = 216
# next_1 = 7, next_3 = 343
# next_1 = 8, next_3 = 512
# 1296 1296

接下来的 yield 是魔术的关键。对于初学者来说，你可以理解为，函数运行到这一行的时候，程序会从这里暂停，然后跳出，不过跳到哪里呢？答案是 next() 函数。那么 i ** k 是干什么的呢？它其实成了 next() 函数的返回值。这样，每次 next(gen) 函数被调用的时候，暂停的程序就又复活了，从 yield 这里向下继续执行；同时注意，局部变量 i 并没有被清除掉，而是会继续累加。我们可以看到 next_1 从 1 变到 8，next_3 从 1 变到 512。

示例，给定两个序列，判定第一个是不是第二个的子序列。

LeetCode 链接如下：https://leetcode.com/problems/is-subsequence/

先来解读一下这个问题本身。序列就是列表，子序列则指的是，一个列表的元素在第二个列表中都按顺序出现，但是并不必挨在一起。举个例子，[1, 3, 5] 是 [1, 2, 3, 4, 5] 的子序列，[1, 4, 3] 则不是。

def is_subsequence(ls, sub):
    ls = iter(ls)
    return all(i in ls for i in sub)

print(is_subsequence([1, 2, 3, 4, 5],[1, 3, 5]))
print(is_subsequence([1, 2, 3, 4, 5],[1, 4, 3]))

########## 输出 ##########

# True
# False

posted @ 2020-04-04 22:23 hiyang 阅读(501) 评论(0) 收藏举报

刷新页面返回顶部

python 进阶篇 迭代器和生成器深入理解

迭代器类比

生成器类比

生成器是懒人版本的迭代器

公告

python 进阶篇迭代器和生成器深入理解