Day 22: Notes on Python Iterators and Generators

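Iterator objects

Lists, dictionaries, and sets can all be iterated over, but they are iterables, not iterator objects. The differences are:

Whether the object has been wrapped by the built-in function iter.
A list always starts from its first element whenever you loop over it. An iterator that has been fully traversed does not go back to the start; it stays positioned just past the last element. There is nothing left to read at that position, so asking it for another element raises a StopIteration exception.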

a = [1,2,3,4,5,6,8]

a_iter = iter(a) # a_iter is an iterator
from collections.abc import Iterator
print(isinstance(a_iter,Iterator))
# Iterate over a and a_iter
for _ in a:
    print(_)
print("======================== divider ========================")
for _ in a_iter:
    print(_)

print("Iterating again")
for _ in a:
    print(_)
print("======================== divider ========================")
for _ in a_iter:
    print(_)  # prints nothing: a_iter is already exhausted

output:
True
1
2
3
4
5
6
8
======================== divider ========================
1
2
3
4
5
6
8
Iterating again
1
2
3
4
5
6
8
======================== divider ========================

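No StopIteration appeared above. That is expected: a for loop handles the exception internally and simply ends. The exception only becomes visible when you call the built-in next() on an iterator that has already passed its last element: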

# Create a new iterator over a, positioned at its first element
a_iter_copy = iter(a)
# len(a_iter_copy) would raise TypeError: an iterator has no length
print(next(a_iter_copy))
print(next(a_iter_copy))
print(next(a_iter_copy))
print(next(a_iter_copy))
print(next(a_iter_copy))
print(next(a_iter_copy))
print(next(a_iter_copy))
print(next(a_iter_copy))

output:
1
2
3
4
5
6
8
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-10-52c4d4ddde71> in <module>()
      9 print(next(a_iter_copy))
     10 print(next(a_iter_copy))
---> 11 print(next(a_iter_copy))

StopIteration:

How to catch the exception raised when an iterator reaches its end (which also gives you a way to measure the iterator's length):

a = [1,2,3,4,5,6,8]
a_iter_copy2 = iter(a)
iter_len = 0
try:
    while True:
        i = next(a_iter_copy2)
        print(i)
        iter_len += 1
except StopIteration:
    print('iterator stopped!')

print('length of iterator is %d' % (iter_len,))

output:
1
2
3
4
5
6
8
iterator stopped!
length of iterator is 7
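
The same count can be obtained without a try/except block. The sketch below is only an illustration of two alternatives: consuming the iterator with sum, and passing a default value to next() so that it returns a marker instead of raising StopIteration. The names count, sentinel, and seen are made up for this example.

```python
a = [1, 2, 3, 4, 5, 6, 8]

# Count the items by consuming a fresh iterator
count = sum(1 for _ in iter(a))
print(count)  # 7

# next(iterator, default) returns the default instead of raising StopIteration
it = iter(a)
sentinel = object()  # hypothetical end-of-data marker
seen = []
while True:
    item = next(it, sentinel)
    if item is sentinel:
        break
    seen.append(item)
print(seen)  # [1, 2, 3, 4, 5, 6, 8]
```

Either way the iterator is used up afterwards, so counting an iterator this way only makes sense when you do not need its items again.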


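Nine memory-saving generator examples from the built-in itertools module

Using generators (iterators) to save memory is a necessary step on the way to advanced Python. Keep going!

A function that contains yield is a generator function, and the generator it returns is itself a kind of iterator.

To get its results, drive it with for, or call next and catch StopIteration.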
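
As a minimal sketch of those two ways of consuming a generator, the example below uses a made-up generator function named countdown (it is not part of the original notes). It pulls two values with next(), lets a loop take the rest, and then shows that one more next() raises StopIteration.

```python
def countdown(n):
    """Yield n, n-1, ..., 1 (hypothetical example generator)."""
    while n > 0:
        yield n
        n -= 1

gen = countdown(3)
print(next(gen))  # 3
print(next(gen))  # 2

# A for loop (here inside a comprehension) consumes the rest
# and handles StopIteration for us
rest = [x for x in gen]
print(rest)  # [1]

# Calling next() on the exhausted generator raises StopIteration
try:
    next(gen)
    exhausted = False
except StopIteration:
    exhausted = True
print(exhausted)  # True
```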

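The rest of this note covers what generators buy you and where they are used in practice:

Saving memory

Consider an example with O(n) space complexity: it allocates a result list as long as the input, here [1, 2, 6, 24, 120, 720].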

def accumulate_div(a):
    if a is None or len(a) == 0:
        return []
    rtn = [a[0]] 
    for i in a[1:]:
        rtn.append(i*rtn[-1])
    return rtn

rtn = accumulate_div([1, 2, 3, 4, 5, 6])
print(rtn) 

output:
[1, 2, 6, 24, 120, 720]

With a yield-based generator, the extra space drops to O(1):

def accumulate_div(a):
    if a is None or len(a) == 0:
        return  # a generator simply ends here; there is nothing to yield
    it = iter(a)
    total = next(it)
    yield total
    for i in it:
        total = total * i
        yield total
        
rtn = list(accumulate_div([1, 2, 3, 4, 5, 6]))
print(rtn)

output:
[1, 2, 6, 24, 120, 720]

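When the input list [1, 2, 3, 4, 5, 6] has only 6 elements, the extra memory is negligible. When you process several gigabytes of data, though, that waste can be fatal, especially on a single machine. Note that the O(1) saving only holds while you consume the generator one element at a time; wrapping it in list(), as the demo above does for printing, builds the full list again.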
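
To make the memory point concrete, here is a small sketch. It assumes a generator named running_product (a hypothetical name, written to mirror the yield version above) and consumes it in a loop so that only the current value is alive at any time. The sys.getsizeof comparison at the end is a rough indication only: it measures the container object itself, not the integers it refers to.

```python
import sys

def running_product(values):
    """Yield the running product of values, one item at a time."""
    it = iter(values)
    try:
        total = next(it)
    except StopIteration:
        return  # empty input: nothing to yield
    yield total
    for v in it:
        total *= v
        yield total

# Consume lazily: no result list is ever built
last = None
for last in running_product(range(1, 7)):
    pass
print(last)  # 720

# A list holds every element; a generator object stays small
big_list = [x for x in range(100_000)]
big_gen = (x for x in range(100_000))
print(sys.getsizeof(big_list) > sys.getsizeof(big_gen))  # True
```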

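The itertools module in Python's standard library provides many iterators that follow this same lazy, one-element-at-a-time pattern. They are worth practicing.

Chaining iterators: chain(*iterables)

Use chain to concatenate elements. It is a little like join(), with one difference: join combines the items of a single sequence, whereas chain links several iterables, one after another, into a single larger iterable.

Signature of chain:

chain(*iterables) # * means a variable number of positional arguments

Example: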

from itertools import *
chain_it = chain(['I','love'],['Flower Dance'],['very', 'much'])
for _ in chain_it:
    print(_)
    
from collections.abc import Iterator
print(isinstance(chain_it,Iterator))

output:
I
love
Flower Dance
very
much

True # chain_it is an iterator (Iterator)

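Does chain save memory? Yes. chain behaves like a generator function: during iteration it hands out one element at a time, so the combined sequence is never built in memory.

# Rough Python equivalent of chain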
def my_chain(*iterables):
    for it in iterables:
        for element in it:
            yield element
            
chain_it = my_chain(['I','love'],['Flower Dance'],['very', 'much',['very', 'much']])
for _ in chain_it:
    print(_)

output:
I
love
Flower Dance
very
much
['very', 'much']
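
As the last output line shows, chain flattens only one level. A related tool is chain.from_iterable, which takes a single iterable of iterables instead of separate arguments. The sketch below compares the two and also shows that strings are split into characters because they are iterables themselves; the variable names are made up for the example.

```python
from itertools import chain

nested = [['I', 'love'], ['Flower Dance'], ['very', 'much']]

# chain(*nested) unpacks the outer list into separate arguments
flat_a = list(chain(*nested))
# chain.from_iterable reads the outer iterable one item at a time
flat_b = list(chain.from_iterable(nested))
print(flat_a == flat_b)  # True
print(flat_b)  # ['I', 'love', 'Flower Dance', 'very', 'much']

# Strings are iterables too, so chain walks them character by character
mixed = list(chain('ab', [1, 2]))
print(mixed)  # ['a', 'b', 1, 2]
```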

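Accumulating iterator: accumulate(iterable[, func, *, initial=None])

Returns an iterator over the accumulated results of an iterable. The signature is:

accumulate(iterable[, func, *, initial=None])

The return value is an iterator, so loop over it with for to print the values.

If no func is given, it defaults to addition and produces running sums: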

accu_iterator = accumulate([1,2,3,4,5,6])
for _ in accu_iterator:
    print(_)

output:
1
3
6
10
15
21

Passing a function. Note that a one-argument function fails:

accu_iterator = accumulate([1,2,3,4,5,6],lambda x: x**2)
for _ in accu_iterator:
    print(_)

output:
1
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-22-e2b3bc560f59> in <module>()
      1 accu_iterator = accumulate([1,2,3,4,5,6],lambda x: x**2)
----> 2 for _ in accu_iterator:
      3     print(_)

TypeError: <lambda>() takes 1 positional argument but 2 were given



# Fixed example: lambda x, y: x*y

accu_iterator = accumulate([1,2,3,4,5,6],lambda x,y: x*y)
for _ in accu_iterator:
    print(_)

output:
1
2
6
24
120
720

If func is supplied, it must accept two arguments; the results follow whatever accumulation func defines.

Rough Python equivalent of accumulate:

import operator

def accumulate(iterable, func=operator.add, *, initial=None):
    it = iter(iterable)
    total = initial
    if initial is None:
        try:
            total = next(it)
        except StopIteration:
            return
    yield total
    for element in it:
        total = func(total, element)
        yield total

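Wrap iterable in an iterator.
If initial is None, advance the iterator to get the first element, assign it to total, and yield it; if initial was supplied, yield it directly.
Each call func(total, element) computes the next value of total, which is then yielded as the next element of the result, until the input is exhausted.
The call func(total, element) also shows that func must accept two arguments.
That is all there is to accumulation: each output is func applied to the previous output and the next input.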
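
The points above can be tried out with a few different functions. This is a small sketch using made-up sample data; note that the initial keyword is only available from Python 3.8 onward, and that with initial the output has one more element than the input.

```python
from itertools import accumulate
import operator

data = [3, 1, 4, 1, 5]

running_sum = list(accumulate(data))                # default: addition
running_max = list(accumulate(data, max))           # any two-argument function works
running_mul = list(accumulate(data, operator.mul))
with_initial = list(accumulate(data, initial=100))  # Python 3.8+

print(running_sum)   # [3, 4, 8, 9, 14]
print(running_max)   # [3, 3, 4, 4, 5]
print(running_mul)   # [3, 3, 12, 12, 60]
print(with_initial)  # [100, 103, 104, 108, 109, 114]
```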

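Funnel iterator: compress(data, selectors)

The name suggests filtering down to something smaller. The signature:

compress(data, selectors)

Indeed: it filters data through selectors and returns a smaller iterator. Example: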

compress_iter = compress('abcdefg',[2,1,0,0,'1',-1,1,1,1])
print(isinstance(compress_iter,Iterator))
for _ in compress_iter:
    print(_)

output:
True
a
b
e
f
g

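compress returns at most as many elements as the shorter of its two arguments.
An element is dropped only when the selector at its position is falsy (0 in the example, but also '', None, or False).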
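
A common way to use compress is to compute the selectors from a second sequence. The sketch below uses made-up names and scores to illustrate that, and then checks the two notes above: any falsy selector drops its element, and the result stops at the shorter input.

```python
from itertools import compress

names = ['ann', 'bob', 'cat', 'dan']   # hypothetical sample data
scores = [72, 48, 91, 55]

# Selectors can be any iterable, including a generator expression
passed = list(compress(names, (s >= 60 for s in scores)))
print(passed)  # ['ann', 'cat']

# Every falsy selector (0, '', None) filters its element out
falsy_demo = list(compress('abcd', [1, 0, '', None]))
print(falsy_demo)  # ['a']

# The result stops when the shorter argument runs out
short_demo = list(compress('abcdef', [1, 1]))
print(short_demo)  # ['a', 'b']
```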

Rough Python equivalent of compress:

def compress(data, selectors):
    return (d for d, s in zip(data, selectors) if s)

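Drop iterator: dropwhile(predicate, iterable)

Scans iterable, discards elements while the predicate holds, and keeps everything from the first element that fails it onward, returning a smaller iterator. The signature:

dropwhile(predicate, iterable) # dropwhile(condition, iterable)

Note that output starts at the first element that fails the condition, and that element is included. Example: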

drop_iterator = dropwhile(lambda x: x<5,[1,0,2,4,1,1,3,5,-5])
for _ in drop_iterator:
    print(_)

output:
5
-5

Rough Python equivalent of dropwhile:

def dropwhile(predicate, iterable):
    iterable = iter(iterable)
    for x in iterable:
        if not predicate(x):
            yield x
            break
    for x in iterable:
        yield x

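When predicate(x) is false, x is yielded and the first for loop breaks; the second for loop then yields every remaining element of iterable without testing it.
While predicate(x) is true, the first loop keeps skipping; if every element satisfies the predicate, the result is an empty iterator.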
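
One place this behavior is handy is skipping a leading block of lines, such as header comments, while keeping later lines that happen to look the same. The sketch below uses a made-up list of lines; it also shows the empty result when every element satisfies the predicate.

```python
from itertools import dropwhile

# Hypothetical input: two header comments, then the real content
lines = ['# header', '# more header', 'x = 1', '# inline note', 'y = 2']

# Only the leading comment lines are dropped; the later one is kept
body = list(dropwhile(lambda s: s.startswith('#'), lines))
print(body)  # ['x = 1', '# inline note', 'y = 2']

# If every element satisfies the predicate, nothing is left
nothing = list(dropwhile(lambda x: x < 5, [1, 2, 3]))
print(nothing)  # []
```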

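Take iterator: takewhile(predicate, iterable)

Scans the iterable and returns elements for as long as the predicate holds, stopping at the first element that fails it. The signature:

takewhile(predicate, iterable) # takewhile(condition, iterable (e.g. a list))

Does it have to be a list? Nope, anything that supports for ... in works: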

take_iterator = takewhile(lambda x: x<5, (1,4,6,4,1))
for _ in take_iterator:
    print(_)

output:
1
4

The rough Python equivalent makes this clear:

def takewhile(predicate, iterable):
    for x in iterable:
        if predicate(x):
            yield x
        else:
            break # stop at the first element that fails the condition
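
Because takewhile stops at the first failure, it differs from the built-in filter, which tests every element. The sketch below contrasts the two on the tuple from the example above and then uses takewhile to cut off an endless stream built with itertools.count; the variable names are made up for the example.

```python
from itertools import takewhile, count

data = (1, 4, 6, 4, 1)

taken = list(takewhile(lambda x: x < 5, data))
filtered = list(filter(lambda x: x < 5, data))
print(taken)     # [1, 4]        stops at 6
print(filtered)  # [1, 4, 4, 1]  tests every element

# takewhile can bound an infinite iterator
squares = list(takewhile(lambda s: s < 50, (n * n for n in count(1))))
print(squares)  # [1, 4, 9, 16, 25, 36, 49]
```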

Cloning iterators: tee(iterable, n=2)

tee makes copies of an iterator. The signature:

tee(iterable, n=2) # tee(source iterable, number of clones)

Example:

a = tee([1,2,3,4,5,6,8],3)
print(isinstance(a,Iterator))
print(type(a))
print(next(a[0]))
print(next(a[1]))
iter_len = 0
try:
    while True:
        i = next(a[2])
        iter_len += 1
except StopIteration:
    print('iterator stopped!')

print('length of iterator a[2] is %d' % (iter_len,))

output:
False
<class 'tuple'>
1
1
iterator stopped!
length of iterator a[2] is 7

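tee is useful when you need to traverse the same iterator at least twice: when one clone is used up, switch to another.
tee returns a tuple, which is why isinstance(a, Iterator) is False; the elements of that tuple are the iterators we want.
The cloned iterators are independent of one another. After calling tee, do not advance the original iterator directly, or the clones will miss those items.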
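
A typical use of two independent clones is walking a sequence in overlapping pairs: one clone is advanced by one step and the two are zipped together. The helper below, adjacent_pairs, is a hypothetical name chosen for this sketch.

```python
from itertools import tee

def adjacent_pairs(iterable):
    """Yield overlapping pairs: (s0, s1), (s1, s2), ..."""
    first, second = tee(iterable)
    next(second, None)  # advance the second clone by one element
    return zip(first, second)

pairs = list(adjacent_pairs([1, 2, 3, 4]))
print(pairs)  # [(1, 2), (2, 3), (3, 4)]

# The clones keep their own positions
x, y = tee(iter('abc'))
x_items = [next(x), next(x)]
y_item = next(y)
print(x_items, y_item)  # ['a', 'b'] a
```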

Rough Python equivalent of tee (just skim it; knowing how to use tee is already plenty):

from collections import deque

def tee(iterable, n=2):
    it = iter(iterable)
    deques = [deque() for i in range(n)]
    def gen(mydeque):
        while True:
            if not mydeque:            
                try:
                    newval = next(it)   
                except StopIteration:
                    return
                for d in deques:     
                    d.append(newval)
            yield mydeque.popleft()
    return tuple(gen(d) for d in deques)

Repeating an element: repeat(object[, times])

repeat yields the same object n times, or endlessly when times is omitted. The signature:

repeat(object[, times]) # repeat(any object, count)

Example:

list(repeat([7,7,7],7))

output:
[[7, 7, 7], [7, 7, 7], [7, 7, 7], [7, 7, 7], [7, 7, 7], [7, 7, 7], [7, 7, 7]]

Rough Python equivalent of repeat:

def repeat(object, times=None):
    if times is None:
        while True: 
            yield object
    else:
        for i in range(times):
            yield object
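
repeat is mostly useful as a constant argument stream for map or zip, where the other iterable decides when to stop. The sketch below shows both, plus one thing worth knowing from the equivalent code above: every yielded item is the same object, so mutating one "copy" of a list changes them all.

```python
from itertools import repeat

# Supply a constant second argument to map
squares = list(map(pow, range(5), repeat(2)))
print(squares)  # [0, 1, 4, 9, 16]

# Pair every item with the same value; zip stops at the shorter input
tagged = list(zip('abc', repeat(0)))
print(tagged)  # [('a', 0), ('b', 0), ('c', 0)]

# repeat yields the same object each time, not a copy
rows = list(repeat([7], 3))
rows[0].append(8)
print(rows)  # [[7, 8], [7, 8], [7, 8]]
```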

Cartesian product: product(*args, repeat=1)

With repeat set to 1 and two inputs A and B, it is equivalent to ((x,y) for x in A for y in B)

Rough Python equivalent:

def product(*args, repeat=1):
    pools = [tuple(pool) for pool in args] * repeat
    result = [[]]
    for pool in pools:
        result = [x+[y] for x in result for y in pool]
    for prod in result:
        yield tuple(prod)

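Each yielded item is a tuple.
With repeat = 1 you get every combination of one element from each input.
With repeat = 2 the list of input pools is repeated twice, so product(A, B, repeat=2) is the same as product(A, B, A, B).

Example: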

def my_product(*args, repeat=1):
    pools = [tuple(pool) for pool in args] * repeat
    print(pools)
    result = [[]]
    for pool in pools:
        result = [x+[y] for x in result for y in pool]
        print(result,'\n')
        
    for prod in result:
        yield tuple(prod)

# With repeat = 1
list(my_product('AB', 'xy',repeat = 1))

output:
[('A', 'B'), ('x', 'y')]
[['A'], ['B']] 

[['A', 'x'], ['A', 'y'], ['B', 'x'], ['B', 'y']] 

[('A', 'x'), ('A', 'y'), ('B', 'x'), ('B', 'y')]

# With repeat = 2
list(my_product('AB', 'xy',repeat = 2))

output:
[('A', 'B'), ('x', 'y'), ('A', 'B'), ('x', 'y')]
[['A'], ['B']] 

[['A', 'x'], ['A', 'y'], ['B', 'x'], ['B', 'y']] 

[['A', 'x', 'A'], ['A', 'x', 'B'], ['A', 'y', 'A'], ['A', 'y', 'B'], ['B', 'x', 'A'], ['B', 'x', 'B'], ['B', 'y', 'A'], ['B', 'y', 'B']] 

[['A', 'x', 'A', 'x'], ['A', 'x', 'A', 'y'], ['A', 'x', 'B', 'x'], ['A', 'x', 'B', 'y'], ['A', 'y', 'A', 'x'], ['A', 'y', 'A', 'y'], ['A', 'y', 'B', 'x'], ['A', 'y', 'B', 'y'], ['B', 'x', 'A', 'x'], ['B', 'x', 'A', 'y'], ['B', 'x', 'B', 'x'], ['B', 'x', 'B', 'y'], ['B', 'y', 'A', 'x'], ['B', 'y', 'A', 'y'], ['B', 'y', 'B', 'x'], ['B', 'y', 'B', 'y']] 

[('A', 'x', 'A', 'x'),
 ('A', 'x', 'A', 'y'),
 ('A', 'x', 'B', 'x'),
 ('A', 'x', 'B', 'y'),
 ('A', 'y', 'A', 'x'),
 ('A', 'y', 'A', 'y'),
 ('A', 'y', 'B', 'x'),
 ('A', 'y', 'B', 'y'),
 ('B', 'x', 'A', 'x'),
 ('B', 'x', 'A', 'y'),
 ('B', 'x', 'B', 'x'),
 ('B', 'x', 'B', 'y'),
 ('B', 'y', 'A', 'x'),
 ('B', 'y', 'A', 'y'),
 ('B', 'y', 'B', 'x'),
 ('B', 'y', 'B', 'y')]
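
The traced output suggests that repeat simply duplicates the list of input pools. The short sketch below checks that against the real itertools.product and shows a common single-input use of repeat, generating all bit strings of a given length; the variable names are made up for the example.

```python
from itertools import product

# repeat=2 behaves like passing the same inputs twice
same = list(product('AB', 'xy', repeat=2)) == list(product('AB', 'xy', 'AB', 'xy'))
print(same)  # True

total = len(list(product('AB', 'xy', repeat=2)))
print(total)  # 16, i.e. (2 * 2) ** 2

# One input with repeat=3: all 3-character strings over '0' and '1'
bits = [''.join(p) for p in product('01', repeat=3)]
print(bits)  # ['000', '001', '010', '011', '100', '101', '110', '111']
```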

A stronger zip: zip_longest(*args, fillvalue=None)

When the iterables differ in length, the missing values are filled with fillvalue, and the result is as long as the longest input.
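
A small sketch of the difference from the built-in zip, using made-up inputs: zip stops at the shortest input, while zip_longest keeps going and pads the shorter one with fillvalue (None by default).

```python
from itertools import zip_longest

short = list(zip('ABCD', 'xy'))
print(short)  # [('A', 'x'), ('B', 'y')]

padded_none = list(zip_longest('ABCD', 'xy'))
print(padded_none)  # [('A', 'x'), ('B', 'y'), ('C', None), ('D', None)]

padded_dash = list(zip_longest('ABCD', 'xy', fillvalue='-'))
print(padded_dash)  # [('A', 'x'), ('B', 'y'), ('C', '-'), ('D', '-')]
```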