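Generators in Python
# Preface
This post reposts and translates an article on Python generators, which I found explains the topic clearly, and adds some of my own study notes.
# Main Text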
Python Generators
In this article, you'll learn how to create iterations easily using Python generators, how they differ from iterators and normal functions, and why you should use them.
What are generators in Python?
There is a lot of overhead in building an iterator in Python: we have to implement a class with __iter__() and __next__() methods, keep track of internal state, raise StopIteration when there are no more values to return, and so on.
To implement a Python iterator yourself, you need to:
1. Implement the __iter__() and __next__() methods.
2. Keep track of internal state.
3. Raise a StopIteration exception when there are no more values to return.
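The three steps above can be sketched as a small class. This is only an illustration: the `Countdown` name and its behaviour are assumptions made for this example and do not come from the original article.

```python
class Countdown:
    """Hypothetical iterator that counts down from `start` to 1."""

    def __init__(self, start):
        self.current = start  # internal state we have to track ourselves

    def __iter__(self):
        # An iterator returns itself from __iter__().
        return self

    def __next__(self):
        if self.current <= 0:
            # No values left: signal the end of the iteration.
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


result = list(Countdown(3))
print(result)
```

Even for such a simple sequence, all of the bookkeeping has to be written by hand.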
This is both lengthy and counterintuitive. Generators come to the rescue in such situations.
Python generators are a simple way of creating iterators. All the overhead we mentioned above is handled automatically by generators in Python.
Generators simplify the process above by taking over most of this work.
Simply speaking, a generator is a function that returns an object (iterator) which we can iterate over (one value at a time).
In short: a generator is a function that returns an object (an iterator) that we can iterate over.
How to create a generator in Python?
It is fairly simple to create a generator in Python. It is as easy as defining a normal function with a yield statement instead of a return statement.
In other words, to create a generator you define an ordinary function but use yield in place of return.
If a function contains at least one yield statement (it may contain other yield or return statements), it becomes a generator function. Both yield and return will return some value from a function.
The difference is that, while a return statement terminates a function entirely, a yield statement pauses the function, saving all of its state, and later continues from there on successive calls.
Put differently: return ends the function for good, whereas yield suspends it with its state intact so that a later call can resume execution.
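A minimal sketch of that difference follows. The two function names are made up for this illustration; both functions have the same loop and differ only in the keyword used.

```python
def first_with_return(items):
    # return ends the function on the first pass through the loop.
    for item in items:
        return item


def each_with_yield(items):
    # yield hands back one item, pauses here, and resumes on the next request.
    for item in items:
        yield item


returned = first_with_return([10, 20, 30])
yielded = list(each_with_yield([10, 20, 30]))
print(returned)
print(yielded)
```

The version with return only ever produces the first element, while the version with yield eventually produces all of them.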
Differences between Generator function and a Normal function
Here is how a generator function differs from a normal function.
- A generator function contains one or more yield statements.
- When called, it returns an object (iterator) but does not start execution immediately.
- Methods like __iter__() and __next__() are implemented automatically. So we can iterate through the items using next().
- Once the function yields, the function is paused and the control is transferred to the caller.
- Local variables and their states are remembered between successive calls.
- Finally, when the function terminates, StopIteration is raised automatically on further calls.
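Before the article's own walkthrough, here is a quick way to check the second and third points of the list programmatically. It is a sketch only; `tracked_gen` and the `events` list are hypothetical names introduced for this check.

```python
import inspect

events = []  # records when the body of the function actually runs


def tracked_gen():
    events.append("started")
    yield 1
    events.append("resumed")
    yield 2


gen = tracked_gen()
is_generator_function = inspect.isgeneratorfunction(tracked_gen)
started_on_call = bool(events)       # False: calling did not run the body
is_own_iterator = iter(gen) is gen   # __iter__() returns the generator itself
has_next = hasattr(gen, "__next__")  # __next__() was provided automatically

first = next(gen)                    # runs the body up to the first yield
events_after_first = list(events)
```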
Here is an example to illustrate all of the points stated above. We have a generator function named my_gen() with several yield statements.
    # A simple generator function
    def my_gen():
        n = 1
        print('This is printed first')
        # Generator function contains yield statements
        yield n

        n += 1
        print('This is printed second')
        yield n

        n += 1
        print('This is printed at last')
        yield n
An interactive run in the interpreter is given below. Run these in the Python shell to see the output.
    >>> # It returns an object but does not start execution immediately.
    >>> a = my_gen()
    >>> # We can iterate through the items using next().
    >>> next(a)
    This is printed first
    1
    >>> # Once the function yields, the function is paused and the control is transferred to the caller.
    >>> # Local variables and their states are remembered between successive calls.
    >>> next(a)
    This is printed second
    2
    >>> next(a)
    This is printed at last
    3
    >>> # Finally, when the function terminates, StopIteration is raised automatically on further calls.
    >>> next(a)
    Traceback (most recent call last):
    ...
    StopIteration
    >>> next(a)
    Traceback (most recent call last):
    ...
    StopIteration
One interesting thing to note in the above example is that the value of the variable n is remembered between calls.
Unlike normal functions, the local variables are not destroyed when the function yields. Furthermore, the generator object can be iterated only once.
To restart the process we need to create another generator object using something like a = my_gen().
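The one-pass behaviour is easy to confirm. The sketch below uses a hypothetical `count_to_three` generator rather than my_gen() so that it prints nothing while running.

```python
def count_to_three():
    yield 1
    yield 2
    yield 3


gen = count_to_three()
first_pass = list(gen)    # consumes every value
second_pass = list(gen)   # the same object is now exhausted

fresh_pass = list(count_to_three())  # a new generator object starts over
```

The second pass is empty rather than an error, which makes an accidentally reused generator an easy bug to miss.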
Note: One final thing to note is that we can use generators with for loops directly.
This is because a for loop takes an iterator and iterates over it using the next() function. It automatically ends when StopIteration is raised.
So a generator can also be driven by a for loop, and it is worth understanding how Python's for loop is actually implemented.
    # A simple generator function
    def my_gen():
        n = 1
        print('This is printed first')
        # Generator function contains yield statements
        yield n

        n += 1
        print('This is printed second')
        yield n

        n += 1
        print('This is printed at last')
        yield n

    # Using for loop
    for item in my_gen():
        print(item)
When you run the program, the output will be:
    This is printed first
    1
    This is printed second
    2
    This is printed at last
    3
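The for loop above behaves roughly like the hand-written loop below. This is a simplified sketch of the iteration protocol, not the interpreter's actual implementation, and `manual_for` is a name invented for this illustration.

```python
def manual_for(iterable, body):
    """Roughly what `for item in iterable: body(item)` does."""
    iterator = iter(iterable)      # calls iterable.__iter__()
    while True:
        try:
            item = next(iterator)  # calls iterator.__next__()
        except StopIteration:
            break                  # the loop ends quietly
        body(item)


collected = []
manual_for((n * n for n in range(4)), collected.append)
print(collected)
```

This is why StopIteration never surfaces when a generator is consumed by a for loop.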
Python Generators with a Loop
The above example is of little practical use; we studied it only to get an idea of what happens in the background.
Normally, generator functions are implemented with a loop having a suitable terminating condition.
Let's take an example of a generator that reverses a string.
    def rev_str(my_str):
        length = len(my_str)
        for i in range(length - 1, -1, -1):
            yield my_str[i]

    # For loop to reverse the string
    # Output:
    # o
    # l
    # l
    # e
    # h
    for char in rev_str("hello"):
        print(char)
In this example, we use the range() function to get the indices in reverse order inside the for loop.
It turns out that this generator function works not only with strings but also with other kinds of sequences such as lists and tuples, since it only needs len() and indexing.
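A short check of that claim is sketched below, with rev_str() repeated so the block runs on its own. The sample inputs are arbitrary.

```python
def rev_str(my_str):
    # Same generator as above; it only needs len() and indexing.
    length = len(my_str)
    for i in range(length - 1, -1, -1):
        yield my_str[i]


reversed_list = list(rev_str([1, 2, 3]))
reversed_tuple = list(rev_str(("a", "b")))
reversed_text = "".join(rev_str("hello"))
```

An iterable without len(), such as another generator, would raise TypeError as soon as the first value is requested.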
Python Generator Expression
Simple generators can be created on the fly using generator expressions, which makes building generators easy.
Just as a lambda creates an anonymous function, a generator expression creates an anonymous generator.
The syntax for a generator expression is similar to that of a list comprehension in Python, but the square brackets are replaced with round parentheses.
The major difference between a list comprehension and a generator expression is that a list comprehension produces the entire list, while a generator expression produces one item at a time.
Generator expressions are lazy, producing items only when asked for. For this reason, a generator expression is much more memory efficient than an equivalent list comprehension.
    # Initialize the list
    my_list = [1, 3, 6, 10]

    # square each term using list comprehension
    # Output: [1, 9, 36, 100]
    [x**2 for x in my_list]

    # same thing can be done using generator expression
    # Output: <generator object <genexpr> at 0x0000000002EBDAF8>
    (x**2 for x in my_list)
We can see above that the generator expression did not produce the required result immediately. Instead, it returned a generator object which produces items on demand.
    # Initialize the list
    my_list = [1, 3, 6, 10]

    a = (x**2 for x in my_list)
    # Output: 1
    print(next(a))
    # Output: 9
    print(next(a))
    # Output: 36
    print(next(a))
    # Output: 100
    print(next(a))
    # Output: StopIteration
    next(a)
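To make the memory claim concrete, one rough measure is sys.getsizeof(), which reports the size of the container object itself (not of the integers it refers to). The size `n` below is an arbitrary choice for this sketch, and the exact byte counts vary between Python versions and platforms.

```python
import sys

n = 100_000
squares_list = [x**2 for x in range(n)]   # all values stored at once
squares_gen = (x**2 for x in range(n))    # values produced on demand

list_bytes = sys.getsizeof(squares_list)  # grows with n
gen_bytes = sys.getsizeof(squares_gen)    # small, independent of n

print(list_bytes, gen_bytes)
```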
A generator expression can be passed directly as a function argument. When it is the only argument, its own round parentheses can be dropped.
    >>> sum(x**2 for x in my_list)
    146
    >>> max(x**2 for x in my_list)
    100
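The parentheses can only be dropped when the generator expression is the sole argument. The sketch below contrasts that with a call that passes a second argument (the start value of sum()); the value 10 is arbitrary.

```python
my_list = [1, 3, 6, 10]

# Sole argument: the generator expression needs no parentheses of its own.
total = sum(x**2 for x in my_list)

# With a second argument (here sum()'s start value), they are required.
total_from_ten = sum((x**2 for x in my_list), 10)

# Writing sum(x**2 for x in my_list, 10) is a SyntaxError.
```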
Why are generators used in Python?
There are several reasons which make generators an attractive implementation to go for.
The reasons for using generators are as follows.
1. Easy to Implement
Generators are easy to implement.
Generators can be implemented in a clear and concise way compared to their iterator class counterparts. The following example implements a sequence of powers of 2 using an iterator class.
    class PowTwo:
        def __init__(self, max = 0):
            self.max = max

        def __iter__(self):
            self.n = 0
            return self

        def __next__(self):
            if self.n > self.max:
                raise StopIteration

            result = 2 ** self.n
            self.n += 1
            return result
This was lengthy. Now let's do the same using a generator function.
    def PowTwoGen(max = 0):
        n = 0
        # n <= max so the output matches PowTwo, which stops only once n exceeds max
        while n <= max:
            yield 2 ** n
            n += 1
Since generators keep track of these details automatically, the implementation is concise and much cleaner.
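To confirm that the two approaches are interchangeable, the sketch below runs both side by side. The generator is written here under the hypothetical name `pow_two_gen`, with an inclusive bound chosen to match the class, which keeps going until n exceeds max.

```python
class PowTwo:
    def __init__(self, max=0):
        self.max = max

    def __iter__(self):
        self.n = 0
        return self

    def __next__(self):
        if self.n > self.max:
            raise StopIteration
        result = 2 ** self.n
        self.n += 1
        return result


def pow_two_gen(max=0):
    # Inclusive bound (n <= max) so the output matches PowTwo above.
    n = 0
    while n <= max:
        yield 2 ** n
        n += 1


from_class = list(PowTwo(3))
from_generator = list(pow_two_gen(3))
```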
2. Memory Efficient
Generators use memory efficiently.
A normal function that returns a sequence will create the entire sequence in memory before returning the result. This is overkill if the number of items in the sequence is very large.
A generator implementation of such a sequence is memory friendly and is preferred, since it produces only one item at a time.
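One consequence is that a consumer which stops early never pays for the items it does not use. The sketch below counts how many items each approach actually computes; all names in it are invented for this illustration.

```python
computed = []  # records which items were actually produced


def squares_as_list(n):
    result = []
    for i in range(n):
        computed.append(i)
        result.append(i * i)
    return result          # the whole sequence exists before we return


def squares_as_gen(n):
    for i in range(n):
        computed.append(i)
        yield i * i        # one item at a time


def first_over(values, limit):
    # Return the first value greater than limit, or None.
    for value in values:
        if value > limit:
            return value
    return None


answer_list = first_over(squares_as_list(1000), 10)
work_for_list = len(computed)

computed.clear()
answer_gen = first_over(squares_as_gen(1000), 10)
work_for_gen = len(computed)
```

Both calls find the same answer, but the list version computes all 1000 squares first while the generator version stops after the fifth.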
3. Represent Infinite Stream
Generators can represent an infinite stream.
Generators are an excellent medium for representing an infinite stream of data. Infinite streams cannot be stored in memory, and since generators produce only one item at a time, they can represent an infinite stream of data.
The following example can generate all the even numbers (at least in theory).
    def all_even():
        n = 0
        while True:
            yield n
            n += 2
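Because the stream never ends, it has to be consumed with an explicit limit. One common way is itertools.islice() from the standard library, sketched here with all_even() repeated so the block runs on its own.

```python
from itertools import islice


def all_even():
    n = 0
    while True:
        yield n
        n += 2


# islice stops after five items, so the infinite loop is never exhausted.
first_five = list(islice(all_even(), 5))
print(first_five)
```

Calling list(all_even()) without a limit would never return.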
4. Pipelining Generators
Generators can be chained into a pipeline.
Generators can be used to pipeline a series of operations. This is best illustrated using an example.
Suppose we have a log file from a famous fast food chain. The log file has a column (the 4th column) that keeps track of the number of pizzas sold every hour, and we want to sum it to find the total pizzas sold in 5 years.
Assume every field is a string and that numbers which are not available are marked as 'N/A'. A generator implementation of this could be as follows.
    with open('sells.log') as file:
        # Assumes whitespace-separated columns; take the 4th one
        pizza_col = (line.split()[3] for line in file)
        per_hour = (int(x) for x in pizza_col if x != 'N/A')
        print("Total pizzas sold = ", sum(per_hour))
This pipelining is efficient and easy to read (and yes, a lot cooler!).
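To try the pipeline without a real log file, the sketch below feeds it an in-memory sample through io.StringIO. The sample rows and the whitespace-separated layout are assumptions made for this example; the article does not specify the log format.

```python
import io

# Hypothetical log contents: whitespace-separated, pizzas sold in column 4.
sample_log = io.StringIO(
    "2019-01-01 09:00 store1 12\n"
    "2019-01-01 10:00 store1 N/A\n"
    "2019-01-01 11:00 store1 30\n"
)

pizza_col = (line.split()[3] for line in sample_log)
per_hour = (int(x) for x in pizza_col if x != 'N/A')
total = sum(per_hour)   # nothing is read until sum() pulls values through
print("Total pizzas sold =", total)
```

Each stage handles one line at a time, so the whole file never needs to be held in memory.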
Posted on 2019-12-15 17:05 by chaiyu2002