Python: Data Structures

This article is mainly a translation of the official Python 2.7.5 documentation.

1. Using Lists as Stacks

The list methods append and pop are all we need to use a list as a stack (last-in, first-out):

>>> stack = [3, 4, 5]
>>> stack.append(6)
>>> stack.append(7)
>>> stack
[3, 4, 5, 6, 7]
>>> stack.pop()
7
>>> stack
[3, 4, 5, 6]
>>> stack.pop()
6
>>> stack.pop()
5
>>> stack
[3, 4]
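
As one illustration of how such a stack might be used, the sketch below checks whether the brackets in a string are balanced. The function name is_balanced and the bracket table are hypothetical, chosen for this example; they do not come from the official documentation.

```python
def is_balanced(text):
    """Return True if every bracket in text is closed in the right order."""
    pairs = {')': '(', ']': '[', '}': '{'}
    stack = []
    for ch in text:
        if ch in '([{':
            stack.append(ch)            # push an opening bracket
        elif ch in pairs:
            # A closing bracket must match the most recent opening one.
            if not stack or stack.pop() != pairs[ch]:
                return False
    return not stack                    # leftovers mean unclosed brackets


balanced = is_balanced('f(a[0], {1: 2})')
unbalanced = is_balanced('(]')
```

Only append and pop touch the list, so every operation happens at the end, where a list is fast.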

2. Using Lists as Queues

A queue is first-in, first-out. A list is not well suited to this, because inserting or removing an item at the front means shifting all the other items; collections.deque can be used directly instead:

>>> from collections import deque
>>> queue = deque(["Eric", "John", "Michael"])
>>> queue.append("Terry")           # Terry arrives
>>> queue.append("Graham")          # Graham arrives
>>> queue.popleft()                 # The first to arrive now leaves
'Eric'
>>> queue.popleft()                 # The second to arrive now leaves
'John'
>>> queue                           # Remaining queue in order of arrival
deque(['Michael', 'Terry', 'Graham'])
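
To make the difference concrete, the sketch below drains the same data through a plain list and through a deque. Both give the same first-in, first-out order; the difference is only in cost, since list.pop(0) has to shift every remaining item while deque.popleft() does not. The helper names drain_list and drain_deque are made up for this example.

```python
from collections import deque


def drain_list(items):
    """FIFO using a plain list: pop(0) shifts all remaining items, O(n) each."""
    queue = list(items)
    served = []
    while queue:
        served.append(queue.pop(0))
    return served


def drain_deque(items):
    """FIFO using a deque: popleft() takes constant time."""
    queue = deque(items)
    served = []
    while queue:
        served.append(queue.popleft())
    return served


names = ["Eric", "John", "Michael", "Terry", "Graham"]
list_order = drain_list(names)
deque_order = drain_deque(names)
```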

3. Functional Programming Tools

3.1 filter

filter(function, sequence) returns a sequence consisting of those items of sequence for which function(item) is true:

>>> def f(x): return x % 2 != 0 and x % 3 != 0
...
>>> filter(f, range(2, 25))
[5, 7, 11, 13, 17, 19, 23]
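
The same selection can be written as a list comprehension. The sketch below is written to run on both Python 2.7 and Python 3; in Python 3, filter returns a lazy iterator rather than a list, which is why the result is wrapped in list(). The variable names are for illustration only.

```python
def f(x):
    return x % 2 != 0 and x % 3 != 0


# list() copies the list on Python 2 and materializes the iterator on Python 3.
filtered = list(filter(f, range(2, 25)))

# Equivalent list comprehension.
comprehended = [x for x in range(2, 25) if f(x)]
```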

3.2 map

map(function, sequence) calls function(item) for each item of sequence and returns a list of the return values:

>>> def cube(x): return x*x*x
...
>>> map(cube, range(1, 11))
[1, 8, 27, 64, 125, 216, 343, 512, 729, 1000]

map also accepts more than one sequence; in that case function must take as many arguments as there are sequences:

>>> seq = range(8)
>>> def add(x, y): return x+y
...
>>> map(add, seq, seq)
[0, 2, 4, 6, 8, 10, 12, 14]
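
Passing several sequences to map is roughly the same as pairing the items up with zip and calling the function on each pair. A minimal sketch, assuming the sequences have the same length (Python 2 and Python 3 treat unequal lengths differently), with list() added so that it also runs on Python 3:

```python
def add(x, y):
    return x + y


seq = list(range(8))

mapped = list(map(add, seq, seq))
zipped = [add(x, y) for x, y in zip(seq, seq)]
```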

3.3 reduce

reduce(function, sequence) calls the binary function function on the first two items of sequence, that is function(sequence[0], sequence[1]), then on that result and the next item, and so on until the last item, and returns the single resulting value.

>>> def add(x,y): return x+y
...
>>> reduce(add, range(1, 11))
55
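
The description above can be sketched as an explicit loop. my_reduce is a hypothetical re-implementation written for this example, not the library's actual code, and it covers only the case without an initial value. The import is there because reduce is a built-in only in Python 2; functools.reduce is available in both Python 2.7 and Python 3.

```python
from functools import reduce


def add(x, y):
    return x + y


def my_reduce(function, sequence):
    """Hypothetical re-implementation of reduce without an initial value."""
    iterator = iter(sequence)
    try:
        result = next(iterator)        # start with the first item
    except StopIteration:
        raise TypeError("my_reduce() of empty sequence with no initial value")
    for item in iterator:
        # Combine the running result with the next item.
        result = function(result, item)
    return result


total = my_reduce(add, range(1, 11))
```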

4. List Comprehensions

A list comprehension is a concise way to create a list.

Suppose we need to create the following list:

>>> squares = []
>>> for x in range(10):
...     squares.append(x**2)
...
>>> squares
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

We can create the same list with a list comprehension, which is much more concise:

squares = [x**2 for x in range(10)]

Another way is to use map and lambda:

squares = map(lambda x: x**2, range(10))

A list comprehension can contain more than one for clause:

>>> [(x, y) for x in [1,2,3] for y in [3,1,4] if x != y]
[(1, 3), (1, 4), (2, 3), (2, 1), (2, 4), (3, 1), (3, 4)]

The code above is equivalent to:

>>> combs = []
>>> for x in [1,2,3]:
...     for y in [3,1,4]:
...         if x != y:
...             combs.append((x, y))
...
>>> combs
[(1, 3), (1, 4), (2, 3), (2, 1), (2, 4), (3, 1), (3, 4)]

5. Sets

A set is an unordered collection with no duplicate elements. It can be created with set() and supports set operations such as intersection, union, and difference:

>>> basket = ['apple', 'orange', 'apple', 'pear', 'orange', 'banana']
>>> fruit = set(basket)               # create a set without duplicates
>>> fruit
set(['orange', 'pear', 'apple', 'banana'])
>>> 'orange' in fruit                 # fast membership testing
True
>>> 'crabgrass' in fruit
False

>>> # Demonstrate set operations on unique letters from two words
...
>>> a = set('abracadabra')
>>> b = set('alacazam')
>>> a                                  # unique letters in a
set(['a', 'r', 'b', 'c', 'd'])
>>> a - b                              # letters in a but not in b
set(['r', 'd', 'b'])
>>> a | b                              # letters in either a or b
set(['a', 'c', 'r', 'd', 'b', 'm', 'z', 'l'])
>>> a & b                              # letters in both a and b
set(['a', 'c'])
>>> a ^ b                              # letters in a or b but not both
set(['r', 'd', 'b', 'm', 'z', 'l'])
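
Each of these operators also has a method form. The sketch below sorts the results before storing them because a set has no defined order, so the order in which elements are printed may differ from the transcript above.

```python
a = set('abracadabra')
b = set('alacazam')

difference = sorted(a.difference(b))                 # same as a - b
union = sorted(a.union(b))                           # same as a | b
intersection = sorted(a.intersection(b))             # same as a & b
either_not_both = sorted(a.symmetric_difference(b))  # same as a ^ b
```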

6. Looping Techniques

Applying the enumerate function to a sequence yields the index and the value of each element at the same time:

>>> for i, v in enumerate(['tic', 'tac', 'toe']):
...     print i, v
...
0 tic
1 tac
2 toe

To loop over two sequences at the same time, use zip:

>>> questions = ['name', 'quest', 'favorite color']
>>> answers = ['lancelot', 'the holy grail', 'blue']
>>> for q, a in zip(questions, answers):
...     print 'What is your {0}?  It is {1}.'.format(q, a)
...
What is your name?  It is lancelot.
What is your quest?  It is the holy grail.
What is your favorite color?  It is blue.

To loop over a dictionary, use the iteritems method:

>>> knights = {'gallahad': 'the pure', 'robin': 'the brave'}
>>> for k, v in knights.iteritems():
...     print k, v
...
gallahad the pure
robin the brave
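
iteritems exists only in Python 2. A variant that should run on both Python 2.7 and Python 3 uses items() instead. Sorting the pairs is an addition made here so that the order is predictable; it is not required for the loop itself.

```python
knights = {'gallahad': 'the pure', 'robin': 'the brave'}

lines = []
for k, v in sorted(knights.items()):
    # Collect the formatted pairs instead of printing them.
    lines.append('{0} {1}'.format(k, v))
```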

When a sequence has to be modified while it is being looped over, it is best to loop over a copy of that sequence:

>>> words = ['cat', 'window', 'defenestrate']
>>> for w in words[:]:  # Loop over a slice copy of the entire list.
...     if len(w) > 6:
...         words.insert(0, w)
...
>>> words
['defenestrate', 'cat', 'window', 'defenestrate']
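
Without the slice copy, the loop above would never finish, because 'defenestrate' would be reinserted and visited again and again. A smaller failure of the same kind is sketched below with made-up data: removing items from the list being iterated makes the loop skip an element. The function names are hypothetical.

```python
def remove_twos_in_place(nums):
    """Buggy: removes items from the list that is being iterated."""
    for n in nums:
        if n == 2:
            nums.remove(n)      # shifts later items, so the next one is skipped
    return nums


def remove_twos_from_copy(nums):
    """Safe: iterates over a slice copy while changing the original."""
    for n in nums[:]:
        if n == 2:
            nums.remove(n)
    return nums


buggy = remove_twos_in_place([1, 2, 2, 3])    # one 2 survives
safe = remove_twos_from_copy([1, 2, 2, 3])    # both 2s are removed
```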

7. Sorting

Sorting in Python is stable, and the default order is ascending.
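
Stable means that items that compare equal keep their original relative order. A small sketch with made-up records, sorted by their first field only through the key argument of sorted:

```python
records = [('b', 1), ('a', 2), ('b', 0), ('a', 1)]

# Sort by the letter only; records with the same letter keep their input order.
by_letter = sorted(records, key=lambda rec: rec[0])
```

Here ('a', 2) stays ahead of ('a', 1), and ('b', 1) stays ahead of ('b', 0), exactly as in the input.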

To sort a list in place, use the sort method:

>>> L=['apple','orange','banana']
>>> L.sort()
>>> L
['apple', 'banana', 'orange']

To get a sorted copy and leave the original untouched, use the sorted function:

>>> L=['apple','orange','banana']
>>> s=sorted(L)
>>> s
['apple', 'banana', 'orange']
>>> L
['apple', 'orange', 'banana']

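Python 2 sorts by comparing pairs of elements, by default with the built-in cmp function. We can supply our own comparison function instead; it should return a negative number for "less than", zero for "equal", and a positive number for "greater than".

# Assumes L is a list of numeric strings, e.g. ['10', '9', '100'].
def compare(a, b):
    return cmp(int(a), int(b))  # compare as integers
L.sort(compare)

# Assumes L is a list of sequences with at least three fields each.
def compare_columns(a, b):
    # sort on ascending index 0, descending index 2
    return cmp(a[0], b[0]) or cmp(b[2], a[2])
out = sorted(L, compare_columns)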
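
Comparison functions passed directly to sort and the built-in cmp were both removed in Python 3. Two alternatives that work on Python 2.7 as well as Python 3 are the key argument and functools.cmp_to_key. In the sketch below, the sample data and the local cmp helper are assumptions added for illustration.

```python
from functools import cmp_to_key


def cmp(a, b):
    """Stand-in for the Python 2 built-in cmp, which Python 3 lacks."""
    return (a > b) - (a < b)


def compare_columns(a, b):
    # sort on ascending index 0, descending index 2
    return cmp(a[0], b[0]) or cmp(b[2], a[2])


numbers = ['10', '9', '100']
as_integers = sorted(numbers, key=int)            # compare as integers

rows = [(1, 'a', 5), (1, 'b', 7), (0, 'c', 3)]
with_cmp = sorted(rows, key=cmp_to_key(compare_columns))
with_key = sorted(rows, key=lambda r: (r[0], -r[2]))
```

A key function is called once per element, whereas a comparison function is called once per comparison, so key is usually the cheaper choice when it can express the ordering.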

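8. List Performance

1). A list object stores pointers to objects rather than the objects themselves, so the size of a list in memory depends on the number of elements, not on the size of the elements.

2). Getting or setting an element of a list takes constant time, i.e. O(1).

3). When an element is appended and the list has to allocate new memory, it allocates more than is actually needed, so that not every append requires a new allocation.

4). Inserting an element in the middle of a list takes time proportional to the number of elements after that position, i.e. O(n). Appending at the end is therefore fast, while inserting at the beginning is slow.

5). Deleting an element takes the same time as inserting one at that position: deleting at the end is fast, while deleting at the beginning is slow.

6). Reversing a list takes O(n) time.

7). Sorting a list takes O(n log n) time in the worst case; typical cases, such as data that is already partly ordered, are often considerably faster.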
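
Some of these points can be observed directly. The sketch below is a rough illustration, assuming CPython; sys.getsizeof reports the size of the list object itself (its pointer array), not of the elements it refers to, and the timings depend on the machine, so no particular numbers are claimed.

```python
import sys
import timeit

# Point 1: the list stores only pointers, so its own size depends on the
# number of elements, not on how large each element is.
small_items = [0] * 1000
large_items = ['x' * 10000] * 1000
same_size = sys.getsizeof(small_items) == sys.getsizeof(large_items)

# Point 3: appending over-allocates, so the reported size grows in steps
# rather than on every append (a CPython implementation detail).
growing = []
sizes = []
for i in range(1000):
    growing.append(i)
    sizes.append(sys.getsizeof(growing))
distinct_sizes = len(set(sizes))

# Points 4 and 5: inserting at the front shifts every existing element,
# while appending at the end does not.
append_time = timeit.timeit('lst.append(0)', setup='lst = []', number=10000)
front_time = timeit.timeit('lst.insert(0, 0)', setup='lst = []', number=10000)
```

Comparing append_time with front_time on a given machine shows the gap between adding at the end and adding at the front; the gap widens as the list grows.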

References:

[1]. Python Documentation: 5. Data Structures.

[2]. An Introduction to Python Lists.

Posted on 2013-11-03 11:04 by 潘的博客
