Chapter 3 - Dictionaries and Sets
The dict type is a cornerstone of Python. Even the built-in functions are stored in a dictionary: they all live in __builtins__.__dict__.

class dict(object):
    """
    dict() -> new empty dictionary
    dict(mapping) -> new dictionary initialized from a mapping object's
        (key, value) pairs
    dict(iterable) -> new dictionary initialized as if via:
        d = {}
        for k, v in iterable:
            d[k] = v
    dict(**kwargs) -> new dictionary initialized with the name=value pairs
        in the keyword argument list.  For example:  dict(one=1, two=2)
    """

    def clear(self):  # real signature unknown; restored from __doc__
        """ D.clear() -> None.  Remove all items from D. """
        pass

    def copy(self):  # real signature unknown; restored from __doc__
        """ D.copy() -> a shallow copy of D """
        pass

    @staticmethod  # known case
    def fromkeys(*args, **kwargs):  # real signature unknown
        """ Returns a new dict with keys from iterable and values equal to value. """
        pass

    def get(self, k, d=None):  # real signature unknown; restored from __doc__
        """ D.get(k[,d]) -> D[k] if k in D, else d.  d defaults to None. """
        pass

    def items(self):  # real signature unknown; restored from __doc__
        """ D.items() -> a set-like object providing a view on D's items """
        pass

    def keys(self):  # real signature unknown; restored from __doc__
        """ D.keys() -> a set-like object providing a view on D's keys """
        pass

    def pop(self, k, d=None):  # real signature unknown; restored from __doc__
        """
        D.pop(k[,d]) -> v, remove specified key and return the corresponding value.
        If key is not found, d is returned if given, otherwise KeyError is raised
        """
        pass

    def popitem(self):  # real signature unknown; restored from __doc__
        """
        D.popitem() -> (k, v), remove and return some (key, value) pair as a
        2-tuple; but raise KeyError if D is empty.
        """
        pass

    def setdefault(self, k, d=None):  # real signature unknown; restored from __doc__
        """ D.setdefault(k[,d]) -> D.get(k,d), also set D[k]=d if k not in D """
        pass

    def update(self, E=None, **F):  # known special case of dict.update
        """
        D.update([E, ]**F) -> None.  Update D from dict/iterable E and F.
        If E is present and has a .keys() method, then does:  for k in E: D[k] = E[k]
        If E is present and lacks a .keys() method, then does:  for k, v in E: D[k] = v
        In either case, this is followed by: for k in F:  D[k] = F[k]
        """
        pass

    def values(self):  # real signature unknown; restored from __doc__
        """ D.values() -> an object providing a view on D's values """
        pass

    def __contains__(self, *args, **kwargs):  # real signature unknown
        """ True if D has a key k, else False. """
        pass

    def __delitem__(self, *args, **kwargs):  # real signature unknown
        """ Delete self[key]. """
        pass

    def __eq__(self, *args, **kwargs):  # real signature unknown
        """ Return self==value. """
        pass

    def __getattribute__(self, *args, **kwargs):  # real signature unknown
        """ Return getattr(self, name). """
        pass

    def __getitem__(self, y):  # real signature unknown; restored from __doc__
        """ x.__getitem__(y) <==> x[y] """
        pass

    def __ge__(self, *args, **kwargs):  # real signature unknown
        """ Return self>=value. """
        pass

    def __gt__(self, *args, **kwargs):  # real signature unknown
        """ Return self>value. """
        pass

    def __init__(self, seq=None, **kwargs):  # known special case of dict.__init__
        """
        dict() -> new empty dictionary
        dict(mapping) -> new dictionary initialized from a mapping object's
            (key, value) pairs
        dict(iterable) -> new dictionary initialized as if via:
            d = {}
            for k, v in iterable:
                d[k] = v
        dict(**kwargs) -> new dictionary initialized with the name=value pairs
            in the keyword argument list.  For example:  dict(one=1, two=2)
        # (copied from class doc)
        """
        pass

    def __iter__(self, *args, **kwargs):  # real signature unknown
        """ Implement iter(self). """
        pass

    def __len__(self, *args, **kwargs):  # real signature unknown
        """ Return len(self). """
        pass

    def __le__(self, *args, **kwargs):  # real signature unknown
        """ Return self<=value. """
        pass

    def __lt__(self, *args, **kwargs):  # real signature unknown
        """ Return self<value. """
        pass

    @staticmethod  # known case of __new__
    def __new__(*args, **kwargs):  # real signature unknown
        """ Create and return a new object.  See help(type) for accurate signature. """
        pass

    def __ne__(self, *args, **kwargs):  # real signature unknown
        """ Return self!=value. """
        pass

    def __repr__(self, *args, **kwargs):  # real signature unknown
        """ Return repr(self). """
        pass

    def __setitem__(self, *args, **kwargs):  # real signature unknown
        """ Set self[key] to value. """
        pass

    def __sizeof__(self):  # real signature unknown; restored from __doc__
        """ D.__sizeof__() -> size of D in memory, in bytes """
        pass

    __hash__ = None
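Because dictionaries are so central, Python's implementation of them is highly optimized. The hash table is the underlying reason for dict's performance, and set is built on a hash table as well.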
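A Python list is stored as a linear sequence, so searching it for a value takes O(n) time. A dict is stored as a hash table, so looking up a key takes O(1) time in the best case (and on average). Replacing a list with a dict or a set can therefore speed up code that does many lookups, while keeping the algorithm essentially the same.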
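A minimal sketch of that substitution, assuming the only operation needed is a membership test. The names haystack_list, haystack_set and needles are illustrative and do not come from the original notes. Both versions produce the same result; the set version hashes each needle instead of scanning the whole sequence.

```python
# Hypothetical data: all names here are illustrative.
haystack_list = list(range(100_000))
haystack_set = set(haystack_list)      # one-time O(n) conversion
needles = [5, 99_999, -1, 123_456]

# `in` on a list compares element by element: O(n) per lookup.
found_in_list = [n for n in needles if n in haystack_list]

# `in` on a set hashes the needle: O(1) per lookup on average.
found_in_set = [n for n in needles if n in haystack_set]

print(found_in_list == found_in_set)
print(found_in_set)
```

The conversion itself costs one pass over the list, so it only pays off when the same collection is searched repeatedly.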
Python's built-in mapping type is dict, which stores key: value pairs.
frozenset is the immutable set type.
1. List comprehensions
2. Dict comprehensions
3. Set comprehensions
l = [x for x in range(10)]           # list comprehension
case = {'a': 10, 'b': 34}
d = {b: a for a, b in case.items()}  # dict comprehension: quickly swap keys and values
s = {x for x in range(10)}           # set comprehension
print(l)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(d)  # {10: 'a', 34: 'b'}
print(s)  # {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
bbb = ((1, 11), (2, 22), (3, 33))
bb = {a: b for a, b in bbb}  # build a dict from an iterable of pairs
print(bb)  # {1: 11, 2: 22, 3: 33}
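3.1 Generic Mapping Types
The collections.abc module provides two abstract base classes, Mapping and MutableMapping, which define the formal interface for dict and similar types. (Older Python versions also exposed them directly on collections; that alias was removed in Python 3.10.) Concrete mapping classes usually do not inherit from these ABCs directly; they extend dict or collections.UserDict instead. The ABCs mainly serve as formalized documentation and as targets for isinstance checks.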
import collections.abc

my_dict = {}
print(isinstance(my_dict, collections.abc.Mapping))         # True
print(isinstance(my_dict, collections.abc.MutableMapping))  # True
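All mapping types in the standard library are built on dict, so they share one limitation: only hashable types can be used as keys in these mappings (values do not need to be hashable).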
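What is a hashable type?
An object is hashable if its hash value never changes during its lifetime and it implements the __hash__() method. It also needs __eq__() so that it can be compared with other keys, and objects that compare equal must have the same hash value.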
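The atomic immutable types (str, bytes, int) are all hashable. A frozenset is also hashable, because by definition it can only contain hashable elements. A tuple is hashable only if every element it contains is hashable.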
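Instances of user-defined classes are hashable by default. Their hash value is derived from what id() returns, and by default each instance compares equal only to itself.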
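The rules above can be checked interactively. The following is a small sketch; the classes Plain and WithEq are hypothetical and exist only for illustration. It also shows one consequence worth knowing: a class that defines __eq__ without defining __hash__ gets __hash__ set to None, so its instances are no longer hashable.

```python
# A tuple is hashable only when every element is hashable.
tt = (1, 2, (30, 40))
print(isinstance(hash(tt), int))

tl = (1, 2, [30, 40])            # contains a list, which is unhashable
try:
    hash(tl)
except TypeError as exc:
    print("TypeError:", exc)

tf = (1, 2, frozenset([30, 40]))
print(isinstance(hash(tf), int))


class Plain:
    """Hypothetical class that defines neither __eq__ nor __hash__."""


class WithEq:
    """Hypothetical class: defining only __eq__ sets __hash__ to None."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return isinstance(other, WithEq) and self.value == other.value


p = Plain()
print(hash(p) == hash(p))        # stable for the lifetime of the object
print(WithEq.__hash__ is None)   # instances cannot be used as dict keys
```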
Several ways to build a dict:
a = dict(one=1, two=2, three=3)
b = dict(((1, 11), (2, 22), (3, 33)))
c = dict(zip([1, 2, 3], [4, 5, 6]))
print(a)  # {'one': 1, 'two': 2, 'three': 3}
print(b)  # {1: 11, 2: 22, 3: 33}
print(c)  # {1: 4, 2: 5, 3: 6}
Handling missing keys with setdefault
self.registered_admins.setdefault(app_label, {}).update({model._meta.model_name: admin_class})
strings = ('puppy', 'kitten', 'puppy', 'puppy', 'weasel', 'puppy', 'kitten', 'puppy')
counts = {}
for kw in strings:
    # setdefault returns the current count, inserting 0 the first time kw is seen
    counts[kw] = counts.setdefault(kw, 0) + 1
print(counts)  # {'puppy': 5, 'kitten': 2, 'weasel': 1}
my_dict.setdefault(key, []).append(new_value)
is equivalent to:
if key not in my_dict:
    my_dict[key] = []
my_dict[key].append(new_value)
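As a slightly larger illustration of the same idiom, here is a sketch that builds an index from each word to the positions where it occurs. The words data and the index name are made up for the example.

```python
# Hypothetical input: map each word to the positions where it occurs.
words = ['puppy', 'kitten', 'puppy', 'weasel', 'kitten', 'puppy']

index = {}
for position, word in enumerate(words):
    # Fetch the list for `word`, creating an empty one on first sight,
    # then append to it, all in a single expression.
    index.setdefault(word, []).append(position)

print(index)
```

Compared with the explicit `if key not in my_dict` version, the key is named once and the intent (get or create, then append) stays on one line.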
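3.4 Mappings with Flexible Key Lookup
Sometimes it is convenient to get a default value back when reading a key, even if that key does not exist in the mapping (dict). There are two ways to achieve this: use defaultdict, or write your own dict subclass that implements the __missing__ method.
Example: counting how often each word occurs: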
strings = ('puppy', 'kitten', 'puppy', 'puppy', 'weasel', 'puppy', 'kitten', 'puppy')
counts = {}
for kw in strings:
    counts[kw] += 1  # KeyError: 'puppy' (a plain dict has no default for a missing key)
print(counts)
import collections

strings = ('puppy', 'kitten', 'puppy', 'puppy', 'weasel', 'puppy', 'kitten', 'puppy')
counts = collections.defaultdict(int)  # int is the default factory: a missing key starts at int() == 0
for kw in strings:
    counts[kw] += 1
print(counts)  # defaultdict(<class 'int'>, {'puppy': 5, 'kitten': 2, 'weasel': 1})
import collections  # counting is even simpler with collections.Counter
strings = ('puppy', 'kitten', 'puppy', 'puppy', 'weasel', 'puppy', 'kitten', 'puppy')
print(collections.Counter(strings))
How defaultdict is implemented
The examples above show how defaultdict is used. How does it provide the default value internally? The key is the __missing__() method:
def __missing__(self, key):  # real signature unknown; restored from __doc__
    """
    __missing__(key) # Called by __getitem__ for missing key; pseudo-code:
      if self.default_factory is None: raise KeyError((key,))
      self[key] = value = self.default_factory()
      return value
    """
    pass
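When __getitem__() is asked for a key that does not exist, it calls __missing__() to obtain the default value and adds the key to the dictionary. __missing__() is only ever called by __getitem__(), so other lookups such as get() or the in operator are not affected.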
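The second approach mentioned earlier, a dict subclass with its own __missing__, can be sketched as follows. ZeroDict is a hypothetical class written for this example. Unlike defaultdict, this version returns the default without storing the key, which is a design choice of the sketch and not something dict requires.

```python
class ZeroDict(dict):
    """Hypothetical dict subclass that reports 0 for missing keys."""

    def __missing__(self, key):
        # Called by dict.__getitem__ when `key` is absent.
        # Unlike defaultdict, this sketch does not insert the key.
        return 0


counts = ZeroDict()
for kw in ('puppy', 'kitten', 'puppy'):
    counts[kw] += 1           # the read side goes through __getitem__

print(counts)
print(counts['weasel'])       # __missing__ supplies 0
print('weasel' in counts)     # the key was not inserted
print(counts.get('weasel'))   # get() bypasses __missing__ and returns None
```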
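3.7 Immutable Mappings
Since Python 3.3, the types module provides a wrapper class called MappingProxyType. Given a mapping, it returns a read-only view of that mapping. The view is read-only but dynamic: any change made to the original mapping is visible through the view, yet the original mapping cannot be modified through the view.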
from types import MappingProxyType

a = {1: "AA"}
a_proxy = MappingProxyType(a)
print(a_proxy)  # {1: 'AA'}
a_proxy[1] = "BB"  # TypeError: 'mappingproxy' object does not support item assignment
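The dynamic side of the view can be seen with a short sketch that reuses the names a and a_proxy; the extra keys 2 and 3 are added only for illustration.

```python
from types import MappingProxyType

a = {1: "AA"}
a_proxy = MappingProxyType(a)

# A change made to the underlying dict is visible through the proxy.
a[2] = "BB"
print(a_proxy[2])
print(dict(a_proxy))

# A write attempted through the proxy is rejected and leaves `a` untouched.
try:
    a_proxy[3] = "CC"
except TypeError as exc:
    print("TypeError:", exc)
print(3 in a)
```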
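3.8 Set Theory
Sets are commonly used for removing duplicates and for comparing relationships between collections.
The elements of a set must be hashable. The set type itself is not hashable, but frozenset is.
An empty set must be written as set(); a bare {} is treated as an empty dict.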
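A brief sketch of these points, using made-up sample data: deduplication, the usual relationship operators, the {} versus set() distinction, and a frozenset used as a set element.

```python
l = ['spam', 'eggs', 'spam', 'bacon', 'eggs']
unique = set(l)                 # deduplication
print(sorted(unique))

a = {1, 2, 3, 4}
b = {3, 4, 5}
print(a & b)                    # intersection
print(a | b)                    # union
print(a - b)                    # difference
print(a ^ b)                    # symmetric difference
print({3, 4} <= a)              # subset test

print(type({}))                 # {} is an empty dict
print(type(set()))              # set() is an empty set

# A frozenset is hashable, so it can be an element of another set.
nested = {frozenset({1, 2}), frozenset({3})}
print(frozenset({2, 1}) in nested)
```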
3.9 How dict Is Implemented and the Consequences
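1. Keys must be hashable.
2. Dictionaries have a large memory overhead.
Because a dict uses a hash table, and a hash table must be sparse (a hash table is in fact a sparse array, that is, an array that always has empty slots), it is not space efficient. If you need to store a very large number of records, a list of tuples or named tuples is a better choice.
3. Key lookup is very fast.
The dict implementation is a classic example of trading space for time.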
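The space cost can be observed with sys.getsizeof. This is a rough sketch: the Point record type is hypothetical, sys.getsizeof measures only the container itself (not the objects it refers to), and the exact numbers depend on the Python version and platform.

```python
import sys
from collections import namedtuple

Point = namedtuple('Point', ['x', 'y', 'z'])   # hypothetical record type

as_dict = {'x': 1, 'y': 2, 'z': 3}
as_tuple = (1, 2, 3)
as_named = Point(1, 2, 3)

# Size of each container on its own; values differ between builds.
dict_size = sys.getsizeof(as_dict)
tuple_size = sys.getsizeof(as_tuple)
named_size = sys.getsizeof(as_named)

print(dict_size, tuple_size, named_size)
print(dict_size > tuple_size)
print(dict_size > named_size)
```

Multiplied over millions of records, the per-record difference is what makes a list of tuples or named tuples the more economical layout.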