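Lists, Dicts, and Other Basic and Extended Types

Tuples and immutable objects

For a tuple t, t[:] does not create a copy; it returns a reference to the same object. tuple(t) likewise returns a reference to t.

str, bytes, and frozenset behave similarly. A frozenset cannot be sliced with [:], but its copy() method has the same effect: it returns a reference to the same object.

String literals may also end up as shared objects. Sharing string literals is an optimization known as interning, and CPython applies a similar optimization to small integers to avoid repeatedly creating frequently used numbers. Both are CPython implementation details.

These optimizations can differ between interpreters, and even between versions of the same interpreter. They apply only to immutable objects, so they rarely matter to users, but code should not rely on the is operator to compare values.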

t1 = (1, 2, 3)
t2 = tuple(t1)
t3 = t1
print(t1 is t2 is t3) # True

l1 = [1]
l2 = list(l1)
l3 = l1
print(l1 is l2 is l3) # False

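t1 = (1, 2)
t2 = (1, 2)
print(t1 is t2) # True in CPython when run as a script (equal constants in one code object are merged); may be False in the REPL

s1 = "ABC"
s2 = "ABC"
print(s1 is s2) # True in CPython (the literal is interned); an implementation detail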
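
The identity behavior described above for slicing and copy() can be checked directly. The following is a minimal sketch, assuming CPython; the helper name identity_checks is made up for illustration, and other interpreters are free to return different results for the immutable types.

```python
def identity_checks():
    """Return whether each 'copy' operation yields the very same object."""
    t = (1, 2, 3)
    s = "hello world"
    b = b"raw bytes"
    fs = frozenset({1, 2, 3})
    lst = [1, 2, 3]
    return {
        "tuple slice": t[:] is t,
        "tuple()": tuple(t) is t,
        "str slice": s[:] is s,
        "bytes slice": b[:] is b,
        "frozenset.copy()": fs.copy() is fs,
        # Lists are mutable, so slicing must produce a real copy.
        "list slice": lst[:] is lst,
    }


results = identity_checks()
for name, same in results.items():
    print(f"{name:18} -> {same}")
```

Only the list entry is guaranteed by the language to be False; the True results for the immutable types are an optimization, not a promise.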

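Sorting lists

1. list.sort(): sorts the list in place and returns None.

2. sorted(l): sorts any iterable and returns a new list, leaving the original unchanged.

3. bisect.bisect(a, x, lo=0, hi=None): searches the list a (which must already be sorted) for the position at which x can be inserted so that a stays sorted. If x is already present, the returned position is after the existing entries.

bisect.bisect_left does the same, except that when x is already present, the returned position is before the existing entries.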

fruits = ["Grape", "Raspberry", "Apple", "Banana"]
print(fruits)
fruits.sort()
print(fruits)

import bisect
print(bisect.bisect(fruits, "Bd"))
print(bisect.bisect(fruits, "Apple"))
print(bisect.bisect_left(fruits, "Apple"))


['Grape', 'Raspberry', 'Apple', 'Banana']
['Apple', 'Banana', 'Grape', 'Raspberry']
2
1
0
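
Because bisect returns an insertion position, it can also be used for table lookups. The sketch below follows the well-known grade lookup pattern; the breakpoints and grade letters are illustrative values, not something from this post.

```python
import bisect


def grade(score, breakpoints=(60, 70, 80, 90), grades="FDCBA"):
    """Map a numeric score to a letter using the sorted breakpoints."""
    # bisect returns how many breakpoints are <= score,
    # which is exactly the index of the matching grade letter.
    i = bisect.bisect(breakpoints, score)
    return grades[i]


scores = [33, 99, 77, 70, 89, 90, 100]
letters = [grade(s) for s in scores]
print(letters)  # ['F', 'A', 'C', 'C', 'B', 'A', 'A']
```

With bisect_left instead, a score equal to a breakpoint (such as 70) would land in the lower grade.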

4. bisect.insort(a, x, lo=0, hi=None): inserts x into the sorted list a so that a remains sorted.

It also has a sibling that inserts before equal elements: bisect.insort_left.

import bisect
import random

SIZE = 7

random.seed(1234)

my_list = []
for i in range(SIZE):
    new_item = random.randrange(SIZE * 2)
    bisect.insort(my_list, new_item)
    print("%2d ->" % new_item, my_list)
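
The difference between insort and insort_left only shows up when an equal element is already in the list. A small sketch, using 1 and 1.0 (which compare equal but are distinguishable by type) purely for illustration:

```python
import bisect

right = [1, 2, 3]
bisect.insort(right, 1.0)       # inserted after the existing 1

left = [1, 2, 3]
bisect.insort_left(left, 1.0)   # inserted before the existing 1

print(right)  # [1, 1.0, 2, 3]
print(left)   # [1.0, 1, 2, 3]
```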

array

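If you need a list that holds only numbers, array.array is more efficient than list. It supports all the mutable-sequence operations, such as pop, insert, and extend, and adds fast methods for loading and saving data, such as frombytes, fromfile, and tofile.

As of Python 3.4, array.array has no in-place sort method like list.sort(). To sort an array, build a new one from the result of sorted().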

import array
import random

floats = array.array("d", (random.random() for i in range(3)))
print(floats)
print(floats.typecode)
print(floats.tolist())
print(floats.itemsize) # size of one element in bytes

# Sort: build a new array from the sorted values
floats = array.array(floats.typecode, sorted(floats))
print(floats)
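
The fast file methods mentioned above can be tried with a round trip through a temporary file. This is a minimal sketch; the function name roundtrip and the element count are arbitrary choices for the example.

```python
import array
import os
import random
import tempfile


def roundtrip(n=1000):
    """Write n doubles to a binary file and read them back."""
    floats = array.array("d", (random.random() for _ in range(n)))
    fd, path = tempfile.mkstemp(suffix=".bin")
    os.close(fd)
    try:
        with open(path, "wb") as fp:
            floats.tofile(fp)          # raw machine values, no text conversion
        loaded = array.array("d")
        with open(path, "rb") as fp:
            loaded.fromfile(fp, n)     # read exactly n items
        return floats, loaded
    finally:
        os.remove(path)


original, loaded = roundtrip(100)
print(original == loaded)
```

The file holds raw machine values, so it is only portable between machines with the same byte order and item size.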

Type codes for all numeric types (Python 3):

Type code	C Type	Python Type	Minimum size in bytes
`'b'`	signed char	int	1
`'B'`	unsigned char	int	1
`'u'`	wchar_t	Unicode character	2 (deprecated)
`'h'`	signed short	int	2
`'H'`	unsigned short	int	2
`'i'`	signed int	int	2
`'I'`	unsigned int	int	2
`'l'`	signed long	int	4
`'L'`	unsigned long	int	4
`'q'`	signed long long	int	8
`'Q'`	unsigned long long	int	8
`'f'`	float	float	4
`'d'`	double	float	8
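
The type code fixes both the storage size and the range of values an array accepts. The sketch below, with a hypothetical helper fits, probes the limits of the one-byte codes and prints the item size of each code on the current platform (sizes above the minimum are platform dependent).

```python
import array


def fits(typecode, value):
    """Return True if value can be stored in an array of this type code."""
    try:
        array.array(typecode, [value])
        return True
    except OverflowError:
        return False


print(fits("b", 127), fits("b", 128))   # signed char: 127 fits, 128 does not
print(fits("B", 255), fits("B", -1))    # unsigned char: 255 fits, -1 does not

for code in "bBhHiIlLqQfd":
    print(code, array.array(code).itemsize)
```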

Dictionaries

Basic ways to create a dict

d1 = {"one": 1, "two": 2, "three": 3}
d2 = dict(one=1, two=2, three=3)
d3 = dict(zip(["one", "two", "three"], [1, 2, 3]))
d4 = dict({"one": 1, "two": 2, "three": 3})
d5 = dict([("one", 1), ("two", 2), ("three", 3), ])
d6 = dict([["one", 1], ["two", 2], ["three", 3], ])
print(d1 == d2 == d3 == d4 == d5 == d6)

DIAL_CODES = [
    (86, "China"),
    (91, "India"),
    (1, "United States"),
    (62, "Indonesia"),
    (55, "Brazil"),
    (92, "Pakistan"),
    (234, "Nigeria"),
    (7, "Russia"),
    (81, "Japan"),
]
d7 = {country : code for code, country in DIAL_CODES}
d8 = {country.upper() : code for country, code in d7.items() if code < 66}
print(d7)
print(d8)

True
{'China': 86, 'India': 91, 'United States': 1, 'Indonesia': 62, 'Brazil': 55, 'Pakistan': 92, 'Nigeria': 234, 'Russia': 7, 'Japan': 81}
{'UNITED STATES': 1, 'INDONESIA': 62, 'BRAZIL': 55, 'RUSSIA': 7}

Using setdefault to build an index of where each word occurs

import re
import sys

RE_WORD = re.compile(r"\w+")
index_dict = {}
with open(sys.argv[0], encoding="utf-8", mode="r") as fp:
    for line_no, line in enumerate(fp, 1):
        for match in RE_WORD.finditer(line):
            word = match.group()
            column_no = match.start() + 1
            location = (line_no, column_no)
            # occurrences = index_dict.get(word, [])
            # occurrences.append(location)
            # index_dict[word] = occurrences
            index_dict.setdefault(word, []).append(location)

for word in sorted(index_dict, key=str.upper):
    print(word, index_dict[word])
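
The same indexing idea can be tried on an in-memory string instead of a file. This is a small sketch; build_index and the sample text are invented for the example.

```python
import re

RE_WORD = re.compile(r"\w+")


def build_index(text):
    """Map each word to a list of (line_no, column_no) locations."""
    index = {}
    for line_no, line in enumerate(text.splitlines(), 1):
        for match in RE_WORD.finditer(line):
            location = (line_no, match.start() + 1)
            # One lookup: fetch the list for the word, creating it if needed.
            index.setdefault(match.group(), []).append(location)
    return index


index = build_index("a b a\nb c")
print(index)  # {'a': [(1, 1), (1, 5)], 'b': [(1, 3), (2, 1)], 'c': [(2, 3)]}
```

setdefault returns the stored list itself, so appending to the return value updates the dict without a second assignment.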

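The defaultdict class creates a default value automatically when a key is not found. The callable that produces the default is passed in when the defaultdict is created.

import collections
import re
import sys

RE_WORD = re.compile(r"\w+")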
index_dict = collections.defaultdict(list)
with open(sys.argv[0], encoding="utf-8", mode="r") as fp:
    for line_no, line in enumerate(fp, 1):
        for match in RE_WORD.finditer(line):
            word = match.group()
            column_no = match.start() + 1
            location = (line_no, column_no)
            index_dict[word].append(location)

for word in sorted(index_dict, key=str.upper):
    print(word, index_dict[word])
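
One detail worth knowing is that the default factory runs only for d[k] lookups, not for get() or the in operator. A minimal sketch with illustrative variable names:

```python
import collections

dd = collections.defaultdict(list)

via_get = dd.get("missing")            # get() does not call the factory
present_after_get = "missing" in dd    # the key was not created

via_getitem = dd["missing"]            # d[k] calls list() and stores the result
present_after_getitem = "missing" in dd

print(via_get, present_after_get)          # None False
print(via_getitem, present_after_getitem)  # [] True
```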

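When d[k] is evaluated and the key k is not in d, __getitem__ calls the __missing__ method if the class defines one. dict itself does not implement __missing__, so a plain dict raises KeyError.

To get friendlier behavior when d[k] cannot find a key, subclass dict and implement __missing__. The following example stores keys as str but also accepts int keys for lookups.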

class StrKeyDict(dict):
    def __missing__(self, key):
        if isinstance(key, str): # the missing key is already a str, so it really is absent
            raise KeyError(key)
        return self[str(key)] # convert to str and look up again

    def get(self, key, default=None):
        try:
            return self[key] # on a miss this calls __missing__, which retries with str(key)
        except KeyError:
            return default # still not found, so return the default

    def __contains__(self, key):
        return key in self.keys() or str(key) in self.keys() # check the key as given, then as a str


d1 = StrKeyDict({"1": "one", "2": "two"})

print(d1.get("1"))
print(d1.get(2))
print(d1.get(4, "N/A"))

print(1 in d1)
print(4 in d1)

print(d1["1"])
print(d1[2])
# print(d1[4]) # raises KeyError
# print(d1["4"]) # raises KeyError
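
The reason get has to be overridden in a dict subclass is that only d[k] consults __missing__. A minimal sketch with a hypothetical ZeroDict class shows this:

```python
class ZeroDict(dict):
    """A dict whose d[k] lookups fall back to 0 for unknown keys."""

    def __missing__(self, key):
        return 0  # returned to the caller, but not stored in the dict


z = ZeroDict()
by_item = z["a"]       # __getitem__ calls __missing__
by_get = z.get("a")    # dict.get never calls __missing__
contained = "a" in z   # neither does the in operator

print(by_item, by_get, contained, len(z))  # 0 None False 0
```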

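Other dictionary types

1. collections.OrderedDict: keeps keys in insertion order, so the iteration order is always consistent. popitem() removes and returns the last item by default; with last=False it removes and returns the first item instead.

2. collections.ChainMap: holds several mappings and, on key lookup, searches them one by one until the key is found.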

import collections
import builtins

pylookup = collections.ChainMap(locals(), globals(), vars(builtins))
# print(vars(builtins))
# print(locals())
print(pylookup["__doc__"]) # the first match wins, here the __doc__ found in locals()
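
A common use of ChainMap is layering overrides on top of defaults. The following sketch uses made-up setting names; it also shows that writes go only to the first mapping in the chain.

```python
from collections import ChainMap

defaults = {"color": "red", "user": "guest"}
overrides = {"user": "admin"}

settings = ChainMap(overrides, defaults)

user = settings["user"]     # found in overrides first
color = settings["color"]   # falls through to defaults

settings["color"] = "blue"  # writes always go to the first mapping

print(user, color)          # admin red
print(overrides)            # {'user': 'admin', 'color': 'blue'}
print(defaults)             # {'color': 'red', 'user': 'guest'}
```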

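3. collections.UserDict: designed for users to subclass. The built-in dict takes shortcuts in its implementation, so a dict subclass often has to override extra methods.

As shown earlier, with dict as the base class, get must be overridden to stay consistent with __getitem__. With UserDict that is not necessary.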

import collections

class StrKeyDict(collections.UserDict):
    def __missing__(self, key):
        if isinstance(key, str):
            raise KeyError(key)
        return self[str(key)]

    def __contains__(self, key):
        return str(key) in self.data

    def __setitem__(self, key, value):
        self.data[str(key)] = value



d1 = StrKeyDict({"1": "one", "2": "two"})

print(d1.get("1"))
print(d1.get(2))
print(d1.get(4, "N/A"))

print(1 in d1)
print(4 in d1)

print(d1["1"])
print(d1[2])
# print(d1[4]) # raises KeyError
# print(d1["4"]) # raises KeyError
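
The shortcuts taken by the built-in dict can be made visible by overriding __setitem__ in both kinds of subclass. This is a sketch of CPython behavior; the class names DoppelDict and DoppelUserDict are chosen for illustration.

```python
import collections


class DoppelDict(dict):
    def __setitem__(self, key, value):
        super().__setitem__(key, [value] * 2)


class DoppelUserDict(collections.UserDict):
    def __setitem__(self, key, value):
        super().__setitem__(key, [value] * 2)


d = DoppelDict(one=1)   # dict.__init__ bypasses the override
d["two"] = 2            # the [] operator uses it
d.update(three=3)       # dict.update bypasses it again

u = DoppelUserDict(one=1)
u["two"] = 2
u.update(three=3)       # UserDict routes everything through __setitem__

print(d)        # {'one': 1, 'two': [2, 2], 'three': 3}
print(u.data)   # {'one': [1, 1], 'two': [2, 2], 'three': [3, 3]}
```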

4. types.MappingProxyType: a read-only mapping. It is constructed from a dict (or another mapping) and returns a read-only view of it.

import types
rd = types.MappingProxyType(d1)
print(rd["1"])
rd["a"] = "A" # TypeError: 'mappingproxy' object does not support item assignment
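
The proxy is a live view rather than a snapshot: changes made to the underlying dict show up through it, while writes through the proxy are rejected. A minimal sketch with illustrative names:

```python
import types

source = {"a": 1}
view = types.MappingProxyType(source)

source["b"] = 2             # change the underlying dict
seen_through_view = view["b"]

try:
    view["c"] = 3           # the proxy itself rejects assignment
    write_error = None
except TypeError as exc:
    write_error = exc

print(seen_through_view)    # 2
print(dict(view))           # {'a': 1, 'b': 2}
print(write_error)
```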

posted @ 2020-03-22 09:13 阿Hai
