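Python Basics 2: Data Types and File Operations

Source: http://www.cnblogs.com/yuanchenqi/articles/5782764.html

Video on deep and shallow copy: https://pan.baidu.com/s/1hr5eOU4 password: 3tmn

In this section

List and tuple operations
String operations
Dictionary operations
Set operations
File operations
Character encoding and transcoding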

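1. List and tuple operations

Lists are one of the data types we will use most often; they make it easy to store and modify data.

Defining a list

names = ['Alex',"Tenglan",'Eric']

Access elements by index; indices start at 0.

>>> names[0]
'Alex'
>>> names[2]
'Eric'
>>> names[-1]
'Eric'
>>> names[-2]  # negative indices count from the end
'Tenglan'

Slicing: taking several elements

>>> names = ["Alex","Tenglan","Eric","Rain","Tom","Amy"]
>>> names[1:4]  # elements from index 1 up to, but not including, index 4
['Tenglan', 'Eric', 'Rain']
>>> names[1:-1]  # from index 1 up to, but not including, the last element
['Tenglan', 'Eric', 'Rain', 'Tom']
>>> names[0:3]
['Alex', 'Tenglan', 'Eric']
>>> names[:3]  # when starting from the beginning the 0 can be omitted; same result as above
['Alex', 'Tenglan', 'Eric']
>>> names[3:]  # to include the last element, leave the end index out; writing -1 would exclude it
['Rain', 'Tom', 'Amy']
>>> names[3:-1]  # with -1 as the end index, the last element is not included
['Rain', 'Tom']
>>> names[0::2]  # the trailing 2 is the step: take every second element
['Alex', 'Eric', 'Tom']
>>> names[::2]  # same result as above
['Alex', 'Eric', 'Tom']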
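
A few more slice forms fit naturally next to the ones above. This is a small supplementary sketch, not part of the original transcript; the names last_two, reversed_names, every_other_backwards and shallow are made up for illustration.

```python
names = ["Alex", "Tenglan", "Eric", "Rain", "Tom", "Amy"]

last_two = names[-2:]                # a negative start counts from the end
reversed_names = names[::-1]         # a negative step walks the list backwards
every_other_backwards = names[::-2]  # indices 5, 3, 1
shallow = names[:]                   # a full slice builds a new list object

print(last_two)
print(reversed_names)
print(every_other_backwards)
print(shallow is names)              # False: same contents, different list
```

Slicing never raises IndexError; an out-of-range bound is simply clipped, so names[2:100] returns everything from index 2 onward.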

Append

>>> names
['Alex', 'Tenglan', 'Eric', 'Rain', 'Tom', 'Amy']
>>> names.append("我是新来的")
>>> names
['Alex', 'Tenglan', 'Eric', 'Rain', 'Tom', 'Amy', '我是新来的']

Insert

>>> names
['Alex', 'Tenglan', 'Eric', 'Rain', 'Tom', 'Amy', '我是新来的']
>>> names.insert(2,"强行从Eric前面插入")
>>> names
['Alex', 'Tenglan', '强行从Eric前面插入', 'Eric', 'Rain', 'Tom', 'Amy', '我是新来的']

>>> names.insert(5,"从eric后面插入试试新姿势")
>>> names
['Alex', 'Tenglan', '强行从Eric前面插入', 'Eric', 'Rain', '从eric后面插入试试新姿势', 'Tom', 'Amy', '我是新来的']

Modify

>>> names
['Alex', 'Tenglan', '强行从Eric前面插入', 'Eric', 'Rain', '从eric后面插入试试新姿势', 'Tom', 'Amy', '我是新来的']
>>> names[2] = "该换人了"
>>> names
['Alex', 'Tenglan', '该换人了', 'Eric', 'Rain', '从eric后面插入试试新姿势', 'Tom', 'Amy', '我是新来的']

Delete

>>> del names[2]
>>> names
['Alex', 'Tenglan', 'Eric', 'Rain', '从eric后面插入试试新姿势', 'Tom', 'Amy', '我是新来的']
>>> del names[4]
>>> names
['Alex', 'Tenglan', 'Eric', 'Rain', 'Tom', 'Amy', '我是新来的']
>>>
>>> names.remove("Eric")  # remove the first element equal to the given value
>>> names
['Alex', 'Tenglan', 'Rain', 'Tom', 'Amy', '我是新来的']
>>> names.pop()  # remove and return the last element
'我是新来的'
>>> names
['Alex', 'Tenglan', 'Rain', 'Tom', 'Amy']

Extend

>>> names
['Alex', 'Tenglan', 'Rain', 'Tom', 'Amy']
>>> b = [1,2,3]
>>> names.extend(b)
>>> names
['Alex', 'Tenglan', 'Rain', 'Tom', 'Amy', 1, 2, 3]

Copy

>>> names
['Alex', 'Tenglan', 'Rain', 'Tom', 'Amy', 1, 2, 3]

>>> name_copy = names.copy()
>>> name_copy
['Alex', 'Tenglan', 'Rain', 'Tom', 'Amy', 1, 2, 3]

Is copy really that simple? If it were, there would be nothing left to teach...

Count

>>> names
['Alex', 'Tenglan', 'Amy', 'Tom', 'Amy', 1, 2, 3]
>>> names.count("Amy")
2

Sort & reverse

>>> names
['Alex', 'Tenglan', 'Amy', 'Tom', 'Amy', 1, 2, 3]
>>> names.sort()  # sort in place
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: int() < str()   # Python 3 refuses to order mixed types (newer versions word this message differently)
>>> names[-3] = '1'
>>> names[-2] = '2'
>>> names[-1] = '3'
>>> names
['Alex', 'Amy', 'Amy', 'Tenglan', 'Tom', '1', '2', '3']
>>> names.sort()
>>> names
['1', '2', '3', 'Alex', 'Amy', 'Amy', 'Tenglan', 'Tom']

>>> names.reverse()  # reverse in place
>>> names
['Tom', 'Tenglan', 'Amy', 'Amy', 'Alex', '3', '2', '1']
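
Instead of rewriting the integers as strings by hand, a mixed list can be ordered by giving the sort a key function. The following is only a sketch of that alternative, not the approach used in the transcript above; by_text and descending are illustrative names.

```python
names = ['Alex', 'Tenglan', 'Amy', 'Tom', 'Amy', 1, 2, 3]

# sorted() returns a new list and leaves the original untouched;
# key=str compares every element by its string form.
by_text = sorted(names, key=str)

# reverse=True sorts in descending order in a single call.
descending = sorted(names, key=str, reverse=True)

print(by_text)
print(descending)
print(names)  # unchanged
```

list.sort() accepts the same key and reverse arguments when the list should be reordered in place.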

Get an index

>>> names
['Tom', 'Tenglan', 'Amy', 'Amy', 'Alex', '3', '2', '1']
>>> names.index("Amy")
2  # only the index of the first match is returned

List comprehensions

>>> a=[x for x in range(10)]
>>> a
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
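
The one-line example above only copies a range. A comprehension can also transform and filter elements, as this short sketch shows; squares, evens and pairs are illustrative names that do not appear in the original.

```python
# An expression on the left transforms each element.
squares = [x * x for x in range(10)]

# A trailing "if" keeps only the elements that pass the test.
evens = [x for x in range(10) if x % 2 == 0]

# Several "for" clauses nest from left to right.
pairs = [(x, y) for x in range(2) for y in "ab"]

print(squares)
print(evens)
print(pairs)
```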

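Deep and shallow copy

For now, never mind what deep and shallow copies are. Suppose I have a list and want a copy of it. What do I do?

Someone will surely say: just assign it again:

names_class1=['张三','李四','王五','赵六']
names_class1_copy=['张三','李四','王五','赵六']

These are two independent blocks of memory.

That works, but if the list is large, do you really want to retype every element? Of course not, which is why lists have a built-in copy method: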

names_class1=['张三','李四','王五','赵六',[1,2,3]]
names_class1_copy=names_class1.copy()
 
names_class1[0]='zhangsan'
print(names_class1)
print(names_class1_copy)
 
############
names_class1[4][2]=5
print(names_class1)
print(names_class1_copy)
 
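# Here is the question: why did names_class1_copy change as well? This tells us the two variables are not completely independent. So how are they related, and why do some changes show up in the copy while others do not?

This is where deep and shallow copy come in:

# Immutable types: numbers, strings, tuples         Mutable types: lists, dicts

# l=[2,2,3]
# print(id(l))
# l[0]=5
# print(id(l))   # modifying a mutable object, such as this list l, does not change its memory address; note that this is about the list object l itself, not its elements
#                # this is the most important
#
# s='alex'
# print(id(s))   # immutable types such as strings, tuples and numbers cannot be modified; to get the string 'Alex', a new 'Alex' object has to be created and the name pointed at that new object
#
# s[0]='e'       # raises TypeError
# print(id(s))

# Key point: shallow copy
a=[[1,2],3,4]
b=a[:]  # same as b=a.copy()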
 
print(a,b)
print(id(a),id(b))
print('*************')
print('a[0]:',id(a[0]),'b[0]:',id(b[0]))
print('a[0][0]:',id(a[0][0]),'b[0][0]:',id(b[0][0]))
print('a[0][1]:',id(a[0][1]),'b[0][1]:',id(b[0][1]))
print('a[1]:',id(a[1]),'b[1]:',id(b[1]))
print('a[2]:',id(a[2]),'b[2]:',id(b[2]))
 
 
print('___________________________________________')
b[0][0]=8
 
print(a,b)
print(id(a),id(b))
print('*************')
print('a[0]:',id(a[0]),'b[0]:',id(b[0]))
print('a[0][0]:',id(a[0][0]),'b[0][0]:',id(b[0][0]))
print('a[0][1]:',id(a[0][1]),'b[0][1]:',id(b[0][1]))
print('a[1]:',id(a[1]),'b[1]:',id(b[1]))
print('a[2]:',id(a[2]),'b[2]:',id(b[2]))

# output

# [[1, 2], 3, 4] [[1, 2], 3, 4]
# 4331943624 4331943752
# *************
# a[0]: 4331611144 b[0]: 4331611144
# a[0][0]: 4297375104 b[0][0]: 4297375104
# a[0][1]: 4297375136 b[0][1]: 4297375136
# a[1]: 4297375168 b[1]: 4297375168
# a[2]: 4297375200 b[2]: 4297375200
# ___________________________________________
# [[8, 2], 3, 4] [[8, 2], 3, 4]
# 4331943624 4331943752
# *************
# a[0]: 4331611144 b[0]: 4331611144
# a[0][0]: 4297375328 b[0][0]: 4297375328
# a[0][1]: 4297375136 b[0][1]: 4297375136
# a[1]: 4297375168 b[1]: 4297375168
# a[2]: 4297375200 b[2]: 4297375200

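So how do we explain this result?

a[:] (like a.copy()) builds a new outer list, which is why id(a) and id(b) differ, but it fills that list with references to the same element objects, which is why every id(a[i]) equals id(b[i]). The statement b[0][0]=8 does not rebind anything in b; it modifies the inner list [1,2] that both a[0] and b[0] refer to, so the change is visible through a as well. Rebinding a top-level slot, as names_class1[0]='zhangsan' did earlier, only touches one of the outer lists.

If that still doesn't make it clear, I'm out of ideas...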
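
When the copy must be fully independent, the standard library's copy.deepcopy copies nested objects recursively. The sketch below contrasts it with a shallow copy; it reuses the list a from above, and the names shallow and deep are illustrative.

```python
import copy

a = [[1, 2], 3, 4]
shallow = a.copy()        # new outer list, shared inner list
deep = copy.deepcopy(a)   # new outer list and a new inner list

a[0][0] = 8               # mutate the inner list through a

print(shallow)            # the change shows up: the inner list is shared
print(deep)               # the deep copy still holds the old value
print(shallow[0] is a[0], deep[0] is a[0])
```

A deep copy costs more time and memory than a shallow one, so it is worth using only when nested mutable objects really need to be independent.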

List addendum:

b,*c=[1,2,3,4,5]   # b is 1, c is [2, 3, 4, 5]
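
The starred name in an unpacking assignment collects whatever is left over, and it may sit at any position. A brief sketch (Python 3 only; all names here are illustrative):

```python
first, *rest = [1, 2, 3, 4, 5]      # star at the end
*init, last = [1, 2, 3, 4, 5]       # star at the start
head, *middle, tail = "abcde"       # works for any iterable

print(first, rest)
print(init, last)
print(head, middle, tail)           # the starred part is always a list
```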

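Tuples

A tuple is much like a list: it also stores a sequence of values, but once created it cannot be modified, so it is sometimes called a read-only list.

Syntax

names = ("alex","jack","eric")

It has only two methods, count and index. That's all.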
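
The sketch below exercises both methods and shows what "read-only" means in practice. The tuple has an extra "jack" added so that count has something to count; that element, and the names error, single and not_a_tuple, are assumptions made for the example.

```python
names = ("alex", "jack", "eric", "jack")

print(names.count("jack"))   # how many times the value occurs
print(names.index("eric"))   # position of the first match

# Item assignment is rejected because tuples are immutable.
error = None
try:
    names[0] = "ALEX"
except TypeError as exc:
    error = exc
print(type(error).__name__)

# A one-element tuple needs a trailing comma; parentheses alone do nothing.
single = ("alex",)
not_a_tuple = ("alex")
print(type(single).__name__, type(not_a_tuple).__name__)
```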

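Exercise

Write the following program from memory.

Program: shopping cart

Requirements:

On startup, ask the user for their salary, then print the product list
Let the user buy products by product number
After the user picks a product, check whether the balance is sufficient; if so, deduct the price, otherwise show a warning
The user can quit at any time; on exit, print the purchased products and the remaining balance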

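2. String operations

String type (string)

A string is any text enclosed in single quotes ' or double quotes ", such as 'abc' or "123".

Note that '' and "" are only delimiters, not part of the string, so the string 'abc' contains just the three characters a, b and c. If ' is itself one of the characters, enclose the string in "": "I'm OK" contains the six characters I, ', m, space, O and K.

2.1 Creating strings:

var1 = 'Hello World!'
var2 = "Python RAlvin"

Common operations:

# 1   * repeats a string
print('hello'*2)

# 2 [] ,[:] index or slice a string; this works the same way as list slicing (see the list section)
print('helloworld'[2:])

# 3 in  membership operator: True if the string contains the given substring
print('el' in 'hello')

# 4 %   string formatting
print('alex is a good teacher')
print('%s is a good teacher'%'alex')

# 5 +   string concatenation
a='123'
b='abc'
c='789'
d1=a+b+c
print(d1)

# repeated + is inefficient when joining many strings; use join instead
d2=''.join([a,b,c])
print(d2)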
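
Besides the % operator shown in item 4, Python offers str.format and, from Python 3.6 on, f-strings. The sketch below produces the same text three ways; the variables name and score are invented for the example.

```python
name = "alex"
score = 92.456

old_style = "%s scored %.1f" % (name, score)          # % operator
new_style = "{} scored {:.1f}".format(name, score)    # str.format
f_string = f"{name} scored {score:.1f}"               # f-string (3.6+)

print(old_style)
print(new_style)
print(f_string)
```

All three accept the same kind of format specification (here .1f for one decimal place); which one to use is mostly a matter of readability and of the Python version that has to be supported.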

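Python built-in string methods

# string.capitalize()                                  Capitalize the first character of the string
# string.center(width)                                 Return a new string of length width with the original centered and padded with spaces
# string.count(str, beg=0, end=len(string))            Return the number of occurrences of str in string; if beg or end is given, count only within that range
# string.decode(encoding='UTF-8', errors='strict')     Decode string using the given encoding (a Python 2 str method; in Python 3 decode belongs to bytes). A failure raises UnicodeDecodeError, a subclass of ValueError, unless errors is 'ignore' or 'replace'
# string.encode(encoding='UTF-8', errors='strict')     Encode string using the given encoding. A failure raises UnicodeEncodeError, a subclass of ValueError, unless errors is 'ignore' or 'replace'
# string.endswith(obj, beg=0, end=len(string))         Check whether string ends with obj; if beg or end is given, check only within that range. Returns True or False
# string.expandtabs(tabsize=8)                         Replace the tab characters in string with spaces; the default tab size is 8
# string.find(str, beg=0, end=len(string))             Check whether str occurs in string; if beg and end are given, search only that range. Returns the starting index, or -1 if not found
# string.index(str, beg=0, end=len(string))            Same as find(), except that it raises ValueError if str is not in string
# string.isalnum()                                     True if string has at least one character and all characters are letters or digits, otherwise False
# string.isalpha()                                     True if string has at least one character and all characters are letters, otherwise False
# string.isdecimal()                                   True if string contains only decimal digits, otherwise False
# string.isdigit()                                     True if string contains only digits, otherwise False
# string.islower()                                     True if string contains at least one cased character and all cased characters are lowercase, otherwise False
# string.isnumeric()                                   True if string contains only numeric characters, otherwise False
# string.isspace()                                     True if string contains only whitespace, otherwise False
# string.istitle()                                     True if string is title-cased (see title()), otherwise False
# string.isupper()                                     True if string contains at least one cased character and all cased characters are uppercase, otherwise False
# string.join(seq)                                     Join all elements of seq (which must be strings) into one new string, using string as the separator
# string.ljust(width)                                  Return a new string of length width with the original left-aligned and padded with spaces
# string.lower()                                       Convert all uppercase characters in string to lowercase
# string.lstrip()                                      Strip leading whitespace from string
# string.maketrans(intab, outtab)                      Create a character translation table. In the simplest two-argument form, the first argument is a string of characters to convert and the second is a string of the characters to convert them to
# max(str)                                             Return the largest character in str
# min(str)                                             Return the smallest character in str
# string.partition(str)                                A bit like a combination of find() and split(): at the first occurrence of str, split string into a 3-tuple (string_pre_str, str, string_post_str); if string does not contain str, then string_pre_str == string
# string.replace(str1, str2,  num=string.count(str1))  Replace str1 in string with str2; if num is given, replace at most num occurrences
# string.rfind(str, beg=0,end=len(string) )            Like find(), but searches from the right
# string.rindex( str, beg=0,end=len(string))           Like index(), but searches from the right
# string.rjust(width)                                  Return a new string of length width with the original right-aligned and padded with spaces
# string.rpartition(str)                               Like partition(), but searches from the right
# string.rstrip()                                      Strip trailing whitespace from string
# string.split(str="", num=string.count(str))          Split string using str as the separator; if num is given, make at most num splits
# string.splitlines([keepends])                        Split string at line boundaries and return the lines as a list; line breaks are dropped unless keepends is true
# string.startswith(obj, beg=0,end=len(string))        Check whether string starts with obj, returning True or False; if beg and end are given, check only within that range
# string.strip([obj])                                  Apply both lstrip() and rstrip() to string
# string.swapcase()                                    Swap the case of every letter in string
# string.title()                                       Return a title-cased string: every word starts with an uppercase letter and the remaining letters are lowercase (see istitle())
# string.translate(str, del="")                        Convert the characters of string using the table str (256 characters); characters to drop go in the del argument. This is the Python 2 signature; in Python 3, translate takes a single table built with str.maketrans()
# string.upper()                                       Convert all lowercase letters in string to uppercase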

Bytes type (bytes)

# a=bytes('hello','utf8')
# a=bytes('中国','utf8')

a=bytes('中国','utf8')
b=bytes('hello','gbk')

print(a) #b'\xe4\xb8\xad\xe5\x9b\xbd'
print(ord('h')) # decimal Unicode code point: 104
print(ord('中'))# decimal Unicode code point: 20013

# h e l l o
# 104 101 108 108 111               encoded result: matches the ASCII table

# 中国
# \xd6\xd0 \xb9\xfa                 bytes after gbk encoding
# \xe4 \xb8 \xad \xe5 \x9b \xbd     bytes after utf8 encoding
# 228 184 173 229 155 189           the same utf8 bytes as integers, as seen with list(a)

c=a.decode('utf8')
d=b.decode('gbk')

# a.decode('gbk') uses the wrong codec: it gives garbled text instead of 中国, and for other byte sequences it can raise UnicodeDecodeError

print(c) #中国
print(d) #hello

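Note

For ASCII text, every ASCII-compatible encoding (UTF-8, GBK and so on) gives the same bytes, so a bytes object can be created directly with a b'xxxx' literal. Non-ASCII characters cannot be written that way; the encoding has to be specified.

b1=b'123'
print(type(b1))

# b2=b'中国'   # SyntaxError: a bytes literal may only contain ASCII characters

# so write this instead:
b2=bytes('中国','utf8')
print(b2)#b'\xe4\xb8\xad\xe5\x9b\xbd'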
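
The usual way to move between str and bytes is str.encode and bytes.decode, which are equivalent to the bytes(...) constructor used above. The sketch below also shows what happens when the wrong codec is used and how the errors argument changes that; the variable names are illustrative.

```python
text = "中国"

utf8_bytes = text.encode("utf8")   # same result as bytes(text, 'utf8')
gbk_bytes = text.encode("gbk")

print(utf8_bytes)
print(gbk_bytes)
print(list(utf8_bytes))            # the individual byte values

# Decoding GBK bytes as UTF-8 fails under the default errors='strict'.
strict_failed = False
try:
    gbk_bytes.decode("utf8")
except UnicodeDecodeError:
    strict_failed = True
print(strict_failed)

# errors='replace' substitutes U+FFFD for undecodable bytes instead of raising.
lenient = gbk_bytes.decode("utf8", errors="replace")
print(ascii(lenient))
```

Decoding with the codec that was used for encoding always round-trips: utf8_bytes.decode("utf8") gives back the original text.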

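3. Dictionary operations

A dictionary is a key-value data type. It works like the dictionaries we used at school, where you look up a stroke count or a letter to find the page with the details.

Syntax:

info = {
    'stu1101': "TengLan Wu",
    'stu1102': "LongZe Luola",
    'stu1103': "XiaoZe Maliya",
}

Properties of dictionaries:

A dict has no positional index; values are looked up by key (before Python 3.7 the iteration order was arbitrary, from 3.7 on it is insertion order)
Keys must be unique, so duplicates are eliminated automatically

Add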

>>> info["stu1104"] = "苍井空"
>>> info
{'stu1102': 'LongZe Luola', 'stu1104': '苍井空', 'stu1103': 'XiaoZe Maliya', 'stu1101': 'TengLan Wu'}

Modify

>>> info['stu1101'] = "武藤兰"
>>> info
{'stu1102': 'LongZe Luola', 'stu1103': 'XiaoZe Maliya', 'stu1101': '武藤兰'}

Delete

>>> info
{'stu1102': 'LongZe Luola', 'stu1103': 'XiaoZe Maliya', 'stu1101': '武藤兰'}
>>> info.pop("stu1101")  # the standard way to delete
'武藤兰'
>>> info
{'stu1102': 'LongZe Luola', 'stu1103': 'XiaoZe Maliya'}
>>> del info['stu1103']  # another way to delete
>>> info
{'stu1102': 'LongZe Luola'}
>>>
>>> info = {'stu1102': 'LongZe Luola', 'stu1103': 'XiaoZe Maliya'}
>>> info
{'stu1102': 'LongZe Luola', 'stu1103': 'XiaoZe Maliya'}
>>> info.popitem()  # removes an arbitrary item (since Python 3.7, the most recently inserted one)
('stu1102', 'LongZe Luola')
>>> info
{'stu1103': 'XiaoZe Maliya'}

Lookup

>>> info = {'stu1102': 'LongZe Luola', 'stu1103': 'XiaoZe Maliya'}
>>>
>>> "stu1102" in info  # the standard membership test
True
>>> info.get("stu1102")  # fetch a value
'LongZe Luola'
>>> info["stu1102"]  # same result, but see below
'LongZe Luola'
>>> info["stu1105"]  # a missing key raises KeyError; get() returns None instead
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'stu1105'
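
The difference between the two lookup styles is easiest to see side by side. In this sketch the fallback string "unknown" and the variable names are made up for illustration.

```python
info = {'stu1102': 'LongZe Luola', 'stu1103': 'XiaoZe Maliya'}

missing = info.get("stu1105")                    # None, no exception
with_default = info.get("stu1105", "unknown")    # second argument is the fallback

# Square brackets raise KeyError for a missing key.
raised = False
try:
    info["stu1105"]
except KeyError:
    raised = True

print(missing, with_default, raised)
print("stu1105" in info)   # get() does not add the key
```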

Nested dictionaries

av_catalog = {
    "欧美":{
        "www.youporn.com": ["很多免费的,世界最大的","质量一般"],
        "www.pornhub.com": ["很多免费的,也很大","质量比yourporn高点"],
        "letmedothistoyou.com": ["多是自拍,高质量图片很多","资源不多,更新慢"],
        "x-art.com":["质量很高,真的很高","全部收费,屌比请绕过"]
    },
    "日韩":{
        "tokyo-hot":["质量怎样不清楚,个人已经不喜欢日韩范了","听说是收费的"]
    },
    "大陆":{
        "1024":["全部免费,真好,好人一生平安","服务器在国外,慢"]
    }
}

av_catalog["大陆"]["1024"][1] += ",可以用爬虫爬下来"
print(av_catalog["大陆"]["1024"])
# output
['全部免费,真好,好人一生平安', '服务器在国外,慢,可以用爬虫爬下来']

Other operations

#values
>>> info.values()
dict_values(['LongZe Luola', 'XiaoZe Maliya'])

#keys
>>> info.keys()
dict_keys(['stu1102', 'stu1103'])


#setdefault
>>> info.setdefault("stu1106","Alex")
'Alex'
>>> info
{'stu1102': 'LongZe Luola', 'stu1103': 'XiaoZe Maliya', 'stu1106': 'Alex'}
>>> info.setdefault("stu1102","龙泽萝拉")
'LongZe Luola'
>>> info
{'stu1102': 'LongZe Luola', 'stu1103': 'XiaoZe Maliya', 'stu1106': 'Alex'}


#update 
>>> info
{'stu1102': 'LongZe Luola', 'stu1103': 'XiaoZe Maliya', 'stu1106': 'Alex'}
>>> b = {1:2,3:4, "stu1102":"龙泽萝拉"}
>>> info.update(b)
>>> info
{'stu1102': '龙泽萝拉', 1: 2, 3: 4, 'stu1103': 'XiaoZe Maliya', 'stu1106': 'Alex'}

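#items
>>> info.items()
dict_items([('stu1102', '龙泽萝拉'), (1, 2), (3, 4), ('stu1103', 'XiaoZe Maliya'), ('stu1106', 'Alex')])


# build a dict from a list of keys with one default value; every key shares the same value object, which is a trap when that value is mutable, so use this sparingly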
>>> dict.fromkeys([1,2,3],'testd')
{1: 'testd', 2: 'testd', 3: 'testd'}
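
The pitfall with dict.fromkeys only appears when the default value is mutable: the same object is stored under every key. The sketch below demonstrates it and shows a dict comprehension as one way around it; shared and independent are illustrative names.

```python
# One list object is reused as the value for all three keys.
shared = dict.fromkeys([1, 2, 3], [])
shared[1].append("x")
print(shared)          # the append is visible under every key

# A dict comprehension evaluates [] once per key, giving separate lists.
independent = {key: [] for key in [1, 2, 3]}
independent[1].append("x")
print(independent)
```

With an immutable default such as 'testd', 0 or None the sharing is harmless, because the value can only be replaced, never modified in place.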

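Looping over a dict

# Method 1
for key in info:
    print(key,info[key])

# Method 2
for k,v in info.items():  # in Python 2 this first builds a list of pairs, so avoid it on large dicts; in Python 3 items() is a lightweight view
    print(k,v)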

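Exercise

Program: three-level menu

Requirements:

Print a three-level menu of province, city and county
Allow returning to the previous level
Allow quitting the program at any time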

menu = {
    '北京':{
        '海淀':{
            '五道口':{
                'soho':{},
                '网易':{},
                'google':{}
            },
            '中关村':{
                '爱奇艺':{},
                '汽车之家':{},
                'youku':{},
            },
            '上地':{
                '百度':{},
            },
        },
        '昌平':{
            '沙河':{
                '老男孩':{},
                '北航':{},
            },
            '天通苑':{},
            '回龙观':{},
        },
        '朝阳':{},
        '东城':{},
    },
    '上海':{
        '闵行':{
            "人民广场":{
                '炸鸡店':{}
            }
        },
        '闸北':{
            '火车战':{
                '携程':{}
            }
        },
        '浦东':{},
    },
    '山东':{},
}


exit_flag = False
current_layer = menu

layers = []  # stack of parent layers, so that "b" can go back up

while not exit_flag:
    for k in current_layer:
        print(k)
    choice = input(">>:").strip()
    if choice == "q":      # quit at any time
        exit_flag = True
    elif choice == "b":    # go back one level, if there is one
        if layers:
            current_layer = layers.pop()
    elif choice not in current_layer:
        continue
    else:
        layers.append(current_layer)
        current_layer = current_layer[choice]

Three-level menu, concise version

4. Sets (set)

A set groups distinct elements together; it is one of Python's basic data types.

Set elements: the members of a set (no duplicates)

li=[1,2,'a','b']
s =set(li)
print(s)    # {1, 2, 'a', 'b'}
 
li2=[1,2,1,'a','a']
s=set(li2)
print(s)  #{1, 2, 'a'}

A set is an unordered collection of hashable values, so anything that can be a set member can also be a dictionary key.

li=[[1,2],'a','b']
s =set(li) #TypeError: unhashable type: 'list'
print(s)

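Kinds of sets: mutable and immutable

Mutable set (set): elements can be added and removed; it is not hashable, so it cannot be used as a dictionary key or as an element of another set

Immutable set (frozenset): the opposite; elements cannot be added or removed, but it is hashable, so it can be a dictionary key or a member of another set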

li=[1,'a','b']
s =set(li)
dic={s:'123'} #TypeError: unhashable type: 'set'
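
Switching to frozenset makes the same idea work. A minimal sketch; members, dic and nested are illustrative names.

```python
members = frozenset([1, 'a', 'b'])

# A frozenset is hashable, so it is accepted as a dictionary key...
dic = {members: '123'}

# ...and as an element of another set.
nested = {frozenset('ab'), frozenset('cd')}

# Equal frozensets hash alike, so element order does not matter for lookup.
print(dic[frozenset(['b', 'a', 1])])
print(frozenset('ba') in nested)
```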

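Set operations

1. Creating sets

A non-empty set can be written as a literal such as {1, 2, 'a'}, but {} is an empty dict, so an empty set has to be created with set(). frozenset has no literal syntax and is always created with the frozenset() factory.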

s1 = set('alvin')
 
s2= frozenset('yuan')
 
print(s1,type(s1))  #{'l', 'v', 'i', 'a', 'n'} <class 'set'>
print(s2,type(s2))  #frozenset({'n', 'y', 'a', 'u'}) <class 'frozenset'>

2. Accessing sets

Because a set is unordered, it cannot be indexed or sliced; you can only iterate over it or test membership with in and not in.

s1 = set('alvin')
print('a' in s1)
print('b' in s1)
#s1[1]  #TypeError: 'set' object does not support indexing
 
for i in s1:
    print(i)
#    
# True
# False
# v
# n
# l
# i
# a

3. Updating sets

The following built-in methods update a set:

s.add()
s.update()
s.remove()

Note that only mutable sets can be updated:

# s1 = frozenset('alvin')
# s1.add(0)  #AttributeError: 'frozenset' object has no attribute 'add'
 
s2=set('alvin')
s2.add('mm')
print(s2)  #{'mm', 'l', 'n', 'a', 'i', 'v'}
 
s2.update('HO')  # adds several elements: each item of the iterable, here 'H' and 'O'
print(s2)  #{'mm', 'l', 'n', 'a', 'i', 'H', 'O', 'v'}
 
s2.remove('l')
print(s2)  #{'mm', 'n', 'a', 'i', 'H', 'O', 'v'}

del: deletes the set object itself

4. Set operators

1 in, not in
2 Equality and inequality (==, !=)
3 Subset, superset

s=set('alvinyuan')
s1=set('alvin')
print('v' in s)
print(s1<s)
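
The comparison operators cover all three items in the list above. The sketch below reuses s and s1 and adds the equivalent method calls; nothing here goes beyond the built-in set API.

```python
s = set('alvinyuan')
s1 = set('alvin')

print(s1 < s)             # proper subset: every element of s1 is in s, and s is larger
print(s1 <= s1)           # a set is a subset of itself...
print(s1 < s1)            # ...but not a proper subset
print(s > s1)             # proper superset
print(s1.issubset(s), s.issuperset(s1))   # method forms of <= and >=

# Equality ignores order and duplicates.
print(set('alvin') == set('nivlaa'))
print(s1 != s)
```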

4 Union (|)

The union operation is equivalent to a logical OR over sets; the | operator has an equivalent method, union().

s1=set('alvin')
s2=set('yuan')
s3=s1|s2
print(s3)  #{'a', 'l', 'i', 'n', 'y', 'v', 'u'}
print(s1.union(s2)) #{'a', 'l', 'i', 'n', 'y', 'v', 'u'}

5 Intersection (&)

Equivalent to a logical AND over sets; the equivalent method is intersection().

s1=set('alvin')
s2=set('yuan')
s3=s1&s2
print(s3)  #{'n', 'a'}

print(s1.intersection(s2)) #{'n', 'a'}

6 Difference (-)

The equivalent method is difference().

s1=set('alvin')
s2=set('yuan')
s3=s1-s2
print(s3)  #{'v', 'i', 'l'}
 
print(s1.difference(s2)) #{'v', 'i', 'l'}

7 Symmetric difference (^)

The symmetric difference is the XOR of two sets: the elements that belong to s1 or s2 but not to both. The equivalent method is symmetric_difference().

s1=set('alvin')
s2=set('yuan')
s3=s1^s2
print(s3)  #{'l', 'v', 'y', 'u', 'i'}

print(s1.symmetric_difference(s2)) #{'l', 'v', 'y', 'u', 'i'}

Application

'''The simplest way to remove duplicates'''
lis = [1,2,3,4,1,2,3,4]
print(list(set(lis)))    #[1, 2, 3, 4]
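
Going through a set removes duplicates but does not promise to keep the original order. When order matters, one common alternative is dict.fromkeys, which relies on dicts preserving insertion order (guaranteed from Python 3.7). This is a sketch of that alternative, not something the original post covers; the sample list is made up.

```python
lis = [3, 1, 3, 2, 1]

unordered = set(lis)                   # duplicates gone, order unspecified
ordered = list(dict.fromkeys(lis))     # duplicates gone, first occurrences kept in order

print(sorted(unordered))
print(ordered)
```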

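5. File operations

Workflow for working with a file

Open the file, getting a file handle and assigning it to a variable
Operate on the file through the handle
Close the file

Given the following file: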

昨夜寒蛩不住鸣。

惊回千里梦，已三更。

起来独自绕阶行。

人悄悄，帘外月胧明。

白首为功名，旧山松竹老，阻归程。

欲将心事付瑶琴。

知音少，弦断有谁听。

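f = open('小重山')  # open the file
data=f.read()  # read the file's contents
f.close()  # close the file

Note for Windows: suppose the file hello was saved as UTF-8. In text mode, open() falls back to the platform's preferred encoding, which on a Chinese-language Windows system is typically GBK, so opening the file without an encoding argument gives garbled text or a decoding error; use f=open('hello',encoding='utf8') instead. If hello was saved as GBK, opening it directly works.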

9.2 File open modes

    ========= ===============================================================
    Character Meaning
    --------- ---------------------------------------------------------------
    'r'       open for reading (default)
    'w'       open for writing, truncating the file first
    'x'       create a new file and open it for writing
    'a'       open for writing, appending to the end of the file if it exists
    'b'       binary mode
    't'       text mode (default)
    '+'       open a disk file for updating (reading and writing)
    'U'       universal newline mode (deprecated)
    ========= ===============================================================

First, the three most basic modes (r, w and a):

# f = open('小重山2','w')  # open for writing (truncates the file first)

# f = open('小重山2','a')  # open for appending

# f.write('莫等闲1\n')

# f.write('白了少年头2\n')

# f.write('空悲切!3')

9.3 File methods

def read(self, size=-1): # known case of _io.FileIO.read
        """
        Note: may return less data than requested
        Read at most size bytes, returned as bytes.

        Only makes one system call, so less data may be returned than requested.
        In non-blocking mode, returns None if no data is available.
        Return an empty bytes object at EOF.
        """
        return ""

def readline(self, *args, **kwargs):
        pass

def readlines(self, *args, **kwargs):
        pass


def tell(self, *args, **kwargs): # real signature unknown
        """
        Current file position.

        Can raise OSError for non seekable files.
        """
        pass

def seek(self, *args, **kwargs): # real signature unknown
        """
        Move to new file position and return the file position.

        Argument offset is a byte count.  Optional argument whence defaults to
        SEEK_SET or 0 (offset from start of file, offset should be >= 0); other values
        are SEEK_CUR or 1 (move relative to current position, positive or negative),
        and SEEK_END or 2 (move relative to end of file, usually negative, although
        many platforms allow seeking beyond the end of a file).

        Note that not all file objects are seekable.
        """
        pass

def write(self, *args, **kwargs): # real signature unknown
        """
        Write bytes b to file, return number written.

        Only makes one system call, so not all of the data may be written.
        The number of bytes actually written is returned.  In non-blocking mode,
        returns None if the write would block.
        """
        pass

def flush(self, *args, **kwargs):
        pass


def truncate(self, *args, **kwargs): # real signature unknown
        """
        Truncate the file to at most size bytes and return the truncated size.

        Size defaults to the current file position, as returned by tell().
        The current file position is changed to the value of size.
        """
        pass


def close(self): # real signature unknown; restored from __doc__
            """
            Close the file.

            A closed file cannot be used for further I/O operations.  close() may be
            called more than once without error.
            """
            pass
##############################################################less useful
    def fileno(self, *args, **kwargs): # real signature unknown
            """ Return the underlying file descriptor (an integer). """
            pass

    def isatty(self, *args, **kwargs): # real signature unknown
        """ True if the file is connected to a TTY device. """
        pass

    def readable(self, *args, **kwargs): # real signature unknown
        """ True if file was opened in a read mode. """
        pass

    def readall(self, *args, **kwargs): # real signature unknown
        """
        Read all data from the file, returned as bytes.

        In non-blocking mode, returns as much as is immediately available,
        or None if no data is available.  Return an empty bytes object at EOF.
        """
        pass

    def seekable(self, *args, **kwargs): # real signature unknown
        """ True if file supports random-access. """
        pass


    def writable(self, *args, **kwargs): # real signature unknown
        """ True if file was opened in a write mode. """
        pass

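Walkthrough of the methods

f = open('小重山')  # open the file
# data1=f.read()  # read the whole file
# data2=f.read()  # a second read returns '' because the position is already at the end
#
# print(data1)
# print('...',data2)
# data=f.read(5)  # read 5 characters

# data=f.readline()
# data=f.readline()
# print(f.__iter__().__next__())
# for i in range(5):
#     print(f.readline())

# data=f.readlines()

# for line in f.readlines():
#     print(line)


# Task: print every line, and append 'end 3' to line 3
# for index,line in enumerate(f.readlines()):
#     if index==2:
#         line=''.join([line.strip(),'end 3'])
#     print(line.strip())

# Remember: from now on always iterate over the file object like this, which reads one line at a time
# count=0
# for line in f:
#     if count==2:
#         line=''.join([line.strip(),'end 3'])
#     print(line.strip())
#     count+=1

# print(f.tell())
# print(f.readline())
# print(f.tell())  # tell() counts bytes: in UTF-8 an ASCII character is 1 byte and a Chinese character is 3; contrast this with read()
# print(f.read(5))  # in Python 3 text mode, read(5) returns 5 characters whatever their byte width
# print(f.tell())
# f.seek(0)
# print(f.read(6))  # read() counts characters, Chinese or ASCII alike, so read(6) returns 6 Chinese characters here

f.close()  # close the read handle before reusing the name

# Try this in a terminal:
f = open('小重山2','w')
# f.write('hello \n')
# f.flush()
# f.write('world')

# Application: progress bar
# import time,sys
# for i in range(30):
#     sys.stdout.write("*")
#     # sys.stdout.flush()
#     time.sleep(0.1)


# f = open('小重山2','w')
# f.truncate()  # truncate at the current position (0 right after opening with 'w', so everything goes)
# f.truncate(5)  # keep only the first 5 bytes


# print(f.isatty())
# print(f.seekable())
# print(f.readable())

f.close()  # close the file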

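Next, more file modes

# f = open('小重山2','w')  # open for writing
# f = open('小重山2','a')  # open for appending
# f.write('莫等闲1\n')
# f.write('白了少年头2\n')
# f.write('空悲切!3')


# f.close()

# r+ and w+ modes
# f = open('小重山2','r+')  # open for reading and writing
# print(f.read(5))  # reading works
# f.write('hello')
# print('------')
# print(f.read())


# f = open('小重山2','w+')  # open for writing and reading
# print(f.read(5))  # nothing, because w+ truncated the file on open
# f.write('hello alex')
# print(f.read())  # still nothing: the position is at the end of the file
# f.seek(0)
# print(f.read())

# w+ truncates the file when opening it; a+ keeps the existing content and appends


# Now the key task: add a line 'hello 岳飞!' after line 3 of the text
# "Didn't we already do that?" No: earlier we changed the content and printed it; now we are changing the file itself!
# f = open('小重山2','r+')  # open for reading and writing
# f.readline()
# f.readline()
# f.readline()
# print(f.tell())
# f.write('hello 岳飞')
# f.close()
# Not what we wanted: a write never inserts; it overwrites existing bytes or lands at the end of the file. So how do we modify a file?

# f_read = open('小重山','r')  # open the original for reading
# f_write = open('小重山_back','w')  # open a new file for writing

# count=0
# for line in f_read:
    # if count==3:
    #     f_write.write('hello,岳飞\n')
    #
    # else:
    #     f_write.write(line)


    # another way:
    # if count==3:
    #
    #     line='hello,岳飞2\n'
    # f_write.write(line)
    # count+=1


# # Binary mode
# # f = open('小重山2','rb')  # open for reading in binary mode
# f = open('小重山2','wb')  # open for writing in binary mode
# f.write('hello alvin!'.encode())  # b'hello alvin!' is binary data; it is displayed as text for readability rather than as 0s and 1s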

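Note 1: In both Python 2 and Python 3, r+ mode lets you overwrite part of a file with the same number of bytes, but that is rarely useful!

Note 2: Some people would call readlines() to get a list of lines, change the relevant entry by index, and write the whole list back to the file.

The big problem with that approach is that a large file will exhaust your memory, whereas iterating over the file object processes one line at a time.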
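
The read-one-file, write-another approach can be wrapped in a small helper. The sketch below is one possible shape for it, not the author's code: insert_after_line, its parameters, the .tmp suffix and the demo file are all assumptions made for illustration. It streams the source line by line and then swaps the new file into place with os.replace.

```python
import os
import tempfile


def insert_after_line(path, line_no, new_line, encoding="utf8"):
    """Insert new_line after the 1-based line line_no, streaming line by line."""
    tmp_path = path + ".tmp"  # hypothetical scratch file next to the original
    with open(path, encoding=encoding) as src, \
            open(tmp_path, "w", encoding=encoding) as dst:
        for index, line in enumerate(src, start=1):
            dst.write(line)
            if index == line_no:
                dst.write(new_line + "\n")
    os.replace(tmp_path, path)  # swap the rewritten file into place


# Demo on a throwaway file so nothing real is touched.
with tempfile.TemporaryDirectory() as folder:
    demo = os.path.join(folder, "demo.txt")
    with open(demo, "w", encoding="utf8") as f:
        f.write("line1\nline2\nline3\nline4\n")

    insert_after_line(demo, 3, "hello 岳飞!")

    with open(demo, encoding="utf8") as f:
        result = f.read().splitlines()

print(len(result))
```

Only one line is held in memory at a time, so the file size does not matter. The sketch assumes the target line ends with a newline; if it is the last line of a file without a trailing newline, the inserted text would be glued onto it.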

Addendum: rb mode and seek

In Python 2:

f = open('test','r',)  # open for reading
 
f.read(3)
 
# f.seek(3)
# print f.read(3) # 夜
 
# f.seek(3,1)
# print f.read(3) # 寒
 
# f.seek(-4,2)
# print f.read(3) # 鸣

In Python 3:

# test: 
昨夜寒蛩不住鸣.

f = open('test','rb',)  # open for reading in binary mode

f.read(3)

# f.seek(3)
# print(f.read(3)) # b'\xe5\xa4\x9c'

# f.seek(3,1)
# print(f.read(3)) # b'\xe5\xaf\x92'

# f.seek(-4,2)
# print(f.read(3))   # b'\xe9\xb8\xa3'

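# Summary: in Python 3, use r mode when you want character data, that is, text meant to be read by people; f.read() then
#     returns decoded Unicode text. If the data does not need to be read but only transferred, as in a file upload, there is
#     no need to decode; just send the bytes, so use rb mode.

#     Python 3 draws a strict line between bytes and Unicode text. seek, for example, moves byte by byte in both Python 2 and
#     Python 3, but in Python 3 relative seeks such as the ones above require the file to be opened in rb mode; no ambiguity is allowed.

# Suggestion: from now on, read and write files in rb mode and decode explicitly when you need text.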

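9.4 The with statement

To avoid forgetting to close a file after opening it, use a context manager:

with open('log','r') as f:
    pass

When the with block finishes, the file is closed and its resources are released automatically, even if the block raises an exception.

Since Python 2.7, with can also manage several files' contexts at once: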

with open('log1') as obj1, open('log2') as obj2:
    pass
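
That the file really is closed after the block, including when the block fails, can be checked through the closed attribute of the file object. The sketch below works on a throwaway file; the file name log and the simulated ValueError are assumptions for the demonstration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "log")

    # Normal exit: the file is closed as soon as the block ends.
    with open(path, "w", encoding="utf8") as f:
        f.write("hello\n")
    closed_after_block = f.closed

    # Exit through an exception: the file is closed before the error propagates.
    try:
        with open(path, encoding="utf8") as f2:
            raise ValueError("simulated failure inside the block")
    except ValueError:
        closed_after_error = f2.closed

print(closed_after_block, closed_after_error)
```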

Exercises

Program 1: implement a simple version of the shell's sed substitution

Program 2: modify an haproxy configuration file

Requirements:

1. Query
    Input: www.oldboy.org
    Return all records under that backend

2. Create
    Input:
        arg = {
            'backend': 'www.oldboy.org',
            'record':{
                'server': '100.1.7.9',
                'weight': 20,
                'maxconn': 30
            }
        }

3. Delete
    Input:
        arg = {
            'backend': 'www.oldboy.org',
            'record':{
                'server': '100.1.7.9',
                'weight': 20,
                'maxconn': 30
            }
        }

global       
        log 127.0.0.1 local2
        daemon
        maxconn 256
        log 127.0.0.1 local2 info
defaults
        log global
        mode http
        timeout connect 5000ms
        timeout client 50000ms
        timeout server 50000ms
        option  dontlognull

listen stats :8888
        stats enable
        stats uri       /admin
        stats auth      admin:1234

frontend oldboy.org
        bind 0.0.0.0:80
        option httplog
        option httpclose
        option  forwardfor
        log global
        acl www hdr_reg(host) -i www.oldboy.org
        use_backend www.oldboy.org if www

backend www.oldboy.org
        server 100.1.7.9 100.1.7.9 weight 20 maxconn 3000

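Original configuration file

6. Character encoding and transcoding

Further reading:

http://www.cnblogs.com/yuanchenqi/articles/5956943.html

http://www.diveintopython3.net/strings.html

Things to know:

1. In Python 2 the default encoding is ASCII; in Python 3 strings are Unicode and the default source encoding is UTF-8

2. Unicode has several encodings: UTF-32 (always 4 bytes per character), UTF-16 (2 or 4 bytes) and UTF-8 (1 to 4 bytes). UTF-16 is widely used in memory by platforms such as Windows and Java, while files are usually stored as UTF-8 because it saves space

3. In Python 3, encode converts the text and also turns the str into bytes; decode converts back and turns the bytes into a str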
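
The byte counts of the three Unicode encodings can be checked directly in Python 3. In the sketch below the three sample characters are chosen for illustration, and the -le codec variants are used so that no byte-order mark is added to the count.

```python
samples = ["a", "中", "😀"]
codecs_to_try = ("utf-8", "utf-16-le", "utf-32-le")

# Map each character to its encoded length under every codec.
sizes = {
    ch: tuple(len(ch.encode(codec)) for codec in codecs_to_try)
    for ch in samples
}

for ch in samples:
    print(ascii(ch), sizes[ch])

# encode gives bytes, decode gives str back.
encoded = "中".encode("utf-8")
decoded = encoded.decode("utf-8")
print(type(encoded).__name__, type(decoded).__name__)
```

The ASCII letter costs 1 byte in UTF-8 against 2 and 4 in the others, which is the space saving that makes UTF-8 the usual choice for files; the emoji shows that UTF-16 is not always 2 bytes.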

The figure above (not reproduced here) applies only to Python 2.

#-*-coding:utf-8-*-
__author__ = 'Alex Li'

import sys
print(sys.getdefaultencoding())


msg = "我爱北京天安门"
msg_gb2312 = msg.decode("utf-8").encode("gb2312")
gb2312_to_gbk = msg_gb2312.decode("gbk").encode("gbk")

print(msg)
print(msg_gb2312)
print(gb2312_to_gbk)

in python2

#-*-coding:gb2312 -*-   # this declaration can be omitted
__author__ = 'Alex Li'

import sys
print(sys.getdefaultencoding())


msg = "我爱北京天安门"
#msg_gb2312 = msg.decode("utf-8").encode("gb2312")
msg_gb2312 = msg.encode("gb2312")  # str is already Unicode in Python 3, so no decode is needed first
gb2312_to_unicode = msg_gb2312.decode("gb2312")
gb2312_to_utf8 = msg_gb2312.decode("gb2312").encode("utf-8")

print(msg)
print(msg_gb2312)
print(gb2312_to_unicode)
print(gb2312_to_utf8)

in python3


Posted 2017-12-05 16:05 by bird'linux
