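Python Language Essentials, Part 1

The notes below are summarized from the appendix of this textbook:

McKinney. Python for Data Analysis (Chinese edition: 利用Python进行数据分析) [M]. China Machine Press, 2014.

Python's design emphasizes readability, simplicity, and explicitness.

Python discourages writing several statements on one line, because it hurts readability.

In Python everything really is an object: numbers, strings, data structures, functions, classes, modules, and so on are all Python objects.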

a = [1,2,3]

This creates the list object on the right-hand side and binds the name a to a reference to it.

b = a

This does not copy the data; it copies the reference, so a and b now both point to the same list object.

a.append(4)

After this call, b has also become:

[1,2,3,4]

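This is why assignment is also called binding, and a variable name that has been assigned is called a bound variable.

For the same reason, passing an object to a function passes a reference to it by default; the object itself is not copied.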
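A minimal sketch of what reference passing means for function arguments. The names append_element and rebind are illustrative, not from the book: mutating the argument inside the function is visible to the caller, while rebinding the parameter name only changes the local variable.

```python
def append_element(some_list, element):
    # some_list refers to the same list object the caller passed in
    some_list.append(element)

def rebind(some_list):
    # Rebinding the local name does not touch the caller's variable
    some_list = [0]
    return some_list

data = [1, 2, 3]
append_element(data, 4)
print(data)         # [1, 2, 3, 4]: the caller sees the mutation

rebind(data)
print(data)         # still [1, 2, 3, 4]
```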

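Although Python lets you assign any object to any name, every object still has a type; the type information is stored inside the object itself.

Python is a strongly typed language: values are not implicitly converted between unrelated types such as str and int.

isinstance(a,int)#checks whether an object is an instance of a particular type

isinstance(a,(int,float))#checks whether an object's type is one of the types in the tuple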
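A short sketch of isinstance together with strong typing. The variable names are illustrative; the point is that adding a str to an int raises TypeError instead of silently converting, so the conversion has to be explicit.

```python
a = 5
b = 4.5
print(isinstance(a, int))            # True
print(isinstance(b, (int, float)))   # True: b matches one of the types in the tuple
print(isinstance('5', (int, float))) # False

# Strong typing: Python will not coerce '5' to a number for you
try:
    result = '5' + a
    mixed_ok = True
except TypeError:
    mixed_ok = False                 # str + int raises TypeError
print(mixed_ok)                      # False

print(int('5') + a)                  # 10: explicit conversion works
```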

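Python objects usually have:

attributes: other Python objects stored inside the object

methods: functions associated with the object that can access its internal data

Besides the a.b syntax, attributes and methods can also be accessed by name with functions such as getattr: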

a = 'foo'

print(getattr(a, 'upper'))

b = a.upper()

print(a)

print(b)

Output:

<built-in method upper of str object at 0x0000004448779B58>

foo

FOO
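getattr has two companions in the standard built-ins, hasattr and setattr. The sketch below uses a small illustrative class named Point (not from the book) because built-in str objects do not allow setting new attributes.

```python
class Point:
    # A tiny illustrative class
    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1, 2)
print(getattr(p, 'x'))             # 1, same as p.x
print(getattr(p, 'z', 'missing'))  # the default is returned when the attribute is absent
print(hasattr(p, 'y'))             # True
setattr(p, 'y', 10)                # same as p.y = 10
print(p.y)                         # 10

s = 'foo'
upper_method = getattr(s, 'upper') # fetch the bound method by name
print(upper_method())              # FOO
```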

Checking whether an object is iterable:

def isiterable(obj):
    try:
        iter(obj)
        return True
    except TypeError:
        return False
print(isiterable('a string'))
print(isiterable([1,2,3]))
print(isiterable(5))

The last call returns False; the first two return True.

#If x is not already a list but is iterable (e.g. a tuple), convert it to a list
x=1,2,3
print(x)
if not isinstance(x, list) and isiterable(x):
    x = list(x)
print(x)

Output:

(1, 2, 3)

[1, 2, 3]

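A module is simply a .py file containing function and variable definitions.

We can bring it into another .py file with the import keyword.

To import the whole module:

import some_module

To import only selected names:

from some_module import f,g,PI

To import the whole module or selected names under different names, use as:

import some_module as sm

from some_module import PI as pi_const, g as g_function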
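Since some_module above is a placeholder, here is a runnable sketch of the same four import forms using the standard library math module instead. The alias names (m, pi_const, floor_fn) are arbitrary choices for illustration.

```python
import math                                  # import the whole module
from math import sqrt, pi                    # import selected names
import math as m                             # module alias
from math import pi as pi_const, floor as floor_fn  # aliases for individual names

print(math.sqrt(16))        # 4.0
print(sqrt(16))             # 4.0
print(m.pi == pi == pi_const)  # True: all three names refer to the same value
print(floor_fn(2.7))        # 2
```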

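Comparison operators (is versus ==):

a=[1,2,3,4]
b = a
print(a is b)   #b is a reference to the same object, so this is True
c = list(a)     #list() creates a new list, so a and c are different objects
print(a is not c)   
print(a ==c)    #but a and c have exactly the same contents
a = None
print(a is None)#True, because there is only one instance of None

All four print calls above output True.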

Strict (eager) evaluation:

a=b=c=5
d=a+b*c
print(d)

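Python is a strict, eager language: in almost all cases, computations and expressions are evaluated immediately.

In the example above, b*c is computed first and then a is added to it.

Some Python techniques, such as iterators and generators, allow deferred (lazy) evaluation, which is useful when a computation is expensive and its results might not all be needed right away.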
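A minimal generator sketch showing deferred evaluation. The function squares and the list computed are illustrative names; computed records which values have actually been produced so far.

```python
computed = []

def squares(n):
    for i in range(n):
        computed.append(i)   # record that value i was actually computed
        yield i * i

gen = squares(3)
print(computed)              # []: creating the generator runs none of its body
first = next(gen)
print(first, computed)       # 0 [0]
rest = list(gen)             # consuming the rest triggers the remaining work
print(rest, computed)        # [1, 4] [0, 1, 2]
```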

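Mutable and immutable objects:

Most Python objects are mutable, such as lists, dicts, NumPy arrays, and most user-defined types.

Mutable means that the objects or values they contain can be modified.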

a_list = ['foo',1,[12,34]]
a_list[2] = (5,6)
print(a_list)       #['foo', 1, (5, 6)]

Others, such as strings and tuples, are immutable:

a_tuple = (1,2,3,(4,5))
a_tuple[1] = 'four'     #TypeError: 'tuple' object does not support item assignment
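Immutability applies to the container's slots, not necessarily to the objects stored in them. A short sketch (the values are illustrative): the tuple slot cannot be reassigned, but a list stored in it can still be mutated in place, while string methods always return new strings.

```python
a_tuple = (1, [2, 3], 'x')
# a_tuple[1] = [9]   would raise TypeError: tuple slots cannot be reassigned
a_tuple[1].append(4)     # the list stored in slot 1 is itself mutable
print(a_tuple)           # (1, [2, 3, 4], 'x')

s = 'python'
s_upper = s.upper()      # returns a new string; s itself is unchanged
print(s, s_upper)        # python PYTHON
```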

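Scalar types (the book targets Python 2; Python 3 differences are noted):

None	Python's null value; only one instance of this object exists
str	String type; in Python 3 str holds Unicode text
unicode	Unicode string type (Python 2 only; merged into str in Python 3)
float	Double-precision (64-bit) floating-point number
bool	True or False
int	Signed integer; in Python 2 its maximum value is platform dependent
long	Arbitrary-precision signed integer (Python 2 only); large ints are automatically converted to long. In Python 3, int itself has arbitrary precision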

cval = 1+2j
print(cval*(1-2j)) #(5+0j)

a = 'first way to define string'
b = "second way to define string"
c = """
third way to 
define a string
"""
print(a)
print(b)
print(c)
'''
first way to define string
second way to define string
third way to 
define a string
'''

Python strings are immutable:

a = "this is an immutable object"
print(a[11])
#a[10] = "f" #TypeError: 'str' object does not support item assignment
b = a.replace('object', "string object")
print(b)
'''
i
this is an immutable string object
'''

Other string operations:

a = 5.6
s = str(a)
print(s)
s = 'Python Spark'
s_new = list(s)
print(s_new)
print(s[:3])
print(s_new[:3])
s1 = '12\\34'
print(s1)
s2 = r'this\is \no\special characters \n'   #raw string: backslashes are not treated as escape characters
print(s2)
print(s1+s2)
'''
5.6
['P', 'y', 't', 'h', 'o', 'n', ' ', 'S', 'p', 'a', 'r', 'k']
Pyt
['P', 'y', 't']
12\34
this\is \no\special characters \n
12\34this\is \no\special characters \n
'''

None:

def add_and_maybe_multiply(a,b,c=None):
    result = a+b
    if c is not None:
        result = result * c
    return result

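Using None as the default value of a function argument, as above, is a common idiom.

Note that in Python 3 None is a reserved keyword: assigning to it is a SyntaxError.

None is the one and only instance of NoneType.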
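Why the example above tests with "is not None" rather than "!= None": == can be overridden by a class, while an identity check cannot. AlwaysEqual is a deliberately odd, illustrative class, not something from the book.

```python
class AlwaysEqual:
    # Illustrative class whose __eq__ claims equality with everything
    def __eq__(self, other):
        return True

obj = AlwaysEqual()
print(obj == None)            # True: == is controlled by the class
print(obj is None)            # False: identity cannot be faked
print(type(None).__name__)    # NoneType
```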

Date and time operations:

from datetime import datetime,date,time
dt = datetime(2016,8,24,19,22,33)
print(dt.year)
print(dt.month)
print(dt.day)
print(dt.hour)
print(dt.minute)
print(dt.second)
print(dt.date())
print(dt.time())
print(dt.strftime('%m/%d/%Y %H:%M:%S'))
dt_new = dt.replace(minute = 13, second = 23)
print(dt_new)
dt_sub = dt - dt_new
print(dt_sub)
print(type(dt_sub))
dt_recover = dt + dt_sub
print(dt_recover)
print(type(dt_recover))
'''
2016
8
24
19
22
33
2016-08-24
19:22:33
08/24/2016 19:22:33
2016-08-24 19:13:23
0:09:10
<class 'datetime.timedelta'>
2016-08-24 19:31:43
<class 'datetime.datetime'>
'''
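The reverse of strftime is datetime.strptime, which parses a string using the same format codes, and timedelta objects can also be built directly. A minimal sketch; the date string and offsets are arbitrary examples.

```python
from datetime import datetime, timedelta

# Parse a string into a datetime with strftime-style format codes
dt = datetime.strptime('20160824 19:22:33', '%Y%m%d %H:%M:%S')
print(dt)                                   # 2016-08-24 19:22:33

# Build a timedelta explicitly and add it to a datetime
later = dt + timedelta(days=1, hours=2)
print(later)                                # 2016-08-25 21:22:33
print((later - dt).total_seconds())         # 93600.0 (26 hours)
```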

Control flow:

a = 1; b = 7;
c = 8; d = 4;
if(a <b or c > d):
    print("Made it")

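Compound conditions built with and/or are evaluated from left to right and short-circuit: evaluation stops as soon as the overall result is known.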
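A small sketch that makes short-circuiting visible. The helper check is hypothetical; it records each operand it evaluates in the list calls.

```python
calls = []

def check(name, value):
    # Record which operands actually get evaluated
    calls.append(name)
    return value

result = check('first', True) or check('second', False)
print(result, calls)   # True ['first']: or stops at the first truthy operand

calls.clear()
result = check('first', False) and check('second', True)
print(result, calls)   # False ['first']: and stops at the first falsy operand
```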

seq = [1,2,None,4,None,6]
total = 0
for value in seq:
    if value is None:
        continue
    total += value

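continue skips the rest of the current iteration and moves on to the next one.

The keyword pass is a no-op; use it as a placeholder wherever a statement is syntactically required but nothing should happen.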
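A sketch contrasting continue with break, plus pass as a placeholder body. The data and the stop-at-zero rule are illustrative additions, not from the book.

```python
seq = [1, 2, None, 4, 0, 6]
total = 0
for value in seq:
    if value is None:
        continue       # skip only this iteration
    if value == 0:
        break          # leave the loop entirely; 6 is never reached
    total += value
print(total)           # 7 (1 + 2 + 4)

def not_implemented_yet():
    pass               # placeholder body; the function returns None
print(not_implemented_yet())   # None
```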

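Exception handling:

path = 'example.txt'        #hypothetical file path for illustration

def write_to_file(f):       #hypothetical helper that writes some text to f
    f.write('hello\n')

f = open(path,'w')
try:
    write_to_file(f)
except (TypeError,ValueError):
    print('Failed')
else:
    print("Succeeded")
finally:
    f.close()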

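try: the block that is attempted.

except followed by a tuple of exception types handles only those exceptions; a bare except handles all exceptions.

else: runs only if the try block raised no exception.

finally: cleanup code that runs whether the try block succeeded or failed.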
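A sketch that traces which clauses run. attempt_float is an illustrative helper that falls back to returning its input when float() fails; the list log records the order in which else and finally execute.

```python
def attempt_float(x):
    # Return float(x) if conversion works, otherwise return x unchanged
    try:
        return float(x)
    except (TypeError, ValueError):
        return x

log = []
try:
    value = attempt_float('1.5')
except Exception:
    log.append('except')
else:
    log.append('else')       # runs only when the try block raised nothing
finally:
    log.append('finally')    # always runs
print(value, log)            # 1.5 ['else', 'finally']
print(attempt_float('abc'), attempt_float((1, 2)))   # abc (1, 2)
```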

range:

In Python 3, range returns a lazy range object instead of building a list, so its values are produced on demand.

print(range(10))
print(range(0,20,2))
seq = [1,2,3,4]
for i in range(len(seq)):
    print(seq[i])

Output:

range(0, 10)

range(0, 20, 2)

1
2
3
4
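A short sketch of how the object returned by range behaves: it computes values lazily, yet supports len(), indexing and membership tests, and iter() gives a one-shot iterator over it.

```python
r = range(0, 20, 2)
print(type(r).__name__)        # range
print(list(r))                 # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
print(len(r), r[3], 10 in r)   # 10 6 True
it = iter(r)                   # an iterator must be requested explicitly
print(next(it), next(it))      # 0 2
```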

Ternary expressions:

value = true-expr if condition else false-expr

For example:

x = 5
print('Non-negative' if x >=0 else 'negative')

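Data structures and sequences:

Python's data structures are simple yet powerful.

Tuple: tuple

A tuple is a one-dimensional, fixed-length, immutable sequence of Python objects.

tup_a = 4,5,6      #avoid naming a variable tuple: it would shadow the built-in tuple type
print(tup_a)

The following creates a tuple of tuples:

nested_tuple = (4,5,6),(7,8)
print(nested_tuple)     #((4, 5, 6), (7, 8))
print(nested_tuple[0])  #(4,5,6)
print(nested_tuple[0][2])#6
tup = tup_a + nested_tuple
print(tup)              #(4, 5, 6, (4, 5, 6), (7, 8))
print(tup*2)            #(4, 5, 6, (4, 5, 6), (7, 8), 4, 5, 6, (4, 5, 6), (7, 8))
#Slots cannot be reassigned: once the tuple is created, the object in each slot is fixed. Multiplying the tuple only duplicates the references to the objects; the objects themselves are not copied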

Tuple unpacking:

values = (1,2,(3,4))
a,b,c = values
print(a)
print(c)
d,e,(f,g) = values
print(f)
print(g)
'''
1
(3, 4)
3
4
'''

Unpacking makes assignment very convenient; for example, a,b = b,a swaps the two values.
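A short sketch of the swap idiom and two other common uses of unpacking. The starred form first, *rest is Python 3 syntax and is not in the Python 2 based book.

```python
a, b = 1, 2
a, b = b, a                  # swap without a temporary variable
print(a, b)                  # 2 1

pairs = [(1, 'one'), (2, 'two')]
for number, word in pairs:   # unpack each tuple directly in the for statement
    print(number, word)

first, *rest = [10, 20, 30]  # Python 3 extended unpacking
print(first, rest)           # 10 [20, 30]
```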

Tuple methods:

values = (1,2,2,2,2,5,6,7,8)
print(values.count(2))

count returns the number of times the given value occurs in the tuple; here it prints 4.

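List: list

Lists are variable-length, and their contents can be modified in place.

A list can be defined with square brackets [] or with the list function.

a_list = [2,3,5,7,None]
print(a_list)
names = ('jason','peggy','thea')
b_list = list(names)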
print(b_list)
b_list[2] = "cathy"
print(b_list)
'''
[2, 3, 5, 7, None]
['jason', 'peggy', 'thea']
['jason', 'peggy', 'cathy']
'''

Adding and removing elements:

b_list.append(a_list)
print(b_list)
b_list.insert(3, 'thea')
print(b_list)
print(b_list.pop(4))
print(b_list)
b_list.append("thea")
print(b_list)
b_list.append("peggy")
print(b_list)
b_list.remove("peggy")  #removes by value: only the first occurrence is removed
print(b_list)
print('peggy' in b_list)
'''
['jason', 'peggy', 'cathy', [2, 3, 5, 7, None]]
['jason', 'peggy', 'cathy', 'thea', [2, 3, 5, 7, None]]
[2, 3, 5, 7, None]
['jason', 'peggy', 'cathy', 'thea']
['jason', 'peggy', 'cathy', 'thea', 'thea']
['jason', 'peggy', 'cathy', 'thea', 'thea', 'peggy']
['jason', 'cathy', 'thea', 'thea', 'peggy']
True
'''

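Combining lists:

print(a_list)
print(b_list)
c_list = a_list+b_list      #+ creates a new list
print(c_list)
c_list.extend([7,8])        #extend modifies c_list in place and returns None
print(c_list)
#extend is cheaper than +, because + creates a new list and copies both operands into it, while extend only appends to the existing list
list_of_lists = [[1,2],[3,4],[5,6]]    #sample data for illustration
everything = []
for chunk in list_of_lists:
    everything.extend(chunk)
everything = []
for chunk in list_of_lists:
    everything = everything + chunk     #same result, but copies the whole list on every iteration

extend does work, but like other in-place list methods it returns None, so writing d_list = c_list.extend([7,8]) binds d_list to None.

Also, list_of_lists must really be a list of lists: if its elements are strings, extend iterates over each string and appends its characters one by one.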

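Sorting:

a = [6,3,2,6,7,2,8,9,3,2]
a.sort()                      #sorts in place; key=None and reverse=False are the defaults
print(a)                      #[2, 2, 2, 3, 3, 6, 6, 7, 8, 9]
b = ['charles','jason','peggy','thea']
b.sort(key=len)               #sort by string length
print(b)                      #['thea', 'jason', 'peggy', 'charles']

sort modifies the list in place and returns None, which is why print(a.sort(...)) prints None. Print the list itself after sorting, or use sorted(), which returns a new sorted list.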

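Binary search and maintaining a sorted list:

The built-in bisect module implements binary search and insertion into a sorted list.

bisect.bisect finds the position where a new element should be inserted to keep the list sorted.

bisect.insort inserts the new element at that position.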

import bisect
c = [1,2,3,4,5,6,8,9]
print(bisect.bisect(c,7))
bisect.insort(c,7)
print(c)
'''
6
[1, 2, 3, 4, 5, 6, 7, 8, 9]
'''
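When the list already contains the value being searched for, bisect_left and bisect_right (bisect.bisect is the same as bisect_right) return different positions. A minimal sketch with illustrative data; note that the bisect functions assume the list is already sorted and do not check it.

```python
import bisect

c = [1, 2, 2, 2, 3]
print(bisect.bisect_left(c, 2))    # 1: position before the existing 2s
print(bisect.bisect_right(c, 2))   # 4: position after them
bisect.insort(c, 2.5)              # keeps the list sorted
print(c)                           # [1, 2, 2, 2, 2.5, 3]
```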

Slicing:

Slice notation selects a subset of a sequence.

seq = [4,6,2,6,889,2,57,2,1,57,223]
print(seq[1:5])
seq[1:2]  =[99,99]
print(seq)
print(seq[:5])
print(seq[3:])
print(seq[-4:])
print(seq[-6:-2])
print(seq[::2])     #every second element
print(seq[::-1])    #reversed
'''
[6, 2, 6, 889]
[4, 99, 99, 2, 6, 889, 2, 57, 2, 1, 57, 223]
[4, 99, 99, 2, 6]
[2, 6, 889, 2, 57, 2, 1, 57, 223]
[2, 1, 57, 223]
[2, 57, 2, 1]
[4, 99, 6, 2, 2, 57]
[223, 57, 1, 2, 57, 2, 889, 6, 2, 99, 99, 4]
'''

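Built-in sequence functions:

enumerate

for i,value in enumerate(collection):
    pass

While iterating, enumerate yields (i, value) tuples, where i is the index of value in the sequence.

It can also be used to build a dict that maps each value in a sequence to its position: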

some_list = ['charles','peggy','jason']
mapping = dict((v,i) for i,v in enumerate(some_list))
print(mapping)
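print(sorted(zip(mapping.keys(),mapping.values()))) #(key, value) pairs sorted by key
print(sorted(zip(mapping.values(),mapping.keys()))) #(value, key) pairs sorted by value
print(sorted([7,2,4,76,9,3,1]))                     #sort a list
print(sorted('hello world'))                        #sort the characters of a string
print(sorted(set('hello world')))   #sorted list of the unique elements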
'''
{'charles': 0, 'jason': 2, 'peggy': 1}
[('charles', 0), ('jason', 2), ('peggy', 1)]
[(0, 'charles'), (1, 'peggy'), (2, 'jason')]
[1, 2, 3, 4, 7, 9, 76]
[' ', 'd', 'e', 'h', 'l', 'l', 'l', 'o', 'o', 'r', 'w']
[' ', 'd', 'e', 'h', 'l', 'o', 'r', 'w']
'''

zip:

zip pairs up the elements of several sequences, producing a sequence of tuples.

print("\n")
seq_1 = ['charles','jason','peggy','thea']
seq_2 = ['one','two','three','four']
seq_3 = ['first','second']
print(sorted(zip(seq_1,seq_2)))
print(sorted(zip(seq_1,seq_2,seq_3)))       #the shortest sequence determines the length
#combine enumerate and zip to iterate over several sequences at once
for i, (a,b) in enumerate(zip(seq_1,seq_2)):
    print("%d: %s, %s" % (i,a,b))
#data that is already zipped can be "unzipped" with *
zipped = zip(seq_1,seq_2)
name, seq = zip(*zipped)
print(name)
print(seq)
#zip(*seq) is equivalent to zip(seq[0],seq[1],...,seq[len(seq)-1])
print(list(reversed(range(10))))
'''
[('charles', 'one'), ('jason', 'two'), ('peggy', 'three'), ('thea', 'four')]
[('charles', 'one', 'first'), ('jason', 'two', 'second')]
0: charles, one
1: jason, two
2: peggy, three
3: thea, four
('charles', 'jason', 'peggy', 'thea')
('one', 'two', 'three', 'four')
[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
'''
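In Python 3, zip returns an iterator, so it can be consumed only once. A short sketch with shortened sample lists: materialize the pairs with list() if they are needed more than once.

```python
seq_1 = ['charles', 'jason']
seq_2 = ['one', 'two']
zipped = zip(seq_1, seq_2)
print(list(zipped))   # [('charles', 'one'), ('jason', 'two')]
print(list(zipped))   # []: the iterator is already exhausted

pairs = list(zip(seq_1, seq_2))   # keep a reusable list of pairs
names, numbers = zip(*pairs)      # unzip back into two tuples
print(names, numbers)             # ('charles', 'jason') ('one', 'two')
```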

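Dict:

Also called a hash map or an associative array.

A dict is a variable-size collection of key-value pairs.

Both the keys and the values are Python objects.

A dict is created with curly braces {}, with each key separated from its value by a colon: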

empty_dict = {}
print(empty_dict)
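#create a non-empty dict
d_1 = {'a':'hello world','b':[1,2,3,4]}
print(d_1)
#insert an element
d_1[7] = 'insert an integer'
print(d_1)
print(d_1['b'])#look up a value by key
print('b' in d_1)#check whether a key is in the dict
#get views of the keys and of the values
print(d_1.keys())
print(d_1.values())
#delete the element inserted above
del d_1[7]
print(d_1)
#pop removes a key and returns its value
ret = d_1.pop('b')
print(ret)
print(d_1)
#update merges another dict in; values of existing keys are overwritten
d_1.update({'b':'fresh','a':'change that'})
print(d_1)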
'''
{}
{'b': [1, 2, 3, 4], 'a': 'hello world'}
{'b': [1, 2, 3, 4], 'a': 'hello world', 7: 'insert an integer'}
[1, 2, 3, 4]
True
dict_keys(['b', 'a', 7])
dict_values([[1, 2, 3, 4], 'hello world', 'insert an integer'])
{'b': [1, 2, 3, 4], 'a': 'hello world'}
[1, 2, 3, 4]
{'a': 'hello world'}
{'b': 'fresh', 'a': 'change that'}
'''

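Creating dicts from sequences:

A dict is essentially a collection of 2-tuples that pair up elements from two sequences,

so dict() accepts the pairs produced by zip: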

mapping  = dict(zip(range(5),reversed(range(5))))
print(mapping)
#{0: 4, 1: 3, 2: 2, 3: 1, 4: 0}

Default values:

print("\n")
#get acts like an if-else: it returns the value if the key exists, otherwise the default
value = mapping.get(1,None)
print(value)
no_value = mapping.get(5,"nothing here")
print(no_value)
names = ['charles','apple','thea','jason']
by_letter = {}
for name in names:
    letter = name[0]
    if letter not in by_letter:
        by_letter[letter] = [name]
    else:
        by_letter[letter].append(name)
print(by_letter)
#the if-else block above can be replaced with setdefault
by_letter_new = {}
for name in names:
    letter = name[0]
    by_letter_new.setdefault(letter,[]).append(name)
print(by_letter_new)
#the collections module has a class called defaultdict that makes this even simpler
from collections import defaultdict
by_letter_fresh = defaultdict(list)
for name in names:
    by_letter_fresh[name[0]].append(name)
print(by_letter_fresh)
'''
3
nothing here
{'c': ['charles'], 't': ['thea'], 'j': ['jason'], 'a': ['apple']}
{'c': ['charles'], 't': ['thea'], 'j': ['jason'], 'a': ['apple']}
defaultdict(<class 'list'>, {'c': ['charles'], 't': ['thea'], 'j': ['jason'], 'a': ['apple']})
'''

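Valid dict key types:

The values of a dict can be any Python object,

but the keys must be immutable (more precisely, hashable) objects.

The hash function can be used to check whether an object is hashable, that is, whether it can be used as a dict key.

print('\n')
print(hash('string'))
print(hash((1,2,(3,4))))
#print(hash((1,2,[3,4])))#this fails, because lists are mutable
#the simplest way to use a list as a key is to convert it to a tuple
d = {}
d[tuple([1,2,3])] = 5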
print(d)
'''
-5979933153692547881
-2725224101759650258
{(1, 2, 3): 5}
'''
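The hash-based check can be wrapped in a helper; is_hashable below is an illustrative name, not a built-in. Also note that since Python 3.3, string hashes are randomized per process by default, so the exact numbers printed by hash('string') differ from run to run.

```python
def is_hashable(obj):
    # hash() raises TypeError for unhashable objects such as lists and dicts
    try:
        hash(obj)
        return True
    except TypeError:
        return False

print(is_hashable('string'))           # True
print(is_hashable((1, 2, (3, 4))))     # True
print(is_hashable((1, 2, [3, 4])))     # False: the tuple contains a list
print(is_hashable(frozenset([1, 2])))  # True: frozenset is an immutable set
```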

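Set:

A set is an unordered collection of unique elements.

A set can be created with the set function or with a set literal in curly braces {} (an empty {} creates a dict, not a set).

a_set = set([2,3,4,5,6,8,9,43])
print(a_set)
b_set = {2,3,5,7,89,5,3,1,3,5,6}
print(b_set)
print( a_set | b_set)#union
print( a_set & b_set)#intersection
print( a_set - b_set)#difference
print( a_set ^ b_set)#symmetric difference (exclusive or)
c_set = {2,3,4}
print(c_set.issubset(a_set))#c is a subset of a but not of b
print(c_set.issubset(b_set))
print(a_set == b_set)#sets are equal only if they contain the same elements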

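Other set operations:

a.add(x)	Add element x to set a
a.remove(x)	Remove element x from set a
a.union(b)	Union of a and b
a.intersection(b)	Intersection of a and b
a.difference(b)	Difference a - b
a.symmetric_difference(b)	Symmetric difference (elements in exactly one of the two sets)
a.issubset(b)	True if a is a subset of b
a.issuperset(b)	True if b is a subset of a
a.isdisjoint(b)	True if a and b have no elements in common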
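A sketch exercising the methods in the table with small illustrative sets, showing that the methods agree with the operators used earlier. It also uses discard, a standard set method not listed above, which behaves like remove but does not raise KeyError for a missing element. Results are sorted before printing because set iteration order is not guaranteed.

```python
a = {1, 2, 3, 4, 5}
b = {3, 4, 5, 6, 7, 8}
print(a.union(b) == a | b)                       # True
print(sorted(a.intersection(b)))                 # [3, 4, 5]
print(sorted(a.difference(b)))                   # [1, 2]
print(a.symmetric_difference(b) == a ^ b)        # True
print({3, 4}.issubset(a), a.issuperset({1, 2}))  # True True
print(a.isdisjoint({9, 10}))                     # True

a.add(9)
a.remove(1)       # would raise KeyError if 1 were missing
a.discard(100)    # no error even though 100 is missing
print(sorted(a))  # [2, 3, 4, 5, 9]
```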

List, set, and dict comprehensions:

List comprehensions are one of Python's most popular language features.

[expr for val in collection if condition]

names = ['thea','jason','apple','charles','heater','white']
print([x.upper() for x in names if len(x) > 4])

Set comprehensions are similar:

set_comp = {expr for value in collection if condition}

Dict comprehensions are similar as well:

dict_comp = {key-expr : value-expr for value in collection if condition}

These comprehensions are only syntactic sugar, but they really do make code easier to read.

print("\n")
names = ['thea','jason','apple','charles','heater','white']
print([x.upper() for x in names if len(x) > 4])
unique_set = {len(x) for x in names}
print(unique_set)
unique_dict = {val:index for index,val in enumerate(names)}
print(unique_dict)
unique_dict_new = dict((val,idx) for idx,val in enumerate(names))
print(unique_dict_new)
'''
['JASON', 'APPLE', 'CHARLES', 'HEATER', 'WHITE']
{4, 5, 6, 7}
{'heater': 4, 'white': 5, 'apple': 2, 'thea': 0, 'jason': 1, 'charles': 3}
{'heater': 4, 'white': 5, 'apple': 2, 'thea': 0, 'jason': 1, 'charles': 3}
'''

Nested list comprehensions:

print("\n")
#below is a list of lists
name_name = [
             ['tom','jason','charles','lilei','shown','joe'],
             ['natasha','thea','ana','eva','peggy','heather']
             ]
#collect the names containing at least two a's into a new list
name = [name for names in name_name for name in names if name.count('a')>=2]
print(name)
#a simpler example to make the nesting clearer: flatten a list of tuples
some_tuples = [(1,2,3),(4,5,6),(7,8,9)]
flattened = [x for tup in some_tuples for x in tup]
print(flattened)
'''
['natasha', 'ana']
[1, 2, 3, 4, 5, 6, 7, 8, 9]
'''

Each of these comprehensions can be rewritten as loops, and the for clauses appear in the same order as the nested for loops:

flat = []
for tup in some_tuples:
    for x in tup:
        flat.append(x)
print(flat)


posted @ 2016-08-25 00:05 kongchung
