模块
模块,是用一段代码实现了某个功能的代码集合,本质上是一个py文件。
类似于函数式编程和面向过程编程,函数式编程则完成一个功能,其他代码用来调用即可,提供了代码的重用性和代码间的耦合。而对于一个复杂的功能来,可能需要多个函数才能完成(函数又可以在不同的.py文件中),n个 .py 文件组成的代码集合就称为模块。
如:os 是系统相关的模块;file是文件操作相关的模块
模块分为三种:
- 内置模块
- 第三方模块
- 自定义模块
模块导入规范
- 文件头部导入
- 内置模块放在最开始
- 第三方模块放在中间
- 自定义模块放在最后
- 不同种类模块之间空一行
自定义模块
定义模块

导入模块
Python之所以应用越来越广泛,在一定程度上也依赖于其为程序员提供了大量的模块以供使用,如果想要使用模块,则需要导入。导入模块有一下几种方法:
1 import module
2 from module.xx.xx import xx
3 from module.xx.xx import xx as rename
4 from module.xx.xx import *
导入模块其实就是告诉Python解释器去解释那个py文件
- 导入一个py文件,解释器解释该py文件
- 导入一个包,解释器解释该包下的 __init__.py 文件
那么问题来了,导入模块时是根据那个路径作为基准来进行的呢?即:sys.path
In [62]: import sys
In [63]: sys.path
Out[63]:
['','D:\\pytho3.6\\Scripts\\ipython.exe','D:\\pytho3.6','d:\\pytho3.6\\python36.zip','d:\\pytho3.6\\DLLs','d:\\pytho3.6\\lib','d:\\pytho3.6\\lib\\site-packages','d:\\pytho3.6\\lib\\site-packages\\proxy_pool-1.0.0-py3.6.egg','d:\\pytho3.6\\lib\\site-packages\\proxypool-2.0.0-py3.6.egg','d:\\pytho3.6\\lib\\site-packages\\bs4-0.0.1-py3.6.egg','d:\\pytho3.6\\lib\\site-packages\\retrying-1.3.3-py3.6.egg','d:\\pytho3.6\\lib\\site-packages\\win32','d:\\pytho3.6\\lib\\site-packages\\win32\\lib','d:\\pytho3.6\\lib\\site-packages\\Pythonwin','d:\\pytho3.6\\lib\\site-packages\\IPython\\extensions','C:\\Users\\fei\\.ipython']
如果sys.path路径列表没有你想要的路径,可以通过 sys.path.append('路径') 添加。
例如:将当前文件的目录的上一层目录添加到环境路径中
import os,sys
BASE_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
sys.path.append(BASE_DIR)
print(sys.path)

第三方模块
- 下载安装:
|
1
2
3
4
5
|
方式一:yum(linux),pip,apt-get(linux)方式二:下载源码,解压源码,进入目录 编译源码 python setup.py build 安装源码 python setup.py install方式三:下载whl文件,进入目录pip install 文件名.whl |
- 使用国内镜像源安装
默认pip是使用Python官方的源,但是由于国外官方源经常被墙,导致不可用,我们可以使用国内的python镜像源,从而解决Python安装不上库的烦恼。
|
1
2
3
|
网上有很多可用的源,例如:豆瓣:http://pypi.douban.com/simple/清华:https://pypi.tuna.tsinghua.edu.cn/simple |
1. 临时使用
|
1
|
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple django |
2. 永久修改,一劳永逸
|
1
2
3
4
5
6
7
|
Linux下,修改 ~/.pip/pip.conf (没有就创建一个), 修改 index-url至tuna,内容如下: [global] index-url = https://pypi.tuna.tsinghua.edu.cn/simple windows下,直接在user目录中创建一个pip目录,如:C:\Users\xx\pip,新建文件pip.ini,内容如下: [global] index-url = https://pypi.tuna.tsinghua.edu.cn/simple |
安装成功后,模块会自动安装到 sys.path 中的某个目录中,如
|
1
|
'D:\\python3\\lib\\site-packages', |
- 导入模块:同自定义模块的导入方式
实例:安装scrapy
|
1
2
3
4
5
6
7
|
scrapy是一款强大的爬虫框架,最大的特点就是异步和分布式。由于scrapy是依赖Twisted,所以需要首先安装Twisted, pip install twisted或者 下载对应版本的whl文件 pip install Twisted-18.4.0-cp36-cp36m-win_amd64.whl安装完成之后pip install scrapy |
然后我们在命令行python环境中import scrapy,如果没报错就是安装成功了。
- 模块的卸载
|
1
|
pip uninstall 模块名 |
内置模块
一、os
os.getcwd() 获取当前工作目录,即当前python脚本工作的目录路径
os.chdir("dirname") 改变当前脚本工作目录;相当于shell下cd
os.curdir 返回当前目录: ('.')
os.pardir 获取当前目录的父目录字符串名:('..')
os.makedirs('dirname1/dirname2') 可生成多层递归目录
os.removedirs('dirname1') 若目录为空,则删除,并递归到上一级目录,如若也为空,则删除,依此类推
os.mkdir('dirname') 生成单级目录;相当于shell中mkdir dirname
os.rmdir('dirname') 删除单级空目录,若目录不为空则无法删除,报错;相当于shell中rmdir dirname
os.listdir('dirname') 列出指定目录下的所有文件和子目录,包括隐藏文件,并以列表方式打印
os.remove() 删除一个文件
os.rename("oldname","newname") 重命名文件/目录
os.stat('path/filename') 获取文件/目录信息
os.sep 输出操作系统特定的路径分隔符,win下为"\\",Linux下为"/"
os.linesep 输出当前平台使用的行终止符,win下为"\t\n",Linux下为"\n"
os.pathsep 输出用于分割文件路径的字符串
os.name 输出字符串指示当前使用平台。win->'nt'; Linux->'posix'
os.system("bash command") 运行shell命令,直接显示
os.environ 获取系统环境变量
os.path.abspath(path) 返回path规范化的绝对路径
os.path.split(path) 将path分割成目录和文件名二元组返回
os.path.dirname(path) 返回path的目录。其实就是os.path.split(path)的第一个元素
os.path.basename(path) 返回path最后的文件名。如何path以/或\结尾,那么就会返回空值。即os.path.split(path)的第二个元素
os.path.exists(path) 如果path存在,返回True;如果path不存在,返回False
os.path.isabs(path) 如果path是绝对路径,返回True
os.path.isfile(path) 如果path是一个存在的文件,返回True。否则返回False
os.path.isdir(path) 如果path是一个存在的目录,则返回True。否则返回False
os.path.join(path1[, path2[, ...]]) 将多个路径组合后返回,第一个绝对路径之前的参数将被忽略
os.path.getatime(path) 返回path所指向的文件或者目录的最后存取时间
os.path.getmtime(path) 返回path所指向的文件或者目录的最后修改时间
更多点击这里
二、sys
sys.argv 命令行参数List,第一个元素是程序本身路径
sys.exit(n) 退出程序,正常退出时exit(0)
sys.version 获取Python解释程序的版本信息
sys.maxint 最大的Int值
sys.path 返回模块的搜索路径,初始化时使用PYTHONPATH环境变量的值
sys.platform 返回操作系统平台名称
sys.stdout.write('please:')
val = sys.stdin.readline()[:-1]
三、hashlib
hashlib 是一个提供了一些流行的hash算法的 Python 标准库.其中所包括的算法有 md5, sha1, sha224, sha256, sha384, sha512. 另外,模块中所定义的 new(name, string=”) 方法可通过指定系统所支持的hash算法来构造相应的hash对象.
from hashlib import md5, sha1, sha224, sha256, sha384, sha512
from pprint import pprint
import hashlib
hash_funcs = [md5, sha1, sha224, sha256, sha384, sha512]
def hash_show(s):
result = []
for func in hash_funcs:
s_hash_obj = func(s.encode())
s_hash_hex = s_hash_obj.hexdigest()
result.append((s_hash_obj.name, s_hash_hex, len(s_hash_hex)))
return result
if __name__ == '__main__':
#1.各hashlib算法的使用案例
s = 'hello python'
rs = hash_show(s)
pprint(rs)
#2.md5的使用案例
m1 = md5() # 构造hash对象
m1.update('hello'.encode())
m1.update(' '.encode())
m1.update('python'.encode())
m2 = md5('hello python'.encode())
print(m1.hexdigest() == m2.hexdigest()) # 两种方式的效果相同
#3.使用 new(name, string=”) 构造新的哈系对象
h = hashlib.new('ripemd160', 'hello python'.encode()) # ripemd160是一个160位的hash算法. ripemd系列算法基于md4, md5.
print(h.hexdigest(),len(h.hexdigest()))
以上加密算法虽然依然非常厉害,但时候存在缺陷,即:通过撞库可以反解。所以,有必要对加密算法中添加自定义key再来做加密。
import hashlib
# ######## md5 ########
hash = hashlib.md5('898oaFs09f')
hash.update('admin')
print(hash.hexdigest())
除此之外,python 还有一个 hmac 模块,它内部对我们创建 key 和 内容 再进行处理然后再加密
import hmac
h = hmac.new('wueiqi')
h.update('hellowo')
print(h.hexdigest())
四、json 和 pickle
用于序列化的两个模块
- json,用于字符串 和 python数据类型间进行转换
- pickle,用于python特有的类型 和 python的数据类型间进行转换
Json模块提供了四个功能:dumps、dump、loads、load
pickle模块提供了四个功能:dumps、dump、loads、load
json.dumps (dict -> str)
将 python 对象编码转化为 json 字符串
json.loads (str -> dict)
将 json 格式转化为 python dict格式
json.dump
dump 和 dumps 的功能一样,将 dict 转化为 str 的格式,然后存入文件中。
json.load
load 和 loads 的功能一样,从文件中读取 str 格式并将其转化为json。
import json
data = {
'Citizen_Wang':1,
'Always fall in love':2,
'with neighbours.':3
}
json_str = json.dumps(data)
#print('Python 原始数据是:', data)
#print('json 对象:',json_str)
print(type(data))
print(type(json_str))
date2 = json.loads(json_str)
print(type(json_str))
print(type(date2))
#转化后的 json_str 是一个字符串类型,可以使用 enumerate 来查看一下 json_str 字符串的索引值和对应值。
file = open('2.txt', 'wb')
print(type(file))
json.dump(data, file)
json.dump(json_str, file)
"""
json字符串
1.将json字符串json.loads转化为字典
2.用key值提取value
"""
json_str = '{"demo":{"demo1":{"demo2":{"demo3":"result"}}}}'
print(type(json_str))
str1 = json.loads(json_str)
print(type(str1))
str1['demo']['demo1']['demo2']['demo3']
#load 在读取的时候,只需要文件对象一个参数即可。而 dump 在使用的时候,需要 python 对象作为第一个传入参数,文件对象作为第二个传入参数。
"""
小结:
json数据类型的字符串必须用单引号包起来,里面是双引号
dump 或 dumps, 把其他对象或者格式,转化为 json str 格式,
dumps 用来处理字符串,dump 用来处理文件。
load 或 loads ,把 json str 格式转化成其他格式,
loads 用来处理字符串,load 用来处理文件。
注:json.loads(response.text) == response.json()
"""
pickle可以存储什么类型的数据呢?
所有python支持的原生类型:布尔值,整数,浮点数,复数,字符串,字节,None。
由任何原生类型组成的列表,元组,字典和集合。
函数,类,类的实例
pickle模块可能出现三种异常:
1. PickleError:封装和拆封时出现的异常类,继承自Exception
2. PicklingError: 遇到不可封装的对象时出现的异常,继承自PickleError
3. UnPicklingError: 拆封对象过程中出现的异常,继承自PickleError
"""dumps"""
import pickle
list = ["aa","bb","cc"]
#4 # dumps 将数据通过特殊的形式转换为只有python语言认识的字符串
pickle_str = pickle.dumps(list)
print(pickle_str)
"""
loads
将pickle数据转换为python的数据结构
"""
py_str = pickle.loads(pickle_str)
print(py_str)
"""dump功能
dump 将数据通过特殊的形式转换为只有python语言认识的字符串,并写入文件
"""
with open('pickle.txt', 'wb') as f:
pickle.dump(list, f)
"""
load: 从数据文件中读取数据,并转换为python的数据结构
"""
with open('pickle.txt', 'rb') as f:
data = pickle.load(f)
print(data)
json和pickle的区别:
pickle可以在python之间进行交互
json可以实现python与不同开发语言的交互
pickle可以序列化python中的任何数据类型
json只能序列化python中的常归数据类型(列表等)
pickle序列化后的对象不可读
json序列化后的对象是可读的
五、执行系统命令
可以执行shell命令的模块有:
os.system()
In [69]: import os
In [70]: os.system('ipconfig')
Windows IP 配置
无线局域网适配器 本地连接* 2:
媒体状态 . . . . . . . . . . . . : 媒体已断开
连接特定的 DNS 后缀 . . . . . . . :
以太网适配器 Bluetooth 网络连接:
媒体状态 . . . . . . . . . . . . : 媒体已断开
连接特定的 DNS 后缀 . . . . . . . :
无线局域网适配器 WLAN:
连接特定的 DNS 后缀 . . . . . . . :
本地链接 IPv6 地址. . . . . . . . : fe80::3c9d:abc4:35f7:2c3f%4
IPv4 地址 . . . . . . . . . . . . : 172.25.1.226
子网掩码 . . . . . . . . . . . . : 255.255.254.0
默认网关. . . . . . . . . . . . . : 172.25.1.254
以太网适配器 以太网:
媒体状态 . . . . . . . . . . . . : 媒体已断开
连接特定的 DNS 后缀 . . . . . . . :
以太网适配器 VMware Network Adapter VMnet1:
连接特定的 DNS 后缀 . . . . . . . :
本地链接 IPv6 地址. . . . . . . . : fe80::5cf0:3457:e440:4dc%8
IPv4 地址 . . . . . . . . . . . . : 192.168.202.1
子网掩码 . . . . . . . . . . . . : 255.255.255.0
默认网关. . . . . . . . . . . . . :
以太网适配器 VMware Network Adapter VMnet8:
连接特定的 DNS 后缀 . . . . . . . :
本地链接 IPv6 地址. . . . . . . . : fe80::40d1:14ec:f3e1:97b2%9
IPv4 地址 . . . . . . . . . . . . : 192.168.137.1
子网掩码 . . . . . . . . . . . . : 255.255.255.0
默认网关. . . . . . . . . . . . . :
Out[70]: 0
In [71]: os.system('ver')
Microsoft Windows [版本 6.3.9600]
Out[71]: 0
In [72]: os.system('time')
当前时间: 11:46:29.71
输入新时间:
Out[72]: 0
subprocess模块提供了丰富的命令行功能
subprocess.Popen(...)
用于执行复杂的系统命令
参数:
-
- args:shell命令,可以是字符串或者序列类型(如:list,元组)
- bufsize:指定缓冲。0 无缓冲,1 行缓冲,其他 缓冲区大小,负值 系统缓冲
- stdin, stdout, stderr:分别表示程序的标准输入、输出、错误句柄
- preexec_fn:只在Unix平台下有效,用于指定一个可执行对象(callable object),它将在子进程运行之前被调用
- close_sfs:在windows平台下,如果close_fds被设置为True,则新创建的子进程将不会继承父进程的输入、输出、错误管道。 所以不能将close_fds设置为True同时重定向子进程的标准输入、输出与错误(stdin, stdout, stderr)。
- shell:同上
- cwd:用于设置子进程的当前目录
- env:用于指定子进程的环境变量。如果env = None,子进程的环境变量将从父进程中继承。
- universal_newlines:不同系统的换行符不同,True -> 同意使用 \n
- startupinfo与createionflags只在windows下有效 将被传递给底层的CreateProcess()函数,用于设置子进程的一些属性,如:主窗口的外观,进程的优先级等等
实例:socket实现简单的命令行交互
from socket import *
ip_port = ('127.0.0.1',8000)
buffer_size = 1024
backlog = 5
tcp_client = socket(AF_INET,SOCK_STREAM)
tcp_client.connect(ip_port)
while True:
cmd = input('>>:').strip()
if not cmd:
continue
if cmd == 'quit':
break
tcp_client.send(cmd.encode('utf-8'))
#解决粘包
length = tcp_client.recv(4)
length = struct.unpack('i',length)[0]
recv_size = 0
recv_msg = b''
while recv_size < length:
recv_msg += tcp_client.recv(buffer_size)
recv_size = len(recv_msg)
print(recv_msg.decode('gbk'))
from socket import *
import subprocess
import struct
ip_port = ('127.0.0.1', 8000)
buffer_size = 1024
backlog = 5
tcp_server = socket(AF_INET, SOCK_STREAM)
tcp_server.setsockopt(SOL_SOCKET,SO_REUSEADDR,1) #就是它,在bind前加
tcp_server.bind(ip_port)
tcp_server.listen(backlog)
while True:
conn, addr = tcp_server.accept()
print('新的客户端链接:', addr)
while True:
try:
cmd = conn.recv(buffer_size)
print('收到客户端命令:', cmd.decode('utf-8'))
#执行命令cmd,得到命令的结果cmd_res
res = subprocess.Popen(cmd.decode('utf-8'),shell=True,
stderr=subprocess.PIPE,
stdout=subprocess.PIPE,
stdin=subprocess.PIPE,
)
err = res.stderr.read()
if err:
cmd_res = err
else:
cmd_res = res.stdout.read()
if not cmd_res:
cmd_res = '执行成功'.encode('gbk')
#解决粘包
length = len(cmd_res)
data_length = struct.pack('i',length)
conn.send(data_length)
conn.send(cmd_res)
except Exception as e:
print(e)
break
conn.close()
六、shutil
高级的 文件、文件夹、压缩包 处理模块
shutil.copyfileobj(fsrc, fdst[, length]) 将文件内容拷贝到另一个文件中,可以部分内容
def copyfileobj(fsrc, fdst, length=16*1024):
"""copy data from file-like object fsrc to file-like object fdst"""
while 1:
buf = fsrc.read(length)
if not buf:
break
fdst.write(buf)
shutil.copyfile(src, dst) 拷贝文件
def copyfile(src, dst):
"""Copy data from src to dst"""
if _samefile(src, dst):
raise Error("`%s` and `%s` are the same file" % (src, dst))
for fn in [src, dst]:
try:
st = os.stat(fn)
except OSError:
# File most likely does not exist
pass
else:
# XXX What about other special files? (sockets, devices...)
if stat.S_ISFIFO(st.st_mode):
raise SpecialFileError("`%s` is a named pipe" % fn)
with open(src, 'rb') as fsrc:
with open(dst, 'wb') as fdst:
copyfileobj(fsrc, fdst)
shutil.copymode(src, dst) 仅拷贝权限。内容、组、用户均不变
def copymode(src, dst):
"""Copy mode bits from src to dst"""
if hasattr(os, 'chmod'):
st = os.stat(src)
mode = stat.S_IMODE(st.st_mode)
os.chmod(dst, mode)
shutil.copystat(src, dst) 拷贝状态的信息,包括:mode bits, atime, mtime, flags
def copystat(src, dst):
"""Copy all stat info (mode bits, atime, mtime, flags) from src to dst"""
st = os.stat(src)
mode = stat.S_IMODE(st.st_mode)
if hasattr(os, 'utime'):
os.utime(dst, (st.st_atime, st.st_mtime))
if hasattr(os, 'chmod'):
os.chmod(dst, mode)
if hasattr(os, 'chflags') and hasattr(st, 'st_flags'):
try:
os.chflags(dst, st.st_flags)
except OSError, why:
for err in 'EOPNOTSUPP', 'ENOTSUP':
if hasattr(errno, err) and why.errno == getattr(errno, err):
break
else:
raise
shutil.copy(src, dst) 拷贝文件和权限
def copy(src, dst):
"""Copy data and mode bits ("cp src dst").
The destination may be a directory.
"""
if os.path.isdir(dst):
dst = os.path.join(dst, os.path.basename(src))
copyfile(src, dst)
copymode(src, dst)
shutil.copy2(src, dst) 拷贝文件和状态信息
def copy2(src, dst):
"""Copy data and all stat info ("cp -p src dst").
The destination may be a directory.
"""
if os.path.isdir(dst):
dst = os.path.join(dst, os.path.basename(src))
copyfile(src, dst)
copystat(src, dst)
shutil.ignore_patterns(*patterns) shutil.copytree(src, dst, symlinks=False, ignore=None) 递归的去拷贝文件
例如:copytree(source, destination, ignore=ignore_patterns('*.pyc', 'tmp*'))
def ignore_patterns(*patterns):
"""Function that can be used as copytree() ignore parameter.
Patterns is a sequence of glob-style patterns
that are used to exclude files"""
def _ignore_patterns(path, names):
ignored_names = []
for pattern in patterns:
ignored_names.extend(fnmatch.filter(names, pattern))
return set(ignored_names)
return _ignore_patterns
def copytree(src, dst, symlinks=False, ignore=None):
"""Recursively copy a directory tree using copy2().
The destination directory must not already exist.
If exception(s) occur, an Error is raised with a list of reasons.
If the optional symlinks flag is true, symbolic links in the
source tree result in symbolic links in the destination tree; if
it is false, the contents of the files pointed to by symbolic
links are copied.
The optional ignore argument is a callable. If given, it
is called with the `src` parameter, which is the directory
being visited by copytree(), and `names` which is the list of
`src` contents, as returned by os.listdir():
callable(src, names) -> ignored_names
Since copytree() is called recursively, the callable will be
called once for each directory that is copied. It returns a
list of names relative to the `src` directory that should
not be copied.
XXX Consider this example code rather than the ultimate tool.
"""
names = os.listdir(src)
if ignore is not None:
ignored_names = ignore(src, names)
else:
ignored_names = set()
os.makedirs(dst)
errors = []
for name in names:
if name in ignored_names:
continue
srcname = os.path.join(src, name)
dstname = os.path.join(dst, name)
try:
if symlinks and os.path.islink(srcname):
linkto = os.readlink(srcname)
os.symlink(linkto, dstname)
elif os.path.isdir(srcname):
copytree(srcname, dstname, symlinks, ignore)
else:
# Will raise a SpecialFileError for unsupported file types
copy2(srcname, dstname)
# catch the Error from the recursive copytree so that we can
# continue with other files
except Error, err:
errors.extend(err.args[0])
except EnvironmentError, why:
errors.append((srcname, dstname, str(why)))
try:
copystat(src, dst)
except OSError, why:
if WindowsError is not None and isinstance(why, WindowsError):
# Copying file access times may fail on Windows
pass
else:
errors.append((src, dst, str(why)))
if errors:
raise Error, errors
shutil.rmtree(path[, ignore_errors[, onerror]]) 递归的去删除文件
def rmtree(path, ignore_errors=False, onerror=None):
"""Recursively delete a directory tree.
If ignore_errors is set, errors are ignored; otherwise, if onerror
is set, it is called to handle the error with arguments (func,
path, exc_info) where func is os.listdir, os.remove, or os.rmdir;
path is the argument to that function that caused it to fail; and
exc_info is a tuple returned by sys.exc_info(). If ignore_errors
is false and onerror is None, an exception is raised.
"""
if ignore_errors:
def onerror(*args):
pass
elif onerror is None:
def onerror(*args):
raise
try:
if os.path.islink(path):
# symlinks to directories are forbidden, see bug #1669
raise OSError("Cannot call rmtree on a symbolic link")
except OSError:
onerror(os.path.islink, path, sys.exc_info())
# can't continue even if onerror hook returns
return
names = []
try:
names = os.listdir(path)
except os.error, err:
onerror(os.listdir, path, sys.exc_info())
for name in names:
fullname = os.path.join(path, name)
try:
mode = os.lstat(fullname).st_mode
except os.error:
mode = 0
if stat.S_ISDIR(mode):
rmtree(fullname, ignore_errors, onerror)
else:
try:
os.remove(fullname)
except os.error, err:
onerror(os.remove, fullname, sys.exc_info())
try:
os.rmdir(path)
except os.error:
onerror(os.rmdir, path, sys.exc_info())
shutil.move(src, dst) 递归的去移动文件
def move(src, dst):
"""Recursively move a file or directory to another location. This is
similar to the Unix "mv" command.
If the destination is a directory or a symlink to a directory, the source
is moved inside the directory. The destination path must not already
exist.
If the destination already exists but is not a directory, it may be
overwritten depending on os.rename() semantics.
If the destination is on our current filesystem, then rename() is used.
Otherwise, src is copied to the destination and then removed.
A lot more could be done here... A look at a mv.c shows a lot of
the issues this implementation glosses over.
"""
real_dst = dst
if os.path.isdir(dst):
if _samefile(src, dst):
# We might be on a case insensitive filesystem,
# perform the rename anyway.
os.rename(src, dst)
return
real_dst = os.path.join(dst, _basename(src))
if os.path.exists(real_dst):
raise Error, "Destination path '%s' already exists" % real_dst
try:
os.rename(src, real_dst)
except OSError:
if os.path.isdir(src):
if _destinsrc(src, dst):
raise Error, "Cannot move a directory '%s' into itself '%s'." % (src, dst)
copytree(src, real_dst, symlinks=True)
rmtree(src)
else:
copy2(src, real_dst)
os.unlink(src)
shutil.make_archive(base_name, format,...)
创建压缩包并返回文件路径,例如:zip、tar
- base_name: 压缩包的文件名,也可以是压缩包的路径。只是文件名时,则保存至当前目录,否则保存至指定路径, 如:www =>保存至当前路径 如:/Users/wupeiqi/www =>保存至/Users/wupeiqi/
- format: 压缩包种类,“zip”, “tar”, “bztar”,“gztar”
- root_dir: 要压缩的文件夹路径(默认当前目录)
- owner: 用户,默认当前用户
- group: 组,默认当前组
- logger: 用于记录日志,通常是logging.Logger对象
def make_archive(base_name, format, root_dir=None, base_dir=None, verbose=0,
dry_run=0, owner=None, group=None, logger=None):
"""Create an archive file (eg. zip or tar).
'base_name' is the name of the file to create, minus any format-specific
extension; 'format' is the archive format: one of "zip", "tar", "bztar"
or "gztar".
'root_dir' is a directory that will be the root directory of the
archive; ie. we typically chdir into 'root_dir' before creating the
archive. 'base_dir' is the directory where we start archiving from;
ie. 'base_dir' will be the common prefix of all files and
directories in the archive. 'root_dir' and 'base_dir' both default
to the current directory. Returns the name of the archive file.
'owner' and 'group' are used when creating a tar archive. By default,
uses the current owner and group.
"""
save_cwd = os.getcwd()
if root_dir is not None:
if logger is not None:
logger.debug("changing into '%s'", root_dir)
base_name = os.path.abspath(base_name)
if not dry_run:
os.chdir(root_dir)
if base_dir is None:
base_dir = os.curdir
kwargs = {'dry_run': dry_run, 'logger': logger}
try:
format_info = _ARCHIVE_FORMATS[format]
except KeyError:
raise ValueError, "unknown archive format '%s'" % format
func = format_info[0]
for arg, val in format_info[1]:
kwargs[arg] = val
if format != 'zip':
kwargs['owner'] = owner
kwargs['group'] = group
try:
filename = func(base_name, base_dir, **kwargs)
finally:
if root_dir is not None:
if logger is not None:
logger.debug("changing back to '%s'", save_cwd)
os.chdir(save_cwd)
return filename
shutil 对压缩包的处理是调用 ZipFile 和 TarFile 两个模块来进行的,详细:
import zipfile
# 压缩
z = zipfile.ZipFile('laxi.zip', 'w')
z.write('a.log')
z.write('data.data')
z.close()
# 解压
z = zipfile.ZipFile('laxi.zip', 'r')
z.extractall()
z.close()
zipfile 压缩解压
import tarfile
# 压缩
tar = tarfile.open('your.tar','w')
tar.add('/Users/wupeiqi/PycharmProjects/bbs2.zip', arcname='bbs2.zip')
tar.add('/Users/wupeiqi/PycharmProjects/cmdb.zip', arcname='cmdb.zip')
tar.close()
# 解压
tar = tarfile.open('your.tar','r')
tar.extractall() # 可设置解压地址
tar.close()
tarfile 压缩解压
class ZipFile(object):
""" Class with methods to open, read, write, close, list zip files.
z = ZipFile(file, mode="r", compression=ZIP_STORED, allowZip64=False)
file: Either the path to the file, or a file-like object.
If it is a path, the file will be opened and closed by ZipFile.
mode: The mode can be either read "r", write "w" or append "a".
compression: ZIP_STORED (no compression) or ZIP_DEFLATED (requires zlib).
allowZip64: if True ZipFile will create files with ZIP64 extensions when
needed, otherwise it will raise an exception when this would
be necessary.
"""
fp = None # Set here since __del__ checks it
def __init__(self, file, mode="r", compression=ZIP_STORED, allowZip64=False):
"""Open the ZIP file with mode read "r", write "w" or append "a"."""
if mode not in ("r", "w", "a"):
raise RuntimeError('ZipFile() requires mode "r", "w", or "a"')
if compression == ZIP_STORED:
pass
elif compression == ZIP_DEFLATED:
if not zlib:
raise RuntimeError,\
"Compression requires the (missing) zlib module"
else:
raise RuntimeError, "That compression method is not supported"
self._allowZip64 = allowZip64
self._didModify = False
self.debug = 0 # Level of printing: 0 through 3
self.NameToInfo = {} # Find file info given name
self.filelist = [] # List of ZipInfo instances for archive
self.compression = compression # Method of compression
self.mode = key = mode.replace('b', '')[0]
self.pwd = None
self._comment = ''
# Check if we were passed a file-like object
if isinstance(file, basestring):
self._filePassed = 0
self.filename = file
modeDict = {'r' : 'rb', 'w': 'wb', 'a' : 'r+b'}
try:
self.fp = open(file, modeDict[mode])
except IOError:
if mode == 'a':
mode = key = 'w'
self.fp = open(file, modeDict[mode])
else:
raise
else:
self._filePassed = 1
self.fp = file
self.filename = getattr(file, 'name', None)
try:
if key == 'r':
self._RealGetContents()
elif key == 'w':
# set the modified flag so central directory gets written
# even if no files are added to the archive
self._didModify = True
elif key == 'a':
try:
# See if file is a zip file
self._RealGetContents()
# seek to start of directory and overwrite
self.fp.seek(self.start_dir, 0)
except BadZipfile:
# file is not a zip file, just append
self.fp.seek(0, 2)
# set the modified flag so central directory gets written
# even if no files are added to the archive
self._didModify = True
else:
raise RuntimeError('Mode must be "r", "w" or "a"')
except:
fp = self.fp
self.fp = None
if not self._filePassed:
fp.close()
raise
def __enter__(self):
return self
def __exit__(self, type, value, traceback):
self.close()
def _RealGetContents(self):
"""Read in the table of contents for the ZIP file."""
fp = self.fp
try:
endrec = _EndRecData(fp)
except IOError:
raise BadZipfile("File is not a zip file")
if not endrec:
raise BadZipfile, "File is not a zip file"
if self.debug > 1:
print endrec
size_cd = endrec[_ECD_SIZE] # bytes in central directory
offset_cd = endrec[_ECD_OFFSET] # offset of central directory
self._comment = endrec[_ECD_COMMENT] # archive comment
# "concat" is zero, unless zip was concatenated to another file
concat = endrec[_ECD_LOCATION] - size_cd - offset_cd
if endrec[_ECD_SIGNATURE] == stringEndArchive64:
# If Zip64 extension structures are present, account for them
concat -= (sizeEndCentDir64 + sizeEndCentDir64Locator)
if self.debug > 2:
inferred = concat + offset_cd
print "given, inferred, offset", offset_cd, inferred, concat
# self.start_dir: Position of start of central directory
self.start_dir = offset_cd + concat
fp.seek(self.start_dir, 0)
data = fp.read(size_cd)
fp = cStringIO.StringIO(data)
total = 0
while total < size_cd:
centdir = fp.read(sizeCentralDir)
if len(centdir) != sizeCentralDir:
