Python Study Diary (Part 8): Modules, Part 1 (sys, os, hashlib, random, time, re)

A module is a collection of code that implements a particular piece of functionality.

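The idea is similar to functional and procedural programming: a function completes one task and other code simply calls it, which improves code reuse and reduces coupling between different parts of a program. A complex feature, however, may need many functions, and those functions may live in different .py files. In Python each .py file is a module, and a collection of related .py files (a directory of modules) is called a package.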

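For example, os is the module for operating-system-related functionality, while file handling is provided by the built-in open() and the io module.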

There are three kinds of modules:

Custom modules
Third-party modules
Built-in modules

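Using modules

Importing modules

Part of the reason Python is used more and more widely is the large number of modules it makes available to programmers. To use a module, you must import it first. There are several ways to import a module: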

import module
from module.xx.xx import xx
from module.xx.xx import xx as rename 
from module.xx.xx import *
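A small sketch of the first three import forms, using the standard-library os.path module as a stand-in for `module.xx.xx` (the alias `pjoin` is just an illustrative name). The `from ... import *` form is left out because it makes it hard to tell where a name came from.

```python
import os                            # form 1: import the whole module
from os.path import join             # form 2: bring one name into this namespace
from os.path import join as pjoin    # form 3: the same name under an alias (pjoin is illustrative)

# form 1 reaches the function through the module name
path_a = os.path.join("logs", "app.txt")
path_b = join("logs", "app.txt")
path_c = pjoin("logs", "app.txt")

print(path_a, path_b, path_c)
print(join is pjoin is os.path.join)   # True: all three names refer to the same function object
```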

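Importing a module simply tells the Python interpreter to execute that .py file:

Importing a .py file: the interpreter executes that file
Importing a package: the interpreter executes the package's __init__.py file [py2.7]

That raises a question: which directories does Python search when it imports a module? The answer is sys.path: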

import sys
for i in sys.path:
    print(i)

Output:
C:\Users\Sullivan\PycharmProjects\q1\day11      # added by PyCharm
C:\Users\Sullivan\PycharmProjects\q1            # added by PyCharm
C:\python36\python36.zip
C:\python36\DLLs
C:\python36\lib
C:\python36
C:\python36\lib\site-packages

If the directory you need is not in the sys.path list, you can add it with sys.path.append('path').

import sys
sys.path.append("D:")    # add drive D: to the module search path
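A minimal sketch of the whole round trip, assuming a throwaway directory instead of a real drive: write a tiny module into a directory that is not yet on sys.path, append that directory, then import the module. The module name `my_helper` and its `greet` function are hypothetical.

```python
import importlib
import os
import sys
import tempfile

# A scratch directory that is NOT on sys.path yet
module_dir = tempfile.mkdtemp()

# Write a tiny module into it (the name "my_helper" is purely illustrative)
with open(os.path.join(module_dir, "my_helper.py"), "w", encoding="utf-8") as f:
    f.write("def greet(name):\n    return 'hello, ' + name\n")

# Add the directory to the search path, then import as usual
sys.path.append(module_dir)
importlib.invalidate_caches()   # make the import system notice a file created after startup
import my_helper

print(my_helper.greet("ciri"))
print(my_helper.__file__)       # shows which file was actually imported
```

Without the `sys.path.append` line, the `import my_helper` statement would raise ModuleNotFoundError.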

Modules

Built-in modules ship with Python. To use the functionality of a built-in module, you must [import it] first and [use it] afterwards.

1. sys

Provides operations related to the Python interpreter:

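# The sys module interacts with the Python interpreter
sys.argv           List of command-line arguments; the first element is the path of the script itself
sys.exit(n)        Exit the program; exit(0) indicates a normal exit
sys.version        Version information of the Python interpreter
sys.maxint         Largest int value (Python 2 only; Python 3 has sys.maxsize instead)
sys.path           Module search path, initialized partly from the PYTHONPATH environment variable
sys.platform       Name of the operating-system platform
sys.stdin          Standard input
sys.stdout         Standard output
sys.stderr         Standard error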

#argv
print(sys.argv)
Output:
C:\Users\Sullivan\PycharmProjects\q1\day10>python module-sys.py
['module-sys.py']

C:\Users\Sullivan\PycharmProjects\q1\day10>python module-sys.py 1 2
['module-sys.py', '1', '2']

#platform
print(sys.platform)
Output:
win32   

# exit can also print a message: a string argument is written to stderr and the exit status becomes 1
#sys.exit("Goodbye!")

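# Writing to the screen with sys.stdout.write, which behaves differently from print
sys.stdout.write("hello")   # adds no newline, so the next output continues on the same line
print("hello")              # adds a newline automatically
sys.stdout.write("hello")
Output:
hellohello           # sys.stdout.write printed no newline, so the hello from print follows directly
hello                # print added a newline after its hello, so this hello appears on the next line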
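A small sketch combining sys.argv and sys.exit, assuming a hypothetical script `double.py` that doubles the number given on the command line. The argument list is passed in explicitly so the behaviour can be tried without a real command line, and the exit is caught because sys.exit works by raising SystemExit.

```python
import sys

def main(argv):
    """Hypothetical script: return twice the number given on the command line."""
    if len(argv) < 2:
        # a string passed to sys.exit is printed to stderr and the exit status is 1
        sys.exit("usage: python double.py NUMBER")
    return int(argv[1]) * 2

# Simulated command lines; a real script would call main(sys.argv)
result = main(["double.py", "21"])
print(result)

try:
    main(["double.py"])
except SystemExit as exc:
    # SystemExit carries the argument given to sys.exit in its .code attribute
    exit_message = exc.code
    print("caught:", exit_message)
```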

A few examples

Example: a progress bar with a percentage

import sys,time

for i in range(31):
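    sys.stdout.write('\r')  # move the cursor back to the start of the line so the next write overwrites it
    sys.stdout.write("%s%% | %s" % (int(i/30*100) , int(i/30*100)*'*'))
    sys.stdout.flush()      # force the buffered output onto the screen immediately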
    time.sleep(0.3)
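One possible refactoring of the same progress bar, pulling the line formatting into a pure function so it can be checked without watching the screen. The function names `progress_line` and `show_progress` and the fixed-width bar are illustrative choices, not part of the original.

```python
import sys
import time

def progress_line(done, total, width=30):
    """Return one progress-bar line; the leading carriage return sends the cursor to column 0."""
    percent = done * 100 // total
    bar = "*" * (done * width // total)
    return f"\r{percent:3d}% |{bar:<{width}}|"

def show_progress(total=30, delay=0.3):
    """Redraw the bar in place, like the loop above."""
    for i in range(total + 1):
        sys.stdout.write(progress_line(i, total))
        sys.stdout.flush()
        time.sleep(delay)
    sys.stdout.write("\n")   # leave the cursor on a fresh line when done
```

Calling `show_progress()` animates the bar; padding the bar to a fixed width keeps the closing `|` in one place while it fills.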

2. os

Provides system-level operations:

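os.getcwd()                 Get the current working directory, i.e. the directory the Python script is working in
os.chdir("dirname")         Change the current working directory; equivalent to cd in a shell
os.curdir                   The string for the current directory: ('.')
os.pardir                   The string for the parent directory: ('..')
os.makedirs('dir1/dir2')    Create nested directories, creating every missing level
os.removedirs('dirname1')   Remove the directory if it is empty, then move up to the parent and remove it too if empty, and so on
os.mkdir('dirname')         Create a single directory; equivalent to mkdir dirname in a shell
os.rmdir('dirname')         Remove a single empty directory; raises an error if it is not empty; equivalent to rmdir dirname in a shell
os.listdir('dirname')       List all files and subdirectories in the directory, including hidden ones, as a list
os.remove(path)             Delete a file
os.rename("oldname","new")  Rename a file or directory
os.stat('path/filename')    Get information about a file or directory
os.sep                      The OS-specific path separator: "\\" on Windows, "/" on Linux
os.linesep                  The line terminator of the current platform: "\r\n" on Windows, "\n" on Linux
os.pathsep                  The separator between entries of a path list such as PATH: ";" on Windows, ":" on Linux
os.name                     String naming the current platform: Windows -> 'nt'; Linux -> 'posix'
os.system("bash command")   Run a shell command; its output is shown directly and its exit status is returned
os.environ                  The system environment variables
os.path.abspath(path)       Return the normalized absolute version of path
os.path.split(path)         Split path into a (directory, filename) 2-tuple
os.path.dirname(path)       Return the directory part of path, i.e. the first element of os.path.split(path)
os.path.basename(path)      Return the final filename of path, i.e. the second element of os.path.split(path); if path ends with a separator (/, or \ on Windows), an empty string is returned
os.path.exists(path)        Return True if path exists, False otherwise
os.path.isabs(path)         Return True if path is an absolute path
os.path.isfile(path)        Return True if path is an existing file, False otherwise
os.path.isdir(path)         Return True if path is an existing directory, False otherwise
os.path.join(path1[, path2[, ...]])  Join several path components into one; any components before the last absolute path are discarded
os.path.getatime(path)      Return the last access time of the file or directory at path
os.path.getmtime(path)      Return the last modification time of the file or directory at path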

#stat
import os
print(os.stat('joel'))
info = os.stat('joel')
print(info.st_size)      # file size in bytes
print(info.st_atime)     # time of last access, as a timestamp
print(info.st_mtime)     # time of last modification, as a timestamp

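# system: run a shell command from Python instead of typing it into a console
# (the command's output goes straight to the screen; print shows the returned exit status)
#print(os.system("dir"))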

# Joining paths: important, remember to use it
os.path.join(a, b)    # pass the parts as separate arguments, not as a list

A few examples
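A self-contained sketch exercising several calls from the table, run inside a temporary directory so nothing outside it is touched. The directory and file names (`dir1`, `dir2`, `notes.txt`) are illustrative; posixpath is used for the join demonstration so its result is the same on every OS.

```python
import os
import posixpath
import tempfile

base = tempfile.mkdtemp()                        # scratch directory for the demo

# makedirs creates every missing level at once (a single mkdir would fail here)
nested = os.path.join(base, "dir1", "dir2")
os.makedirs(nested)

# create an empty file next to dir2
file_path = os.path.join(base, "dir1", "notes.txt")
open(file_path, "w").close()

listing = sorted(os.listdir(os.path.join(base, "dir1")))
print(listing)                                   # ['dir2', 'notes.txt']
print(os.path.isdir(nested), os.path.isfile(file_path))    # True True

head, tail = os.path.split(file_path)
print(head == os.path.dirname(file_path), tail)  # True notes.txt

# join discards everything before an absolute component
print(posixpath.join("a", "/b", "c"))            # /b/c

# clean up: the file first, then each now-empty directory
os.remove(file_path)
os.rmdir(nested)
os.rmdir(os.path.join(base, "dir1"))
os.rmdir(base)
```

os.rmdir is used for cleanup instead of os.removedirs, because removedirs would keep climbing into the system temp directory's parents.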

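3. hashlib

Used for hashing (often loosely called encryption, although a hash cannot be decrypted). It replaces the old md5 and sha modules and mainly provides the SHA1, SHA224, SHA256, SHA384, SHA512 and MD5 algorithms.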

import hashlib

hash = hashlib.md5()                                        
hash.update(bytes('123',encoding='utf-8'))
print(hash.hexdigest())

Output:
202cb962ac59075b964b07152d234b70

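Although these hash algorithms are still very strong, they have a weakness: a hash can be reversed by looking it up in a large database of precomputed hashes. It is therefore worth mixing a custom key into the data before hashing, which is commonly called salting.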

import hashlib   
                                
hash = hashlib.md5(bytes('joel-love-ellie',encoding='utf-8'))
hash.update(bytes('123',encoding='utf-8'))
print(hash.hexdigest())

Output:
178172ac856c2dae457bdb731229d01c
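The salted example passes the salt to hashlib.md5() and then calls update(); that is the same as hashing the concatenated bytes. A single fixed salt shared by every user still lets one precomputed table cover all of them, so a common recommendation is a random per-user salt plus a deliberately slow key-derivation function. The sketch below uses hashlib.pbkdf2_hmac with SHA-256; the helper names and the iteration count of 100_000 are illustrative assumptions, not tuned values.

```python
import hashlib
import hmac
import os

# Passing data to the constructor and then calling update() hashes the concatenation
two_step = hashlib.md5(b"joel-love-ellie")
two_step.update(b"123")
same = two_step.hexdigest() == hashlib.md5(b"joel-love-ellie123").hexdigest()
print(same)                                    # True

ITERATIONS = 100_000   # illustrative assumption; tune for your hardware

def hash_password(password, salt=None):
    """Return (salt, key); a fresh 16-byte random salt is generated when none is given."""
    if salt is None:
        salt = os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, key

def verify_password(password, salt, expected_key):
    """Recompute the key with the stored salt and compare in constant time."""
    _, key = hash_password(password, salt)
    return hmac.compare_digest(key, expected_key)

salt, key = hash_password("123")
print(verify_password("123", salt, key))       # True
print(verify_password("1234", salt, key))      # False
```

The salt is stored next to the derived key; it is not secret, it only makes every user's hash different.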

Other hash algorithms

import hashlib
######## sha1 ########
 
hash = hashlib.sha1()
hash.update(bytes('admin', encoding='utf-8'))
print(hash.hexdigest())
 
# ######## sha256 ########
 
hash = hashlib.sha256()
hash.update(bytes('admin', encoding='utf-8'))
print(hash.hexdigest())
 
 
# ######## sha384 ########
 
hash = hashlib.sha384()
hash.update(bytes('admin', encoding='utf-8'))
print(hash.hexdigest())
 
# ######## sha512 ########
 
hash = hashlib.sha512()
hash.update(bytes('admin', encoding='utf-8'))
print(hash.hexdigest())
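The repeated blocks above can be collapsed into one loop with hashlib.new(), which picks an algorithm by name. The sketch also shows hashing data piece by piece with repeated update() calls, a common way to hash large files without loading them whole; `file_digest` is a hypothetical helper name, and io.BytesIO stands in for a file opened with open(..., "rb").

```python
import hashlib
import io

data = b"admin"

# hashlib.new() selects the algorithm by name
for name in ("md5", "sha1", "sha256", "sha384", "sha512"):
    h = hashlib.new(name)
    h.update(data)
    print(name, h.digest_size * 8, "bits", h.hexdigest())

def file_digest(fileobj, name="sha256", chunk_size=8192):
    """Hash a binary file object chunk by chunk so large files need not fit in memory."""
    h = hashlib.new(name)
    for chunk in iter(lambda: fileobj.read(chunk_size), b""):
        h.update(chunk)
    return h.hexdigest()

# An in-memory stand-in for open("some_file", "rb")
payload = b"admin" * 10_000
print(file_digest(io.BytesIO(payload)) == hashlib.sha256(payload).hexdigest())  # True
```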

4. random

import random

print(random.random())                   # a random float between 0 and 1 (0 <= x < 1)
print(random.randint(1, 2))              # both ends included: returns 1 or 2
print(random.randrange(1, 10))           # right end excluded: returns 1 to 9
print(random.choice('hello'))            # a random character from the string
print(random.choice(['ciri','ellie']))   # a random element of the list
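A few more functions from the same module, sketched with illustrative data: seeding for reproducible results, sampling without repeats, shuffling in place, and an independent generator object.

```python
import random

random.seed(2017)                     # a fixed seed makes the sequence repeatable
first = [random.randint(1, 6) for _ in range(5)]
random.seed(2017)
second = [random.randint(1, 6) for _ in range(5)]
print(first == second)                # True: same seed, same numbers

names = ["ciri", "ellie", "joel", "prime", "deborah"]
picked = random.sample(names, 2)      # 2 distinct elements, no repeats
print(picked)

deck = names[:]
random.shuffle(deck)                  # shuffles the list in place and returns None
print(deck)

print(random.uniform(1.5, 2.5))       # a float between 1.5 and 2.5

rng = random.Random(42)               # an independent generator with its own seed
print(rng.randrange(1, 10))           # like random.randrange: 10 is never returned
```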

Random verification code (CAPTCHA)

# Method 1:
import random

def captcha_code():
    code = ''
    for i in range(6):
        # pick either a random digit or a random uppercase letter
        add = random.choice([random.randrange(10),chr(random.randrange(65,91))])
        code = code + str(add)
    print(code)

captcha_code()


# Method 2:
import random
checkcode = ''
for i in range(4):
    current = random.randrange(0,4)    # if the random number differs from i, add a letter; if they are equal, add a digit
    if current != i:
        temp = chr(random.randint(65,90))
    else:
        temp = random.randint(0,9)
    checkcode += str(temp)
print(checkcode)
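Both methods build the code one character at a time. A more compact sketch draws from an explicit alphabet with random.choices (Python 3.6+). Because random is not cryptographically secure, the standard library's secrets module is the documented choice for codes that actually guard a login or verification step. The function names are illustrative.

```python
import random
import secrets
import string

ALPHABET = string.ascii_uppercase + string.digits   # 'A'-'Z' plus '0'-'9'

def make_code(length=6):
    """Quick, non-secure code, e.g. for tests or games."""
    return "".join(random.choices(ALPHABET, k=length))

def make_secure_code(length=6):
    """Code for security-relevant use, drawn from the OS random source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

print(make_code())
print(make_secure_code(4))
```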

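5. The time module

Time-related operations. Time can be represented in three ways:

Timestamp: the number of seconds since 00:00:00 UTC on 1 January 1970, i.e. time.time()
Formatted string: e.g. 2014-11-11 11:11, i.e. time.strftime('%Y-%m-%d')
Struct time: a time.struct_time tuple containing the year, day, weekday and so on, i.e. time.localtime()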

import time

print(time.time())
print(time.mktime(time.localtime()))

print(time.gmtime())       # optionally takes a timestamp argument
print(time.localtime())    # optionally takes a timestamp argument
print(time.strptime('2014-11-11', '%Y-%m-%d'))

print(time.strftime('%Y-%m-%d'))                     # defaults to the current time
print(time.strftime('%Y-%m-%d', time.localtime()))   # the same, with the time passed explicitly
print(time.asctime())
print(time.asctime(time.localtime()))
print(time.ctime(time.time()))

import datetime
'''
datetime.date: a class for dates; common attributes are year, month, day
datetime.time: a class for times of day; common attributes are hour, minute, second, microsecond
datetime.datetime: a date together with a time
datetime.timedelta: a time interval, i.e. the length between two points in time
timedelta([days[, seconds[, microseconds[, milliseconds[, minutes[, hours[, weeks]]]]]]])
strftime("%Y-%m-%d")
'''
print(datetime.datetime.now())
print(datetime.datetime.now() - datetime.timedelta(days=5))

    %Y  Year with century as a decimal number.
    %m  Month as a decimal number [01,12].
    %d  Day of the month as a decimal number [01,31].
    %H  Hour (24-hour clock) as a decimal number [00,23].
    %M  Minute as a decimal number [00,59].
    %S  Second as a decimal number [00,61].
    %z  Time zone offset from UTC.
    %a  Locale's abbreviated weekday name.
    %A  Locale's full weekday name.
    %b  Locale's abbreviated month name.
    %B  Locale's full month name.
    %c  Locale's appropriate date and time representation.
    %I  Hour (12-hour clock) as a decimal number [01,12].
    %p  Locale's equivalent of either AM or PM.

The codes above are the format placeholders accepted by strftime and strptime.
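A sketch of converting between the three representations using the placeholders above. It works in UTC (time.gmtime with calendar.timegm as its inverse) so the results do not depend on the local time zone; time.mktime is the local-time counterpart of calendar.timegm.

```python
import calendar
import datetime
import time

# timestamp -> struct_time (UTC) -> formatted string
st = time.gmtime(0)
text = time.strftime("%Y-%m-%d %H:%M:%S", st)
print(text)                                   # 1970-01-01 00:00:00

# formatted string -> struct_time -> timestamp
parsed = time.strptime("2014-11-11 11:11:00", "%Y-%m-%d %H:%M:%S")
back = calendar.timegm(parsed)                # interprets the struct_time as UTC
print(time.gmtime(back)[:6])                  # (2014, 11, 11, 11, 11, 0)

# date arithmetic with datetime.timedelta
d = datetime.date(2014, 11, 11)
earlier = d - datetime.timedelta(days=5)
print(earlier)                                # 2014-11-06
print(datetime.datetime.strptime("2014-11-11", "%Y-%m-%d").date() == d)   # True
```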

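6. The re module

What is a regular expression (Regular Expression, RE for short)?

A regular expression is a small language of its own (so it has its own syntax); in Python it is used through the re module.

1. Basics

Metacharacters: . ^ $ * + ? {} [] | () \

	" . " -- wildcard; one "." matches exactly one character (any character except a newline by default)
	" ^ " -- caret; matches at the start of the string (inside a character set it has a special meaning of its own)
	" $ " -- dollar sign; matches at the end of the string
	" * " -- repeat 0 or more times
	" + " -- repeat 1 or more times
	" ? " -- repeat 0 or 1 time
	"{} " -- repeat a chosen number of times, e.g. {3} or {2,5}
	"[] " -- character set; most metacharacters lose their special meaning inside it
	" | " -- pipe; alternation (or)
	"() " -- grouping
	" \ " -- followed by a metacharacter: removes its special meaning
	        followed by certain ordinary characters: gives them a special meaning (only some, not all)
	        followed by a number: refers to the string matched by the group with that number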
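A quick sketch that exercises each metacharacter on small, illustrative strings; raw strings (r"...") are used so that backslashes reach the re module unchanged.

```python
import re

text = "a ab abbb"
print(re.findall(r"a.c", "abc a-c ac"))             # ['abc', 'a-c']        . = any one character
print(re.findall(r"ab*", text))                     # ['a', 'ab', 'abbb']   * = 0 or more
print(re.findall(r"ab+", text))                     # ['ab', 'abbb']        + = 1 or more
print(re.findall(r"ab?", text))                     # ['a', 'ab', 'ab']     ? = 0 or 1
print(re.findall(r"ab{2}", text))                   # ['abb']               {2} = exactly twice
print(re.match(r"^ci", "ciri") is not None)         # True                  ^ = start
print(re.search(r"ri$", "ciri") is not None)        # True                  $ = end
print(re.findall(r"[.*]", "a.b*c"))                 # ['.', '*']            literal inside [ ]
print(re.findall(r"[^a-c]", "abcde"))               # ['d', 'e']            ^ first in [ ] = NOT
print(re.findall(r"ciri|joel", "ciri ellie joel"))  # ['ciri', 'joel']      | = or
print(re.findall(r"\.", "a.b.c"))                   # ['.', '.']            \ removes the special meaning
print(re.search(r"(\w)\1", "ciri ellie").group())   # ll                    \1 = text of group 1
```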

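Special sequences: a backslash " \ " followed by an ordinary character

***An uppercase letter means the opposite (NOT) of its lowercase form***
\d -- matches any decimal digit; equivalent to the class [0-9]
\D -- matches any non-digit character; equivalent to the class [^0-9]
\s -- matches any whitespace character; equivalent to the class [ \t\n\r\f\v]   # note: the first character is a space
\S -- matches any non-whitespace character; equivalent to the class [^ \t\n\r\f\v]   # note: there is a space after ^
\w -- matches any letter, digit or underscore; equivalent to the class [a-zA-Z0-9_]
\W -- matches any character that is not a letter, digit or underscore; equivalent to the class [^a-zA-Z0-9_]
\b -- matches a word boundary: the zero-width position between a word character and a non-word character such as a space
(In Python 3, str patterns are Unicode-aware by default, so \d, \w and \s also match non-ASCII digits, letters and whitespace; the bracketed classes are exact equivalents only with the re.ASCII flag.)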
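A sketch of the special sequences on illustrative inputs, including the Unicode behaviour of \d in Python 3 (U+0663 is the Arabic-Indic digit three) and how re.ASCII switches it off.

```python
import re

print(re.findall(r"\d+", "a12b345"))              # ['12', '345']
print(re.findall(r"\D+", "a12b345"))              # ['a', 'b']
print(re.findall(r"\w+", "ciri_1, ellie!"))       # ['ciri_1', 'ellie']
print(re.split(r"\s+", "ciri\tellie\njoel"))      # ['ciri', 'ellie', 'joel']
print(re.findall(r"\bel\w*", "ellie cell"))       # ['ellie']: the 'el' in 'cell' is not at a boundary

# str patterns are Unicode-aware; re.ASCII restricts \d to [0-9]
print(re.fullmatch(r"\d", "\u0663") is not None)             # True
print(re.fullmatch(r"\d", "\u0663", re.ASCII) is not None)   # False
```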

2. Functions in the re module

match

# match: matches starting at the beginning of the string; returns a match object on success, None otherwise
 
 
 match(pattern, string, flags=0)
 # pattern: the regular expression
 # string : the string to match
 # flags  : matching mode flags
     X  VERBOSE     Ignore whitespace and comments for nicer looking RE's.
     I  IGNORECASE  Perform case-insensitive matching.
     M  MULTILINE   "^" matches the beginning of lines (after a newline)
                    as well as the string.
                    "$" matches the end of lines (before a newline) as well
                    as the end of the string.
     S  DOTALL      "." matches any character at all, including the newline.
 
     A  ASCII       For string patterns, make \w, \W, \b, \B, \d, \D
                    match the corresponding ASCII character categories
                    (rather than the whole Unicode categories, which is the
                    default).
                    For bytes patterns, this flag is the only available
                    behaviour and needn't be specified.
      
     L  LOCALE      Make \w, \W, \b, \B, dependent on the current locale.
     U  UNICODE     For compatibility only. Ignored for string patterns (it
                    is the default), and forbidden for bytes patterns.
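A sketch showing what each common flag changes, on illustrative strings; several flags can be combined with `|`.

```python
import re

print(re.match(r"CIRI", "ciri", re.I) is not None)      # True: IGNORECASE
print(re.findall(r"^\w+", "ciri\nellie"))                # ['ciri']
print(re.findall(r"^\w+", "ciri\nellie", re.M))          # ['ciri', 'ellie']: MULTILINE
print(re.match(r"a.b", "a\nb") is None)                  # True: '.' skips '\n' by default
print(re.match(r"a.b", "a\nb", re.S) is not None)        # True: DOTALL
print(re.findall(r"\w+", "ciri caf\u00e9"))              # ['ciri', 'café']
print(re.findall(r"\w+", "ciri caf\u00e9", re.A))        # ['ciri', 'caf']: ASCII

# VERBOSE ignores whitespace and allows comments inside the pattern
date = re.compile(r"""
    (\d{4})   # year
    -
    (\d{2})   # month
""", re.X)
print(date.match("2017-08").groups())                    # ('2017', '08')

name = re.compile(r"ciri   # a name", re.X | re.I)       # flags combined with |
print(name.match("CIRI") is not None)                    # True
```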

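Why use groups? To extract specific parts of the matched text: the whole pattern must match first, and then the parts captured by the groups are pulled out of that match.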

import re

origin = "ciri prime deborah ellie joel"

# Without groups
r = re.match(r"c\w+", origin)
print(r.group())       # the whole matched text
print(r.groups())      # a tuple of all group matches
print(r.groupdict())   # a dict of the named groups
Output:
ciri
()
{}

# With groups
r = re.match(r"(c)(\w+)", origin)
print(r.group())       # the whole matched text
print(r.groups())      # a tuple of all group matches
print(r.groupdict())   # a dict of the named groups (those given a key)
Output:
ciri            # group() is the same with or without groups: it always returns the whole match
('c', 'iri')    # one tuple element per pair of parentheses
{}

r = re.match(r"(?P<n1>c)(\w+)", origin)    # ?P<n1> gives the group the name n1
print(r.groupdict())
Output:
{'n1': 'c'}

search

# search: scans the whole string and returns the first match; returns None if nothing matches
# search(pattern, string, flags=0)

With and without groups

import re

origin = "ciri prime deborah ellie joel"

# Without groups
r = re.search(r"c\w+", origin)
print(r.group())       # the whole matched text
print(r.groups())      # a tuple of all group matches
print(r.groupdict())   # a dict of the named groups
Output:
ciri
()
{}

# With groups
r = re.search(r"(c)(\w+)", origin)
print(r.group())       # the whole matched text
print(r.groups())      # a tuple of all group matches
print(r.groupdict())   # a dict of the named groups (those given a key)
Output:
ciri
('c', 'iri')
{}

r = re.search(r"(?P<n1>c)(\w+)", origin)    # ?P<n1> gives the group the name n1
print(r.groupdict())
Output:
{'n1': 'c'}
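The examples above give the same results for match and search only because the text starts with "ciri". A sketch with a pattern that does not match at position 0 shows the difference, along with several named groups and a None check before calling .group().

```python
import re

origin = "ciri prime deborah ellie joel"

# match only tries position 0; search scans forward
print(re.match(r"p\w+", origin))              # None: 'prime' is not at the start
print(re.search(r"p\w+", origin).group())     # prime

# several named groups; groupdict() maps each name to its text
m = re.search(r"(?P<first>\w+) (?P<second>\w+)", origin)
print(m.groupdict())                          # {'first': 'ciri', 'second': 'prime'}
print(m.group("second"), m.span())            # prime (0, 10)

# check for None before calling .group(), otherwise AttributeError is raised
m2 = re.search(r"x\w+", origin)
print(m2.group() if m2 else "no match")       # no match
```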

findall

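# findall: returns a list of all non-overlapping matches
# If the pattern has one group, each list item is a string (the group's text)
# If the pattern has several groups, each list item is a tuple

# Empty matches are included in the result
#findall(pattern, string, flags=0)

With and without groups

import re

origin = "ciri cirila prime deborah ellie joel"
# Without groups
r = re.findall(r"c\w+", origin)
print(r)
Output:
['ciri', 'cirila']

# With groups
r = re.findall(r"(c\w+)", origin)   # same result as above, as if there were no parentheses
print(r)
r = re.findall(r"c(\w+)", origin)   # now findall returns only what is inside the group
print(r)
r = re.findall(r"(c)(\w+)", origin)
print(r)
Output:
['ciri', 'cirila']
['iri', 'irila']
[('c', 'iri'), ('c', 'irila')]

# In short, findall collects the groups() of every match into a list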

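More on findall:

findall effectively performs one search after another, each continuing where the previous match ended, and collects the results into a list (the groups() of each match, or the whole match when the pattern has no groups).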

n = re.findall(r"\d+\w\d+", "a2b3c4d5")
print(n)
Output:
['2b3', '4d5']
# Once findall finds a match, it resumes searching right after it, so matches never overlap

n = re.findall('', 'ciri')
print(n)
Output:
['', '', '', '', '']
# One more than the number of characters, because the empty pattern also matches at the very end

n = re.findall(r'(\w)(\w)(\w)(\w)', 'ciri')
print(n)
Output:
[('c', 'i', 'r', 'i')]

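n = re.findall(r'(\w){4}', 'ciri')
print(n)
Output:
['i']

# The pattern still matches all four characters, but it has only one group, repeated four times;
# a repeated group keeps only its last match, so 'i' is returned
# Matching and capturing are separate: there are as many groups as there are pairs of parentheses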

n = re.findall(r'(\w)*', 'ciri')
print(n)
Output:
['i', '']
# The trailing '' is an empty match at the end of the string, where * matches zero times
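When match positions are needed as well as the text, re.finditer yields a match object per match instead of plain strings. The sketch also shows how to keep all four characters from the `(\w){4}` example: put the quantifier inside the group.

```python
import re

origin = "ciri cirila prime deborah ellie joel"

# finditer yields match objects one at a time, so spans are available too
for m in re.finditer(r"c(\w+)", origin):
    print(m.group(), m.group(1), m.span())
# ciri iri (0, 4)
# cirila irila (5, 11)

# findall returns the same group texts as a plain list
tails = [m.group(1) for m in re.finditer(r"c(\w+)", origin)]
print(tails == re.findall(r"c(\w+)", origin))   # True

# quantifier inside the group keeps the whole run
print(re.findall(r"(\w{4})", "ciri"))           # ['ciri']
print(re.findall(r"((\w){4})", "ciri"))         # [('ciri', 'i')]
```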

sub

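# sub: replaces the matched substrings
 
sub(pattern, repl, string, count=0, flags=0)
# pattern: the regular expression
# repl   : the replacement string, or a callable
# string : the string to search
# count  : maximum number of replacements (0 means replace all)
# flags  : matching mode flags

Without groups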

import re

origin = "ciri ciri ciri prime deborah ellie joel"

r = re.sub(r"c\w+", "666", origin, count=2)
print(r)
Output:
666 666 ciri prime deborah ellie joel
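A sketch of what the `repl` parameter can do beyond a fixed string: backreferences to groups, a function that computes each replacement, and re.subn, which also reports the number of replacements. Passing `count` as a keyword is used because recent Python versions (3.13+) deprecate passing it positionally.

```python
import re

origin = "ciri ciri ciri prime deborah ellie joel"

# \1, \2 in the replacement refer back to the captured groups
print(re.sub(r"(\w+) (\w+)", r"\2 \1", "ciri ellie"))              # ellie ciri

# repl may be a function: it receives each match object and returns the replacement
print(re.sub(r"\d+", lambda m: str(int(m.group()) * 2), "a1b22"))   # a2b44

# subn returns (new_string, number_of_replacements)
print(re.subn(r"c\w+", "666", origin))
# ('666 666 666 prime deborah ellie joel', 3)

print(re.sub(r"c\w+", "666", origin, count=1))                     # 666 ciri ciri prime deborah ellie joel
```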

split

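# split: splits a string at every match of the pattern
 
split(pattern, string, maxsplit=0, flags=0)
# pattern : the regular expression
# string  : the string to split
# maxsplit: maximum number of splits (0 means no limit)
# flags   : matching mode flags

With and without groups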

import re

origin = "sulli ciri prime deborah ellie joel"
# Without groups
r = re.split(r"ciri", origin, maxsplit=1)
print(r)
Output:
['sulli ', ' prime deborah ellie joel']

# With groups
r = re.split(r"(ciri)", origin, maxsplit=1)    # the text captured by the group is kept in the result
print(r)
r = re.split(r"c(iri)", origin, maxsplit=1)
print(r)
r = re.split(r"(ci(ri))", origin, maxsplit=1)  # one extra item per pair of parentheses
print(r)
Output:
['sulli ', 'ciri', ' prime deborah ellie joel']
['sulli ', 'iri', ' prime deborah ellie joel']
['sulli ', 'ciri', 'ri', ' prime deborah ellie joel']
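Unlike str.split, re.split can cut on several different delimiters at once. A sketch with illustrative inputs, also showing the empty item produced by a leading delimiter and how maxsplit leaves the rest of the string intact:

```python
import re

# a character class plus + splits on any run of commas, semicolons or whitespace
print(re.split(r"[,;\s]+", "ciri, ellie;joel  prime"))   # ['ciri', 'ellie', 'joel', 'prime']

# a delimiter at the very start yields an empty first element
print(re.split(r",", ",ciri,ellie"))                      # ['', 'ciri', 'ellie']

# maxsplit limits the number of splits; the remainder stays in the last item
print(re.split(r"\s+", "sulli ciri prime deborah", maxsplit=2))
# ['sulli', 'ciri', 'prime deborah']
```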

Common regular expressions:

IP address:
^(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}$
Mobile phone number (mainland China):
^1[3458][0-9]\d{8}$
Email:
[a-zA-Z0-9_-]+@[a-zA-Z0-9_-]+(\.[a-zA-Z0-9_-]+)+
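A sketch of using these patterns for validation with re.fullmatch, which anchors at both ends just like wrapping a pattern in ^...$. A | inside [...] is an ordinary character, so the phone pattern here uses the class [3458] (a [3|4|5|8] form would also accept a literal '|'). The prefix list and the email pattern come from the original post and are simplified; they do not cover every real phone range or email address. The helper `is_valid` is illustrative.

```python
import re

IP = re.compile(r"(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}")
PHONE = re.compile(r"1[3458][0-9]\d{8}")
EMAIL = re.compile(r"[a-zA-Z0-9_-]+@[a-zA-Z0-9_-]+(\.[a-zA-Z0-9_-]+)+")

def is_valid(pattern, text):
    """True only if the whole text matches the compiled pattern."""
    return pattern.fullmatch(text) is not None

print(is_valid(IP, "192.168.1.1"), is_valid(IP, "256.1.1.1"))              # True False
print(is_valid(PHONE, "13812345678"), is_valid(PHONE, "1|812345678"))      # True False
print(is_valid(EMAIL, "ciri@example.com"), is_valid(EMAIL, "ciri@example"))  # True False
```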

To be continued...

posted @ 2017-08-25 21:29 Igniculus
