Python Lesson 10: Regular Expressions and Modules - 贾老板


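I. Regular expressions (re)

1. Concept: a regular expression is a small, highly specialized language for matching text. It is built into Python and exposed through the re module.

2. Features

a. findall: returns every non-overlapping match of the pattern in the string, as a list.

import re

re.findall('alex', 'yuandslncalexasdalex')

>>>> ['alex', 'alex']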

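b. Metacharacters

"." wildcard: matches any single character except a newline.

"^" matches at the start of the string.

"$" matches at the end of the string.

"*" repeats the preceding character 0 or more times.

"+" repeats the preceding character 1 or more times.

"?" repeats the preceding character 0 or 1 times.

"{}" sets a custom repeat count, eg: {3} or {1,5}.

"[]" as a set of alternatives, eg: re.findall("a[bc]d","wwwabd") output >>>> ['abd']

"[]" as a range, eg: re.findall("[a-z]","wwwabd") output >>>> ['w', 'w', 'w', 'a', 'b', 'd']

"[^]" as negation (anything except), eg: re.findall("[^a-z]","www14596abd") output >>>> ['1', '4', '5', '9', '6']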

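The quantifiers and anchors listed above can be compared side by side on one string. This is a minimal sketch; the sample text and the variable names are made up for illustration.

```python
import re

text = "ab abb abbb a"

star = re.findall(r"ab*", text)         # "*": zero or more "b"
plus = re.findall(r"ab+", text)         # "+": at least one "b"
optional = re.findall(r"ab?", text)     # "?": zero or one "b"
bounded = re.findall(r"ab{2,3}", text)  # "{2,3}": two or three "b"
at_start = re.findall(r"^ab", text)     # "^": only at the start of the string
at_end = re.findall(r"a$", text)        # "$": only at the end of the string

print(star)      # ['ab', 'abb', 'abbb', 'a']
print(plus)      # ['ab', 'abb', 'abbb']
print(optional)  # ['ab', 'ab', 'ab', 'a']
print(bounded)   # ['abb', 'abbb']
print(at_start)  # ['ab']
print(at_end)    # ['a']
```

Raw strings (the r prefix) are used so that backslashes in longer patterns reach the re module unchanged.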
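c. match and groups (re.match only matches at the beginning of the string; re.search() scans the whole string and returns the first match)

# without groups

import re

origin = "hello,alex bad ada frcv alex alec alex 19"
r = re.match(r"h\w+", origin)
print(r.group())      # the whole match: hello
print(r.groups())     # the captured groups as a tuple: ()
print(r.groupdict())  # the named groups as a dict: {}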

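# with groups

import re

origin = "hello,alex bad ada frcv alex alec alex 19"

# Three patterns to compare: no group, one unnamed group, two named groups
r1 = re.match(r"h\w+", origin)
r2 = re.match(r"h(\w+)", origin)
r3 = re.match(r"(?P<n1>h)(?P<n2>\w+)", origin)

print(r1.group())      # the whole match: hello (the same for r2 and r3)
print(r2.groups())     # the captured groups: ('ello',)
print(r3.groupdict())  # the named groups: {'n1': 'h', 'n2': 'ello'}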
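The difference between re.match and re.search is easiest to see with a pattern that does not occur at the start of the string. A small sketch, with a shortened sample string chosen for illustration:

```python
import re

origin = "hello,alex bad"

# match only tries position 0, so a pattern that appears later fails
m1 = re.match(r"alex", origin)

# search scans forward and stops at the first position that matches
m2 = re.search(r"alex", origin)

print(m1)                      # None
print(m2.group(), m2.start())  # alex 6
```

Because match returns None on failure, check the result before calling group() on it.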

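Flags:

re.I (IGNORECASE): case-insensitive matching.

re.M (MULTILINE): "^" and "$" also match at the start and end of every line.

re.S (DOTALL): "." matches every character, including the newline.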
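A short sketch of what each flag changes, using a two-line string invented for this example:

```python
import re

text = "Alex\nalex"

ignore_case = re.findall(r"alex", text, re.I)  # both spellings match
line_starts = re.findall(r"^\w+", text, re.M)  # "^" matches on each line
without_dotall = re.findall(r"A.+", text)      # "." stops at the newline
with_dotall = re.findall(r"A.+", text, re.S)   # "." crosses the newline

print(ignore_case)     # ['Alex', 'alex']
print(line_starts)     # ['Alex', 'alex']
print(without_dotall)  # ['Alex']
print(with_dotall)     # ['Alex\nalex']
```

Flags can be combined with the | operator, for example re.I | re.M.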

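d. findall (scans from the start of the string; after each match it continues from the end of that match, much like running search repeatedly. Empty matches are included in the result.)

When the pattern contains groups, findall returns the groups rather than the whole match, that is, the same values groups() would give for each match.

import re

origin = "hello,alex bad ada frcv alex alec alex 19"

r1 = re.findall(r"(a)(\w+)", origin)     # pattern 1: two groups
r2 = re.findall(r"(a)(\w+(a))", origin)  # pattern 2: three groups, the last one nested
print(r1)
print(r2)

1. >>>> output [('a', 'lex'), ('a', 'd'), ('a', 'da'), ('a', 'lex'), ('a', 'lec'), ('a', 'lex')]

2. >>>> output [('a', 'da', 'a')]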

Special findall examples

import re

a = "alex"
n = re.findall(r"(\w)(\w)(\w)(\w)", a)
print(n)
n = re.findall(r"(\w)*", a)  # a repeated group keeps only its last repetition
print(n)

>>>> [('a', 'l', 'e', 'x')]

>>>> ['x', '']

e. finditer (returns an iterator of match objects; each one provides the group, groups, and groupdict methods)
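As a sketch of how finditer might be used, the loop below collects the whole match, the groups, the named groups, and the start position of each match. The group names first and rest are chosen for this example only.

```python
import re

origin = "hello,alex bad ada frcv alex alec alex 19"

matches = []
for m in re.finditer(r"(?P<first>a)(?P<rest>\w+)", origin):
    # Each m is a match object, just like the result of re.search
    matches.append((m.group(), m.groups(), m.groupdict(), m.start()))

print(len(matches))  # 6
print(matches[0])    # ('alex', ('a', 'lex'), {'first': 'a', 'rest': 'lex'}, 6)
```

Unlike findall, finditer keeps the position information of every match and does not build the full list in advance.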

f. sub (replace)

eg: origin = "1sdv2sdv56641vsv"

new_str = re.sub(r"\d+", "yyyy", origin, count=1)  # count=1 limits the replacement to the first match

print(new_str)

>>>> output yyyysdv2sdv56641vsv

g. subn (replaces and returns both the new string and the number of replacements)

eg: origin = "1sdv2sdv56641vsv"

new_str, count = re.subn(r"\d+", "yyyy", origin)

print(new_str, count)

>>>> output yyyysdvyyyysdvyyyyvsv 3

II. Modules

2. time and datetime modules

# -*- coding: utf-8 -*-
__author__ = 'Alex Li'

import time

# print(time.clock())  # processor time; deprecated since 3.3 and removed in 3.8. Use time.process_time(), which measures CPU time and excludes sleep (the author found it unreliable on a Mac)
# print(time.altzone)  # offset of the local DST timezone from UTC, in seconds
# print(time.asctime())  # returns a string like "Fri Aug 19 11:14:16 2016"
# print(time.localtime())  # local time as a struct_time object
# print(time.gmtime(time.time()-800000))  # UTC time as a struct_time object

# print(time.asctime(time.localtime()))  # returns a string like "Fri Aug 19 11:14:16 2016"
# print(time.ctime())  # returns "Fri Aug 19 12:38:29 2016", same format as above

# Date string to timestamp
# string_2_struct = time.strptime("2016/05/22","%Y/%m/%d")  # parse a date string into a struct_time object
# print(string_2_struct)
# struct_2_stamp = time.mktime(string_2_struct)  # convert the struct_time object to a timestamp
# print(struct_2_stamp)

# Timestamp to string
# print(time.gmtime(time.time()-86640))  # convert a timestamp to a UTC struct_time
# print(time.strftime("%Y-%m-%d %H:%M:%S",time.gmtime()))  # format a UTC struct_time as a string

# Date arithmetic
import datetime

# print(datetime.datetime.now())  # returns 2016-08-19 12:47:03.941925
# print(datetime.date.fromtimestamp(time.time()))  # timestamp straight to a date: 2016-08-19
# print(datetime.datetime.now())
# print(datetime.datetime.now() + datetime.timedelta(3))  # now + 3 days
# print(datetime.datetime.now() + datetime.timedelta(-3))  # now - 3 days
# print(datetime.datetime.now() + datetime.timedelta(hours=3))  # now + 3 hours
# print(datetime.datetime.now() + datetime.timedelta(minutes=30))  # now + 30 minutes

# c_time = datetime.datetime.now()
# print(c_time.replace(minute=3,hour=2))  # replace individual fields of the time
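The conversions in the listing above can be chained into a round trip. The sketch below uses fixed dates instead of the current time so the results are reproducible; the variable names are illustrative.

```python
import time
import datetime

# Date string -> struct_time -> timestamp (interpreted as local time)
parsed = time.strptime("2016/05/22", "%Y/%m/%d")
stamp = time.mktime(parsed)

# Timestamp -> struct_time -> date string (back in local time)
formatted = time.strftime("%Y-%m-%d", time.localtime(stamp))

# Date arithmetic on a fixed datetime
base = datetime.datetime(2016, 8, 19, 12, 0, 0)
later = base + datetime.timedelta(days=3, hours=3)
replaced = base.replace(hour=2, minute=3)

print(formatted)  # 2016-05-22
print(later)      # 2016-08-22 15:00:00
print(replaced)   # 2016-08-19 02:03:00
```

mktime and localtime both work in local time, which is why the round trip returns the original date; mixing mktime with gmtime would shift the result by the UTC offset.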

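3. pickle module

pickle.loads turns the bytes read from a file back into a Python object (here a dict).

pickle.dumps serializes a Python object such as a dict to bytes, which can then be written to a file.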

import pickle

acc_file_name = "account.db"

# Read: use either loads on the file's bytes or load on the file object, not both
account_file = open(acc_file_name, "rb")
account_dic = pickle.loads(account_file.read())
# account_dic = pickle.load(account_file)
account_file.close()

account_dic[1000]["balance"] -= 500

# Write: dumps plus write, or dump straight into the file object
f = open(acc_file_name, "wb")
f.write(pickle.dumps(account_dic))
# pickle.dump(account_dic, f)
f.close()

print(account_dic)

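JSON only handles basic data types (dict, list, str, numbers, booleans, None). pickle handles almost all Python data types, including instances of user-defined classes.

JSON is for exchanging data between different languages. pickle is for persisting Python objects or sending them between Python programs over a network, but the serialized format may differ between Python versions.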
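The contrast between the two modules can be shown in memory, without touching a file. In this sketch the account dict is invented for illustration and contains a set, which JSON cannot represent.

```python
import json
import pickle

account = {1000: {"balance": 500, "tags": {"vip", "new"}}}

# pickle restores the object unchanged, including int keys and the set
restored = pickle.loads(pickle.dumps(account))

# json refuses the set with a TypeError
try:
    json.dumps(account)
    json_ok = True
except TypeError:
    json_ok = False

# json also turns non-string dict keys into strings
json_keys = list(json.loads(json.dumps({1000: "x"})).keys())

print(restored == account)  # True
print(json_ok)              # False
print(json_keys)            # ['1000']
```

One caveat worth keeping in mind: unpickling can execute arbitrary code, so pickle.loads should only be used on data from a trusted source.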

4. os module

5. hashlib

import hashlib

ooo = hashlib.md5(bytes("dascf", encoding="utf-8"))  # initial data, used here like a salt
ooo.update(bytes("123", encoding="utf-8"))
print(ooo.hexdigest())
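Passing data to the constructor and then calling update is equivalent to hashing the concatenation in one step. The following sketch checks this with the same two byte strings; the variable names are illustrative.

```python
import hashlib

salt = b"dascf"
data = b"123"

# Feed the data in two steps
incremental = hashlib.md5(salt)
incremental.update(data)

# Hash the concatenated bytes in one step
one_shot = hashlib.md5(salt + data)

same = incremental.hexdigest() == one_shot.hexdigest()
digest_length = len(incremental.hexdigest())

print(same)           # True
print(digest_length)  # 32
```

MD5 is fine for demonstrating the API, but it is no longer considered safe for storing real passwords.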

eg:

import hashlib

def md5(ex):
    # Hash the salt "dascf" followed by the given string
    ooo = hashlib.md5(bytes("dascf", encoding="utf-8"))
    ooo.update(bytes(ex, encoding="utf-8"))
    return ooo.hexdigest()

def login(usr, pwd):
    # Each line of the file "bd" has the form user|md5hash
    with open("bd", "r", encoding="utf-8") as f:
        for line in f:
            u, p = line.strip().split("|")
            if u == usr and p == md5(pwd):
                return True
    return False

def register(usr, pwd):
    with open("bd", "a", encoding="utf-8") as f:
        temp = usr + "|" + md5(pwd) + "\n"
        f.write(temp)
    return True

inp = input("1. Log in; 2. Register: ")
if inp == "1":
    inp_name = input("Enter user name: ")
    inp_pwd = input("Enter password: ")
    r = login(inp_name, inp_pwd)
    if r:
        print("Login succeeded")
    else:
        print("Login failed")
elif inp == "2":
    inp_name = input("Enter user name: ")
    inp_pwd = input("Enter password: ")
    t = register(inp_name, inp_pwd)
    if t:
        print("Registration succeeded")

II: String formatting

1. %

Placeholders that format data

tp1 = "i am %s" % "alex"

tp2 = "i am %s age %d" % ("alex", 18)

tp3 = "i am %(name)s age %(age)d" % {"name": "alex", "age": 18}

tp4 = "percent %.2f" % 123.425556

tp5 = "i am %(pp).2f" % {"pp": 123.4452355}

tp6 = "i am %(pp).2f %%" % {"pp": 123.456132213}  # %% prints a literal percent sign

2. format
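The notes give no example for format, so the following is only a sketch of the common str.format forms, mirroring the % examples; the sample values are reused from them.

```python
name, age, percent = "alex", 18, 123.425556

by_position = "i am {} age {}".format(name, age)
by_index = "i am {0}, yes {0}, age {1}".format(name, age)
by_name = "i am {name} age {age}".format(name=name, age=age)
with_precision = "percent {:.2f}".format(percent)

print(by_position)     # i am alex age 18
print(by_index)        # i am alex, yes alex, age 18
print(by_name)         # i am alex age 18
print(with_precision)  # percent 123.43
```

Fields are written in braces; an index or a name selects the argument, and the part after the colon is the format specification.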

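2. More on modules

__doc__  # the docstring of the .py file

__file__  # path of the current file

__package__  # None for the main file; for an imported file, the package that contains it, with levels separated by "."

__cached__  # path of the cached bytecode file

__name__  # "__main__" if the file is run as the main file; otherwise the module name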

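from lib import s1
from lib import s2

def execute():
    print("executed")
    s1.f1()
    s2.f1()

# execute()

# __name__ == "__main__" only when this file is run directly (python index.py);
# when the file is imported, __name__ is the module name, eg: __name__ == "index"

if __name__ == "__main__":
    execute()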

1. Main file

Guard the call to the main function with if __name__ == "__main__":

2. __file__

# path of the current file

# print(os.path.dirname(__file__))

eg:

import os, sys

temp = os.path.dirname(__file__)
b = "bin"
new_path = os.path.join(temp, b)
sys.path.append(new_path)

In one line: sys.path.append(os.path.join(os.path.dirname(__file__), "bin"))

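sys.path: the list of directories searched when importing modules

os.path.join: joins path components into one path

os.path.dirname: returns the directory that contains the given path, that is, one level up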
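Both functions only manipulate strings, so they can be tried on a path that does not exist. The sketch below uses a hypothetical project layout (project/lib/index.py) to show how repeated dirname calls walk upward.

```python
import os

# Hypothetical layout: project/lib/index.py
path = os.path.join("project", "lib", "index.py")

parent = os.path.dirname(path)            # the directory containing index.py
grandparent = os.path.dirname(parent)     # one more level up
bin_dir = os.path.join(grandparent, "bin")

print(parent)
print(grandparent)
print(bin_dir)
```

The printed separators depend on the operating system, which is the reason to build paths with os.path.join rather than by concatenating strings.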

3. Built-in functions live in __builtins__ (the builtins module)

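III. json module

1. json.loads(): converts a JSON string into the corresponding Python object, such as a dict or a list (JSON has no tuple type, and strings inside the text must use double quotes)

json.load(): does the same but reads from a file object, eg: json.load(open('db', 'r'))

2. json.dumps(): converts basic Python data types into a JSON string (tuples are written as lists)

json.dump(): does the same but writes to a file object, eg: dic = {'k1': 123, 'k2': 'v2'}

json.dump(dic, open('db', 'w'))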
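The four functions pair up as string versions (dumps/loads) and file versions (dump/load). A minimal sketch, writing to a temporary directory so it does not overwrite a real db file; the path handling is an addition for this example.

```python
import json
import os
import tempfile

dic = {"k1": 123, "k2": "v2"}

# String round trip
text = json.dumps(dic)
from_text = json.loads(text)

# File round trip
path = os.path.join(tempfile.mkdtemp(), "db")
with open(path, "w", encoding="utf-8") as f:
    json.dump(dic, f)
with open(path, "r", encoding="utf-8") as f:
    from_file = json.load(f)

# Single-quoted keys are not valid JSON
try:
    json.loads("{'k1': 123}")
    single_quotes_ok = True
except json.JSONDecodeError:
    single_quotes_ok = False

print(from_text == dic, from_file == dic)  # True True
print(single_quotes_ok)                    # False
```

Using with blocks closes the files reliably, which the one-line open(...) calls in the notes do not guarantee.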

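V. Aside: installing third-party modules

Method 1: pip3

When python2 and python3 are installed side by side:

python2 -m pip install XXXX or python3 -m pip install XXXX

Method 2

Install from source

Download >>>> unpack >>>> enter the directory and run python setup.py install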

IV. requests (third-party module)

Purpose: lets Python act like a browser (send an HTTP request and read the response, which is typically JSON, XML, or HTML)

eg:

import requests

response = requests.get("http://www.weather.com.cn/adat/sk/101010500.html")
response.encoding = "utf-8"
result = response.text
print(result)

Calling json.loads(result) then gives the result as a dict

import requests
from xml.etree import ElementTree as ET

r = requests.get("http://www.webxml.com.cn//webservices/qqOnlineWebService.asmx/qqCheckOnline?qqCode=644549378")
result = r.text  # a str
# Parse the XML content
# ET.XML takes one argument, a string, and parses it into an Element object
node = ET.XML(result)
# Read the content
if node.text == "Y":
    print("online")
else:
    print("offline")

eg:

import requests
from xml.etree import ElementTree as ET

# Parse the XML content
r = requests.get("http://www.webxml.com.cn/WebServices/TrainTimeWebService.asmx/getDetailInfoByTrainCode?TrainCode=G666&UserID=")
result = r.text  # a str
# ET.XML takes one argument, a string, and parses it into an Element object
node = ET.XML(result)
for item in node.iter("TrainDetailInfo"):
    print(item.find("TrainStation").text, item.find("StartTime").text)

iter (finds all descendants at any depth), find (finds the first matching direct child)
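The difference shows up when the same tag appears at several depths. This sketch parses a small inline document made up for the example, so it runs without first.xml.

```python
from xml.etree import ElementTree as ET

xml_text = """
<data>
    <country name="A">
        <rank>2</rank>
        <neighbor><rank>9</rank></neighbor>
    </country>
    <country name="B">
        <rank>5</rank>
    </country>
</data>
"""
root = ET.XML(xml_text)

# iter walks the whole subtree, so it also reaches the nested rank
all_ranks = [node.text for node in root.iter("rank")]

# find only looks at direct children; rank is not a direct child of data
root_find = root.find("rank")

# Stepping down one level at a time works
first_country_rank = root.find("country").find("rank").text

print(all_ranks)           # ['2', '9', '5']
print(root_find)           # None
print(first_country_rank)  # 2
```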

from xml.etree import ElementTree as ET

root = ET.XML(open("first.xml", "r", encoding="utf-8").read())
print(root.tag)  # tag name of the root node

for node in root:
    print(node, type(node))
    print(node.tag, node.attrib, node.find("rank").text)

V.1

0. Parsing a string: used together with requests, as in the examples above

1. First way to open XML

from xml.etree import ElementTree as ET
# Parse the XML content
node = ET.XML(open("first.xml", "r", encoding="utf-8").read())
for r in node.iter("country"):
    print(r.find("rank").text, r.find("year").text)

2. Another way to open an XML file (the difference: content can be added, removed, and saved back)

from xml.etree import ElementTree as ET
# Open and parse the file
tree = ET.parse("first.xml")  # parse opens the file and parses it
root = tree.getroot()  # get the root node
for node in root.iter("year"):  # iterate over every year node
    new_year = int(node.text) + 1  # increment by 1
    node.text = str(new_year)
    node.set('name', 'alex')
    node.set('age', '19')  # set adds or updates an attribute
    del node.attrib['name']  # del removes the name attribute from the attrib dict
tree.write('first.xml')
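The same modifications can be tried in memory before writing anything back to disk. This sketch replaces first.xml with a tiny inline document invented for the example and serializes the result with ET.tostring.

```python
from xml.etree import ElementTree as ET

root = ET.XML("<data><year>2016</year><year>2017</year></data>")

for node in root.iter("year"):
    node.text = str(int(node.text) + 1)  # increment the year
    node.set("name", "alex")             # add two attributes
    node.set("age", "19")
    del node.attrib["name"]              # then remove one again

years = [node.text for node in root.iter("year")]
attribs = [dict(node.attrib) for node in root.iter("year")]
result = ET.tostring(root, encoding="unicode")

print(years)    # ['2017', '2018']
print(attribs)  # [{'age': '19'}, {'age': '19'}]
print(result)
```

Changes made through set, del, or text only live in memory until tree.write (or another serialization step) is called.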

V.2. Creating XML

Method 1

from xml.etree import ElementTree as ET
# Create the root node
new_xml = ET.Element('namelist')

# Create child nodes
name1 = ET.SubElement(new_xml, 'name', attrib={'enrolled': 'yes'})
age1 = ET.SubElement(name1, 'age', attrib={'checked': 'no'})
sex1 = ET.SubElement(name1, 'sex')
sex1.text = "33"

name2 = ET.SubElement(new_xml, 'name', attrib={'enrolled': 'no'})
age2 = ET.SubElement(name2, 'age')
age2.text = "19"

et = ET.ElementTree(new_xml)
et.write('test.xml', encoding="utf-8", xml_declaration=True)

# Creating child nodes, method 2
from xml.etree import ElementTree as ET

# Open and parse the file
tree = ET.parse('first.xml')
# Get the root node
root_node = tree.getroot()

# Create the child node color
c = root_node.makeelement('color', {'xiaobai': 'white'})
w = root_node.makeelement('fsdfaf', {"ads": "19"})

# Attach the new nodes to the root, then save
root_node.append(c)
root_node.append(w)
tree.write('first.xml')

Creating nodes, method 3

c = ET.Element('PP', {'xiaobai': 'white'})

w = ET.Element('PP', {"ads": "19"})

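VI. Features of the XML node class (Element)

tag: the tag name of the current node.

attrib: the attributes of the current node, as a dict.

text: the text content of the current node.

makeelement(self, tag, attrib): creates a new node of the same type (it only creates the node; call append to attach it to the tree)

tree.write("filename", encoding="utf-8", xml_declaration=True)  # xml_declaration=True writes the XML declaration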
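That makeelement only creates a node, while SubElement creates and attaches in one call, can be checked by counting the children of the parent. A small sketch with tag names borrowed from the earlier examples:

```python
from xml.etree import ElementTree as ET

root = ET.Element("namelist")

# makeelement builds a node but does not attach it
child = root.makeelement("name", {"enrolled": "yes"})
before = len(root)

# append attaches it to the parent
root.append(child)
after = len(root)

# SubElement creates and attaches in one step
sub = ET.SubElement(root, "name", attrib={"enrolled": "no"})
final = len(root)

print(before, after, final)                       # 0 1 2
print(child.tag, child.attrib)                    # name {'enrolled': 'yes'}
print([n.attrib["enrolled"] for n in root])       # ['yes', 'no']
```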

VII. Writing XML with automatic indentation

from xml.dom import minidom
from xml.etree import ElementTree as ET

def MyWrite(root, file_path):
    rough_string = ET.tostring(root, "utf-8")
    reparsed = minidom.parseString(rough_string)
    new_str = reparsed.toprettyxml(indent="\t")
    with open(file_path, 'w', encoding="utf-8") as f:
        f.write(new_str)
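To see what the indentation step produces without writing a file, the same two calls can return the string instead. In this sketch the helper name prettify and the sample tree are illustrative, not part of the original notes.

```python
from xml.dom import minidom
from xml.etree import ElementTree as ET

def prettify(root):
    # Serialize with ElementTree, then re-parse and indent with minidom
    rough = ET.tostring(root, "utf-8")
    return minidom.parseString(rough).toprettyxml(indent="\t")

root = ET.Element("namelist")
name = ET.SubElement(root, "name", attrib={"enrolled": "yes"})
ET.SubElement(name, "age").text = "19"

pretty = prettify(root)
print(pretty)
```

Each nesting level is indented by one tab, and minidom adds its own XML declaration as the first line of the output.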

Posted on 2018-01-29 22:18 by 贾老板
