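Day 11: Python Study Notes & Key Points

14. Regular Expressions

14.1. Overview

14.1.1. Concept

Regular Expression
A text pattern that describes one or more strings to match when searching text.

14.1.2. Typical Use Cases

Data validation
Text scanning
Text extraction
Text replacement
Text splitting

14.1.3. Syntax

Literals

• Ordinary characters

• Characters that must be escaped to match literally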

• \

• ^

• $

• .

• |

• ?

• *

• +

• ()

• []

• {}

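Metacharacters

14.1.4. Matching

Single characters: predefined metacharacters

• . any character except \n

• \d a digit, equivalent to [0-9]

• \D a non-digit, equivalent to [^0-9]

• \s a whitespace character, [ \t\n\r\f\v]

• \S a non-whitespace character, [^ \t\n\r\f\v]

• \w a word character (letter, digit, or underscore), [a-zA-Z0-9_]

• \W a non-word character, [^a-zA-Z0-9_]

• Note: the bracket forms describe the ASCII case; for str patterns in Python 3 these classes also match Unicode characters unless re.ASCII is set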
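The predefined classes above can be tried out on a short string. This is a minimal sketch; the sample text and variable names are made up for illustration.

```python
import re

sample = "Room 42,\tfloor_3\n"

digits = re.findall(r"\d", sample)       # every single digit
chunks = re.findall(r"\S+", sample)      # runs of non-whitespace characters
words = re.findall(r"\w+", sample)       # runs of letters, digits, underscores
spaces = re.findall(r"\s", sample)       # space, tab, newline

print(digits)   # ['4', '2', '3']
print(chunks)   # ['Room', '42,', 'floor_3']
print(words)    # ['Room', '42', 'floor_3']
print(spaces)   # [' ', '\t', '\n']
```

Note how the comma survives in the \S+ result but not in the \w+ result, since a comma is neither whitespace nor a word character.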

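Alternation

• | matches either alternative, e.g. yes|no

Quantifiers (how many times a character, metacharacter, or character set repeats)

• ? 0 or 1 time

• * 0 or more times

• + 1 or more times

• Specific counts

• {n,m} between n and m times

• {n} exactly n times

• {n,} at least n times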
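One way to see the quantifiers side by side is to run each against the same small set of strings. The helper name matching and the candidate list are assumptions chosen for this sketch.

```python
import re

candidates = ["ac", "abc", "abbc", "abbbc"]

def matching(pattern):
    """Return the candidates that the pattern matches in full."""
    return [s for s in candidates if re.fullmatch(pattern, s)]

optional = matching(r"ab?c")        # 0 or 1 'b'
zero_or_more = matching(r"ab*c")    # any number of 'b'
one_or_more = matching(r"ab+c")     # at least one 'b'
exactly_two = matching(r"ab{2}c")   # exactly two 'b'
two_or_more = matching(r"ab{2,}c")  # two or more 'b'
one_to_two = matching(r"ab{1,2}c")  # one or two 'b'

print(optional)      # ['ac', 'abc']
print(one_or_more)   # ['abc', 'abbc', 'abbbc']
print(one_to_two)    # ['abc', 'abbc']
```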

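Greedy vs. non-greedy

• Greedy (default): match as much text as possible

• Non-greedy

• Match as little text as possible

• How: append ? to the quantifier

• Examples:

• ??

• *?

• +?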
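The difference is easiest to see on text with repeated delimiters. A small sketch, with an invented HTML-like sample:

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ runs to the last '>' in the string, giving one long match
greedy = re.findall(r"<.+>", html)

# Non-greedy: .+? stops at the nearest '>', giving one match per tag
lazy = re.findall(r"<.+?>", html)

print(greedy)  # ['<b>bold</b> and <i>italic</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']
```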

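Boundary matching

• ^ start of line

• $ end of line

• \b word boundary

• \B non-word boundary

• \A start of input

• \Z end of input

• Note: behavior can differ with context; in Python, ^ matches only at the start of the string and $ only at its end (or just before a trailing newline) unless re.M is set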
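The anchors can be compared by recording where each one matches in the same text. This is a sketch only; the helper starts and the sample string are assumptions for illustration.

```python
import re

text = "cat concat\ncat"

def starts(pattern, flags=0):
    """Start index of every match of pattern in text."""
    return [m.start() for m in re.finditer(pattern, text, flags)]

whole_word = starts(r"\bcat\b")         # 'cat' as a separate word
inside_word = starts(r"\Bcat")          # 'cat' preceded by a word character
line_starts = starts(r"^cat", re.M)     # start of every line
string_start = starts(r"\Acat", re.M)   # start of the input only
line_ends = starts(r"cat$", re.M)       # end of every line
string_end = starts(r"cat\Z", re.M)     # end of the input only

print(whole_word, inside_word)    # [0, 11] [7]
print(line_starts, string_start)  # [0, 11] [0]
print(line_ends, string_end)      # [7, 11] [11]
```

With re.M set, ^ and $ apply to each line, while \A and \Z keep referring to the whole input.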

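14.2. Regular Expressions in Python

14.2.1. Module

import re

14.2.2. RegexObject (pattern object)

A pattern object represents a compiled regular expression (compiled to bytecode and cached).
Compiling

• re.compile(r'pattern')

• Example 1

import re

# Case 1:
text = "Tom is 8 years old. Mike is 25 years old."

# Compile once, then reuse the pattern object
pattern = re.compile(r'\d+')
pattern.findall(text)

# The module-level function does the same thing
re.findall(r'\d+', text)

# Output is ['8', '25'] in both cases

• Example 2

import re

# Case 2:
s = "\\author:Tom"  # the text is \author:Tom, with one backslash
pattern = re.compile('\\author')  # the regex engine receives \author, where \a means the bell character
pattern.findall(s)

# Output is [], so the pattern needs two more backslashes:

s = "\\author:Tom"
pattern = re.compile('\\\\author')  # the regex engine receives \\author: a literal backslash, then "author"
pattern.findall(s)

# Output is ['\\author']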
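A raw string avoids doubling the backslashes twice. The sketch below shows that the raw form and the four-backslash form compile to the same pattern; the variable names are chosen for illustration.

```python
import re

s = "\\author:Tom"  # the actual text is \author:Tom

doubled = re.compile("\\\\author")  # ordinary string: four backslashes in source
raw = re.compile(r"\\author")       # raw string: two backslashes in source

same_pattern = doubled.pattern == raw.pattern
result = raw.findall(s)

print(same_pattern)  # True
print(result)        # ['\\author']
```

Writing every pattern as a raw string, as most examples in these notes do, keeps the source text identical to what the regex engine sees.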

.findall()

• Finds all non-overlapping matches

• Returns a list

• Example 1

# Case 3:
import re

text = 'Tom is 8 years old. Mike is 23 years old. Peter is 87 years old.'
pattern = re.compile(r'\d+')
pattern.findall(text)

# Output is ['8', '23', '87']

• Example 2

# Case 4:

text = 'Tom is 8 years old. Mike is 23 years old. Peter is 87 years old.'
p_name = re.compile(r'[A-Z]\w+')  # a capital letter followed by word characters
p_name.findall(text)

# Output is ['Tom', 'Mike', 'Peter']

.match(string[, pos[, endpos]])

• Matches only at the start of the string (or at pos, if given)

• Returns a Match object, or None if there is no match

• Example 1

# Case 5:
import re

text = '<html><head></head><body></body></html>'
pattern = re.compile(r'<html>')
pattern.match(text)

# Output is <re.Match object; span=(0, 6), match='<html>'>

• Example 2: with a leading space added

# Case 6:

text = ' <html><head></head><body></body></html>'  # leading space added
pattern = re.compile(r'<html>')
pattern.match(text)

# Output is None (the interactive prompt prints nothing)

.search(string[, pos[, endpos]])

• Searches for a match anywhere in the string

• Returns a Match object, or None if there is no match

• Example: with a leading space added

# Case 7:

text = ' <html><head></head><body></body></html>'  # leading space added
pattern = re.compile(r'<html>')
pattern.search(text)

# Output is <re.Match object; span=(1, 7), match='<html>'>
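The optional pos argument from the signatures above changes where matching begins. The sketch below reuses the text from Cases 6 and 7; fullmatch is a further method of the compiled pattern that is not covered in these notes and is included only for contrast.

```python
import re

pattern = re.compile(r"<html>")
text = " <html><head></head><body></body></html>"

at_start = pattern.match(text)       # None: index 0 holds a space
anywhere = pattern.search(text)      # scans forward and finds the tag
from_pos = pattern.match(text, 1)    # pos=1: matching starts at index 1
entire = pattern.fullmatch(text)     # None: the pattern must cover the whole string

print(at_start)         # None
print(anywhere.span())  # (1, 7)
print(from_pos.span())  # (1, 7)
print(entire)           # None
```

Because match and search return None on failure, check the result before calling methods such as span() on it.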

.finditer()

• Finds all matches

• Returns an iterator of Match objects

• Example: the results can be iterated over

# Case 8:
import re

text = 'Tom is 8 years old. Mike is 23 years old. Peter is 87 years old.'
p1 = re.compile(r'\d+')
it = p1.finditer(text)
for m in it:
    print(m)

# Output is
# <re.Match object; span=(7, 8), match='8'>
# <re.Match object; span=(28, 30), match='23'>
# <re.Match object; span=(51, 53), match='87'>

14.2.3. MatchObject (match object)

Represents the result of a successful match.
.group()

• With 0 or no argument, returns the whole match

• With an argument, returns the text matched by that group

• The argument can also be a group name

.groups()

• Returns a tuple containing all subgroups

.start() returns the start index of the given group
.end() returns the end index of the given group
Example 1 for the methods above

import re

text = "Tom is 8 years old. Jerry is 23 years old."
pattern = re.compile(r'\d+')
pattern.findall(text)
# Output is : ['8', '23']

# Two capture groups separated by a non-greedy gap
pattern = re.compile(r'(\d+).*?(\d+)')
m = pattern.search(text)
m
# Output is : <re.Match object; span=(7, 31), match='8 years old. Jerry is 23'>

m.group()
# Output is : '8 years old. Jerry is 23'

m.group(0)
# Output is : '8 years old. Jerry is 23'

m.group(1)
# Output is : '8'

m.group(2)
# Output is : '23'

m.start(1)
# Output is : 7

m.end(1)
# Output is : 8

m.start(2)
# Output is : 29

m.end(2)
# Output is : 31

m.groups()
# Output is : ('8', '23')

.span() returns a (start, end) tuple of indices for the given group
.groupdict() returns a dictionary mapping group names to their matched text
Example 2

# Case 2:
import re
pattern = re.compile(r'(\w+) (\w+)')
text = "Beautiful is better than ugly."
pattern.findall(text)  # with several groups, findall returns tuples
# Output is : [('Beautiful', 'is'), ('better', 'than')]

it = pattern.finditer(text)
for m in it:
    print(m.group())
# Output is :
# Beautiful is
# better than
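.span() and .groupdict() are listed above without an example. A minimal sketch follows; it uses the named-group syntax described in section 14.3.2, and the group names name and age are invented here.

```python
import re

pattern = re.compile(r"(?P<name>\w+) is (?P<age>\d+)")
m = pattern.search("Tom is 8 years old.")

whole_span = m.span()     # indices of the whole match
age_span = m.span("age")  # indices of one group, selected by name
fields = m.groupdict()    # group name -> matched text

print(whole_span)  # (0, 8)
print(age_span)    # (7, 8)
print(fields)      # {'name': 'Tom', 'age': '8'}
```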

14.3. Groups

14.3.1. Use Cases

Extract information from a match
Create a sub-pattern so that a quantifier applies to it as a unit

• Example

import re
re.search(r'ab+c','ababc')  # + applies only to 'b'
#Output is : <re.Match object; span=(2, 5), match='abc'>

re.search(r'(ab)+c','ababc')  # + applies to the whole group 'ab'
#Output is : <re.Match object; span=(0, 5), match='ababc'>

Limit the scope of alternation

• Example

re.search(r'Center|re','Center')  # without a group, the alternatives are 'Center' and 're'
#Output is : <re.Match object; span=(0, 6), match='Center'>

re.search(r'Center|re','Centre')
#Output is : <re.Match object; span=(4, 6), match='re'>

re.search(r'Cent(er|re)','Centre')  # the group limits the alternatives to 'er' and 're'
#Output is : <re.Match object; span=(0, 6), match='Centre'>

re.search(r'Cent(er|re)','Center')
#Output is : <re.Match object; span=(0, 6), match='Center'>

Reuse text captured earlier in the pattern

• Example

re.search(r'(\w+) \1','hello world')  # \1 must repeat what group 1 captured
#Output is : None

re.search(r'(\w+) \1','hello hello world')
#Output is : <re.Match object; span=(0, 11), match='hello hello'>

14.3.2. Declaration

(pattern)
(?P<name>pattern)

• Example

import re

text = "Tom:98"
pattern = re.compile(r'(?P<name>\w+):(?P<score>\d+)')
m = pattern.search(text)
m.group()
#Output is : 'Tom:98'

m.group(1)  # named groups can still be accessed by number
#Output is : 'Tom'

m.group('name')
#Output is : 'Tom'

m.group('score')
#Output is : '98'

14.3.3. References

On the match object: m.group('name')
Inside the pattern: (?P=name)
In the replacement string: \g<name>
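The three ways of referring to a named group can be combined in one small task, removing a doubled word. This is a sketch under assumed names; the group name word and the sample sentence are invented.

```python
import re

# (?P=word) inside the pattern must repeat the text captured by group 'word'
repeated = re.compile(r"\b(?P<word>\w+) (?P=word)\b")
sentence = "this is is a test"

m = repeated.search(sentence)
duplicate = m.group("word")  # reference on the match object

# \g<word> in the replacement string inserts the captured text once
fixed = repeated.sub(r"\g<word>", sentence)

print(duplicate)  # is
print(fixed)      # this is a test
```

The \b anchors keep the pattern from matching the "is" at the end of "this".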

14.3.4. Applications

String operations

• .split(string, maxsplit=0)

• Splits a string

• Example

import re

text = 'Beautiful is better than ugly.\nExplicit is better than implicit.\nSimple is better than complex.'

p = re.compile(r'\n')
p.split(text)

#Output is :
#['Beautiful is better than ugly.',
# 'Explicit is better than implicit.',
# 'Simple is better than complex.']

re.split(r'\W','Good Morning')
#Output is : ['Good', 'Morning']

# maxsplit is passed by keyword; positional use is deprecated since Python 3.13
re.split(r'\n',text,maxsplit=2)
#Output is :
#['Beautiful is better than ugly.',
# 'Explicit is better than implicit.',
# 'Simple is better than complex.']

re.split(r'\n',text,maxsplit=1)
#Output is :
#['Beautiful is better than ugly.',
# 'Explicit is better than implicit.\nSimple is better than complex.']

• .sub(repl, string, count=0)

• Replaces matches in a string

• Example

import re

ords = 'ORD000\nORD001\nORD003'
re.sub(r'\d+','-',ords)
#Output is : 'ORD-\nORD-\nORD-'

# \g<n> refers to group n; a raw string keeps the backslash intact
re.sub(r'([A-Z]+)(\d+)',r'\g<2>-\g<1>',ords)
#Output is :'000-ORD\n001-ORD\n003-ORD'

text = 'Beautiful is *better* than ugly.'
re.sub(r'\*(.*?)\*','<strong></strong>',text)  # the captured text is dropped
#Output is : 'Beautiful is <strong></strong> than ugly.'

re.sub(r'\*(.*?)\*',r'<strong>\g<1></strong>',text)  # the captured text is kept
#Output is : 'Beautiful is <strong>better</strong> than ugly.'

• .subn(repl, string, count=0)

• Replaces matches and also returns the number of replacements

• Example

ords = 'ORD000\nORD001\nORD003'
re.subn(r'([A-Z]+)(\d+)',r'\g<2>-\g<1>',ords)
#Output is : ('000-ORD\n001-ORD\n003-ORD', 3)

Compilation flags

• Change the default behavior of a regular expression

• re.I ignore case

• Example

import re

text = 'Python python PYTHON'
re.search(r'python',text)
#Output is : <re.Match object; span=(7, 13), match='python'>

re.findall(r'python',text)
#Output is : ['python']

re.findall(r'python',text,re.I)
#Output is : ['Python', 'python', 'PYTHON']

• re.M multi-line mode: ^ and $ also match at the start and end of each line

• Example

import re

re.findall(r'^<html>','\n<html>')
#Output is : []

re.findall(r'^<html>','\n<html>',re.M)
#Output is : ['<html>']

• re.S makes "." match any character, including \n

• Example

re.findall(r'\d(.)','1\ne')
#Output is : []

re.findall(r'\d(.)','1\ne',re.S)
#Output is : ['\n']

• ... ...

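Module-level operations

• re.purge() clears the regular expression cache

• Example

re.purge()

• re.escape() escapes the special characters in a string so that it matches literally

• Example

re.findall(r'^','^python^')  # an unescaped ^ is the start-of-string anchor
#Output is : ['']

re.findall(re.escape('^'),'^python^')  # the escaped ^ matches the literal character
#Output is : ['^', '^']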

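15. System Tools

15.1. Concepts

15.1.1. Command-line tools

15.1.2. Shell scripts

15.1.3. System administration

15.2. System Modules

15.2.1. sys

Provides functions and attributes that expose the Python interpreter and its runtime environment.

15.2.2. os

Provides a portable, cross-platform programming interface to the operating system.
os.path provides a portable programming interface for file and directory path tools.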

15.3. sys

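15.3.1. Platform and Version

sys.platform platform identifier
sys.version Python version string
sys.path module search path
sys.modules dictionary of loaded modules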
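The four attributes above can be inspected directly. The values depend on the machine, so this sketch prints them without claiming a specific output; sys.version_info is an extra attribute, not listed in these notes, used here only to get the version as numbers.

```python
import sys

platform_name = sys.platform         # e.g. 'linux', 'win32', or 'darwin'
version_text = sys.version           # full version string
major, minor = sys.version_info[:2]  # numeric version parts
search_path = sys.path               # list of directories searched on import
loaded = sys.modules                 # dict: module name -> module object

print(platform_name, major, minor)
print(len(search_path), "entries on the module search path")
print("sys" in loaded)  # True: the import above registered the module
```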

15.3.2. Inspecting Exception Details

sys.exc_info() returns details of the exception currently being handled

• Example

import sys
import traceback

try:
    raise KeyError
except KeyError:
    print(sys.exc_info())

# Output is: (<class 'KeyError'>, KeyError(), <traceback object at 0x00000200332E6808>)

traceback.print_tb(sys.exc_info()[2]) prints the traceback held in that tuple

• Example

# Call this inside the except block above, while the exception is still being handled
traceback.print_tb(sys.exc_info()[2])
# Output is:  File "<ipython-input-70-1bc1bfcd4795>", line 5, in <module> raise KeyError

15.3.3. Command-Line Arguments

sys.argv list of command-line arguments; sys.argv[0] is the script name
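A small sketch of separating the script name from its arguments. The function name split_argv and the sample argument list are hypothetical; passing a list in explicitly makes the logic easy to try without a real command line.

```python
import sys

def split_argv(argv):
    """Split an argv-style list into (script name, remaining arguments)."""
    script = argv[0] if argv else ""
    return script, argv[1:]

# Simulated command line: python greet.py Tom --loud
script, args = split_argv(["greet.py", "Tom", "--loud"])
print(script)  # greet.py
print(args)    # ['Tom', '--loud']

# The same function applied to the real arguments of this process
print(split_argv(sys.argv))
```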

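15.3.4. Standard Streams

sys.stdin standard input stream; input() reads from it by default
sys.stdout standard output stream; print() writes to it by default
sys.stderr standard error stream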
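The link between print(), input(), and the stream objects can be shown by swapping in in-memory streams. This is a sketch using io.StringIO and contextlib.redirect_stdout from the standard library, neither of which appears in these notes.

```python
import contextlib
import io
import sys

# print() writes to whatever sys.stdout currently is
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    print("hello")
captured = buffer.getvalue()

# input() reads from whatever sys.stdin currently is
original_stdin = sys.stdin
sys.stdin = io.StringIO("Tom\n")
try:
    name = input()
finally:
    sys.stdin = original_stdin  # always restore the real stream

# Error messages go to a separate stream
print("something went wrong", file=sys.stderr)

print(repr(captured))  # 'hello\n'
print(name)            # Tom
```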

15.4. os

15.4.1. Shell Variables

os.environ mapping of the environment variables of the current process
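os.environ behaves like a dictionary of strings. A minimal sketch; the variable names DEMO_GREETING and DEMO_NOT_SET are made up for this example.

```python
import os

os.environ["DEMO_GREETING"] = "hello"  # set a variable for this process
greeting = os.environ["DEMO_GREETING"]

# .get() avoids a KeyError when a variable is missing
missing = os.environ.get("DEMO_NOT_SET", "fallback")

del os.environ["DEMO_GREETING"]  # remove it again
still_set = "DEMO_GREETING" in os.environ

print(greeting)   # hello
print(still_set)  # False
```

Changes made this way affect the current process and the child processes it starts, not the shell that launched Python.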

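15.4.2. Management Tools

.getcwd() returns the current working directory
.listdir(path) lists the contents of a directory
.chdir(path) changes the working directory
.getpid() returns the ID of the current process
.getppid() returns the ID of the parent process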
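These calls can be tried safely inside a temporary directory. The sketch below uses tempfile from the standard library (not covered in these notes) so that nothing is left behind; the file name notes.txt is arbitrary.

```python
import os
import tempfile

start_dir = os.getcwd()

with tempfile.TemporaryDirectory() as tmp:
    os.chdir(tmp)  # move into the temporary directory
    try:
        with open("notes.txt", "w") as f:
            f.write("demo")
        entries = os.listdir(".")  # contents of the new working directory
        moved = os.path.samefile(os.getcwd(), tmp)
    finally:
        os.chdir(start_dir)  # go back before the directory is deleted

pid = os.getpid()
parent_pid = os.getppid()

print(entries)  # ['notes.txt']
print(moved)    # True
print(pid, parent_pid)
```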

15.4.3. Running Shell Commands

.system() runs a shell command from a Python script
.popen() runs a command and connects to its input or output stream
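The practical difference is what comes back: os.system returns an exit status while the command writes straight to the terminal, and os.popen gives a file-like object to read the output from. A sketch, assuming a shell where the echo command is available:

```python
import os

# os.system: output goes to the terminal; the return value is the exit status
status = os.system("echo hello from os.system")

# os.popen: the command's output is read through a file-like object
with os.popen("echo hello from os.popen") as stream:
    output = stream.read()

print(status)          # 0 when the command succeeded
print(output.strip())  # hello from os.popen
```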

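15.4.4. File Handling

.mkdir('directory name') creates a directory
.rmdir('directory name') removes an empty directory
.rename('old name', 'new name') renames a file or directory
.remove('file name') deletes a file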
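The four calls form a natural create, rename, delete cycle. This sketch runs inside a temporary directory created with tempfile (an addition not covered in these notes); the names reports, draft.txt, and final.txt are invented.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    folder = os.path.join(base, "reports")
    os.mkdir(folder)  # create the directory

    old_name = os.path.join(folder, "draft.txt")
    new_name = os.path.join(folder, "final.txt")
    with open(old_name, "w") as f:
        f.write("data")

    os.rename(old_name, new_name)  # rename the file
    after_rename = os.listdir(folder)

    os.remove(new_name)  # delete the file
    os.rmdir(folder)     # works only because the directory is now empty
    folder_gone = not os.path.exists(folder)

print(after_rename)  # ['final.txt']
print(folder_gone)   # True
```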

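15.4.5. Portability Tools

.sep separator between path components
.pathsep separator between entries in a search path such as PATH
.curdir symbol for the current directory
.pardir symbol for the parent directory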
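Using these constants instead of hard-coded characters keeps path handling independent of the platform. A short sketch; the directory name data is arbitrary.

```python
import os

# Build "parent directory / data" without hard-coding '/' or '\\'
relative = os.pardir + os.sep + "data"

# Split the PATH variable using the platform's own separator
search_dirs = os.environ.get("PATH", "").split(os.pathsep)

here = os.curdir

print(relative)  # '../data' on POSIX systems
print(here)      # .
print(len(search_dirs), "directories on PATH")
```

In practice os.path.join produces the same result as the manual concatenation and is usually the more readable choice.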

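15.4.6. Path Module .path

.isdir(path) whether the path is a directory
.isfile(path) whether the path is a file
.exists(path) whether the path exists
.split(path) splits a path into directory and final component
.splitext(path) splits off the file extension
.join() joins path components
.normpath() normalizes a path
.abspath() converts a path to an absolute path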
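Most of these functions work on the path text alone, so they can be tried on paths that do not exist. A sketch with invented names (project, docs, guide.txt, and so on):

```python
import os

path = os.path.join("project", "docs", "guide.txt")

head, tail = os.path.split(path)    # directory part and final component
root, ext = os.path.splitext(path)  # path without extension, and the extension

# normpath collapses '.' and '..' components
messy = os.path.join("project", "docs", os.pardir, "src", os.curdir, "main.py")
normalized = os.path.normpath(messy)

absolute = os.path.abspath(path)  # prefixed with the current working directory

print(tail, ext)   # guide.txt .txt
print(normalized)  # 'project/src/main.py' on POSIX systems
print(os.path.isdir(os.curdir), os.path.isfile(os.curdir))  # True False
```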

Posted on 2019-11-12 11:47 by hemin96
