The Python re module

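Characters:

    .   matches any character except a newline
    \w  matches a letter, digit, or underscore; for str patterns this also covers Unicode word characters such as Chinese characters
    \s  matches any whitespace character
    \d  matches a digit
    \b  matches the start or end of a word (a word boundary)
    ^   matches the start of the string
    $   matches the end of the string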
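As a quick check of the character classes listed above, the sketch below runs each one against a small sample string. The sample text and the variable names are made up for illustration.

```python
import re

sample = "hi 你好_42\nbye"  # made-up sample text

# . matches any character except a newline, so the run stops before "\n"
dot_run = re.match(r".+", sample).group()

# \w matches letters, digits, underscore and, for str patterns, Unicode word characters
words = re.findall(r"\w+", sample)

# \d matches digits; \s matches whitespace, including the newline
digits = re.findall(r"\d", sample)
spaces = re.findall(r"\s", sample)

# \b anchors at a word boundary; ^ and $ anchor at the start and end of the string
whole_word = re.findall(r"\bbye\b", sample)
starts_with_hi = re.search(r"^hi", sample) is not None
ends_with_bye = re.search(r"bye$", sample) is not None

print(dot_run, words, digits, spaces, whole_word, starts_with_hi, ends_with_bye)
```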

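Repetition:

    *      repeat zero or more times
    +      repeat one or more times
    ?      repeat zero or one time
    {n}    repeat exactly n times
    {n,}   repeat n or more times
    {n,m}  repeat between n and m times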
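To see how the quantifiers differ, the sketch below applies each one to the letter b and checks which of a few candidate strings it accepts in full. The candidate list and the helper `matching` are hypothetical names introduced only for this example.

```python
import re

candidates = ["a", "ab", "abb", "abbb", "abbbb"]  # made-up test strings


def matching(pattern):
    """Return the candidates that the pattern matches from start to end."""
    return [s for s in candidates if re.fullmatch(pattern, s)]


zero_or_more = matching(r"ab*")      # b repeated zero or more times
one_or_more = matching(r"ab+")       # b repeated one or more times
zero_or_one = matching(r"ab?")       # b repeated zero or one time
exactly_two = matching(r"ab{2}")     # b repeated exactly twice
two_or_more = matching(r"ab{2,}")    # b repeated two or more times
two_to_three = matching(r"ab{2,3}")  # b repeated two or three times

for name, result in [("*", zero_or_more), ("+", one_or_more), ("?", zero_or_one),
                     ("{2}", exactly_two), ("{2,}", two_or_more), ("{2,3}", two_to_three)]:
    print(name, result)
```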

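1) Compiling a regular expression

import re

p = re.compile('ab*', re.IGNORECASE)

    re.compile() also accepts an optional flags argument that selects the matching mode:

re.IGNORECASE: ignore case; same as re.I.
re.MULTILINE: multi-line mode; ^ and $ also match at the start and end of each line; same as re.M.
re.DOTALL: makes '.' match any character, including '\n'; same as re.S.
re.LOCALE: makes \w, \W, \b, \B and case-insensitive matching depend on the current locale; same as re.L. In Python 3 it can only be used with bytes patterns.
re.ASCII: makes \w, \W, \b, \B, \d, \D, \s and \S match ASCII characters only instead of the full Unicode range; same as re.A.
re.VERBOSE: verbose mode. The pattern may span several lines, whitespace is ignored, and comments are allowed, which makes the regular expression easier to read; same as re.X.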
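The effect of these flags is easiest to see side by side. The sketch below is a minimal illustration; the sample text and the `number` pattern are made up for this example.

```python
import re

text = "First line\nsecond LINE"  # made-up two-line sample

# IGNORECASE: "line" also matches "LINE"
ignore_case = re.findall(r"line", text, re.IGNORECASE)

# MULTILINE: ^ matches at the start of every line, not only at the start of the string
first_words_default = re.findall(r"^\w+", text)
first_words_multiline = re.findall(r"^\w+", text, re.MULTILINE)

# DOTALL: . also matches the newline
without_dotall = re.match(r".+", text).group()
with_dotall = re.match(r".+", text, re.DOTALL).group()

# ASCII: \w no longer matches Chinese characters
ascii_words = re.findall(r"\w+", "abc你好", re.ASCII)

# VERBOSE: whitespace in the pattern is ignored and # starts a comment
number = re.compile(r"""
    \d+        # integer part
    (\.\d+)?   # optional fractional part
""", re.VERBOSE)
is_number = number.fullmatch("3.14") is not None

print(ignore_case, first_words_default, first_words_multiline, ascii_words, is_number)
```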

2) match

match tries to match at the start of the string only. It returns a match object on success and None on failure.

 match(pattern, string, flags=0)
 # pattern: the regular expression
 # string : the string to match against
 # flags  : matching mode

    Without groups:

origin = 'hello你好啊hehe我不好'
pattern = re.compile(r"h\w+")
# r = re.match(r"h\w+", origin)
r = pattern.match(origin)
print(r.group())      # the whole match
print(r.groups())     # tuple of all captured groups
print(r.groupdict())  # dict of all named groups

hello你好啊hehe我不好
()
{}

With groups:

origin = 'hello你好啊hehe我不好1'
r = re.match(r"h(\w+).*(?P<name>\d)$", origin)
print(r.group())      # the whole match
print(r.groups())     # tuple of all captured groups
print(r.groupdict())  # dict of the groups that were given a name

hello你好啊hehe我不好1
('ello你好啊hehe我不好', '1')
{'name': '1'}
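Because match returns None when the pattern does not fit at the start of the string, calling .group() on the result without a check raises AttributeError. The sketch below shows one way to guard against that and to read individual groups by index or by name; the variable names are illustrative.

```python
import re

origin = "hello你好啊hehe我不好1"

m = re.match(r"h(\w+).*(?P<name>\d)$", origin)
if m is not None:
    first_group = m.group(1)   # first captured group, by index
    named = m.group("name")    # the named group, by name
    span = m.span()            # (start, end) offsets of the whole match
else:
    first_group = named = span = None

# The string does not start with "e", so match returns None here
miss = re.match(r"e\w+", origin)

print(first_group, named, span, miss)
```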

3) search

# search scans through the whole string and returns the first match, or None if nothing matches
# search(pattern, string, flags=0)

# Without groups

origin = 'abcde你好啊hehe我不好1'
r = re.search(r"a\w+", origin)
print(r.group())      # the whole match
print(r.groups())     # tuple of all captured groups
print(r.groupdict())  # dict of all named groups

# With groups

r = re.search(r"a(?P<hanzi>\w+).*(?P<name>\d)$", origin)
print(r.group())      # the whole match
print(r.groups())     # tuple of all captured groups
print(r.groupdict())  # dict of the groups that were given a name

abcde你好啊hehe我不好1
()
{}
abcde你好啊hehe我不好1
('bcde你好啊hehe我不好', '1')
{'hanzi': 'bcde你好啊hehe我不好', 'name': '1'}
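The examples above happen to match from the first character, so they would behave the same with match. The sketch below uses a pattern that only fits in the middle of the string to show where match and search differ; the variable names are illustrative.

```python
import re

origin = "abcde你好啊hehe我不好1"

# match only tries position 0, and the string starts with "a", so this is None
at_start = re.match(r"he\w+", origin)

# search scans forward and reports the first place where the pattern fits
found = re.search(r"he\w+", origin)
found_text = found.group() if found else None
found_at = found.start() if found else None

# Anchoring a search pattern with ^ restricts it to the start of the string again
anchored = re.search(r"^he\w+", origin)

print(at_start, found_text, found_at, anchored)
```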

4) findall

# findall returns a list of all non-overlapping matches. With no groups, each item is the whole match; with exactly one group, each item is that group's string; with several groups, each item is a tuple of the groups.
# Empty matches are included in the result.
# findall(pattern, string, flags=0)

# Without groups
origin = 'abcde你好啊hehe我不好1'
r = re.findall(r"a\w+", origin)
print(r)

# With groups
origin = "hello alex bcd abcd lge acd 19"
r = re.findall(r"a((\w*)c)(d)", origin)
print(r)

['abcde你好啊hehe我不好1']
[('bc', 'b', 'd'), ('c', '', 'd')]
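The way groups change the shape of the findall result can be surprising, so here is a small sketch of the remaining cases: a single group, a non-capturing group, finditer as a way to get the whole match alongside the groups, and empty matches. The patterns are variations made up for this example.

```python
import re

origin = "hello alex bcd abcd lge acd 19"

# Exactly one group: a list of strings, one per match, holding only that group
one_group = re.findall(r"a(\w*)cd", origin)

# A non-capturing group (?:...) does not count, so the whole match is returned
non_capturing = re.findall(r"a(?:\w*)cd", origin)

# finditer yields match objects, which give access to the whole match and the groups
whole_matches = [m.group() for m in re.finditer(r"a((\w*)c)(d)", origin)]

# Empty matches are included: x? matches the empty string at every position of "ab"
empties = re.findall(r"x?", "ab")

print(one_group, non_capturing, whole_matches, empties)
```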

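5) sub

# sub replaces the parts of the string that match the pattern

sub(pattern, repl, string, count=0, flags=0)
# pattern: the regular expression
# repl   : the replacement string, or a callable that returns it
# string : the string to match against
# count  : maximum number of replacements (0 means replace all)
# flags  : matching mode

origin = "hello alex bcd alex lge alex acd 19"
r = re.sub(r"a\w+", "999", origin, count=2)
print(r)

hello 999 bcd 999 lge alex acd 19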
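The repl argument can also refer back to captured groups or be a callable. The sketch below shows both, plus re.subn, which additionally reports how many replacements were made; the helper `shout` is a hypothetical name used only here.

```python
import re

origin = "hello alex bcd alex lge alex acd 19"

# \1 in the replacement string refers to the first captured group
wrapped = re.sub(r"(a\w+)", r"<\1>", origin, count=1)


def shout(match):
    """Hypothetical callable: receives the match object, returns the replacement."""
    return match.group().upper()


shouted = re.sub(r"a\w+", shout, origin)

# subn returns the new string together with the number of replacements
replaced, how_many = re.subn(r"a\w+", "999", origin)

print(wrapped)
print(shouted)
print(replaced, how_many)
```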

6) split

# split splits the string wherever the pattern matches

split(pattern, string, maxsplit=0, flags=0)
# pattern : the regular expression
# string  : the string to split
# maxsplit: maximum number of splits (0 means no limit)
# flags   : matching mode

# Without groups
origin = "hello alex bcd alex lge alex acd 19"
r = re.split("alex", origin, maxsplit=1)
print(r)

# With groups: the text captured by each group is also included in the result

origin = "hello alex bcd alex lge alex acd 19"
r1 = re.split("(alex)", origin, maxsplit=1)
print(r1)
r2 = re.split("(al(ex))", origin, maxsplit=1)
print(r2)

['hello ', ' bcd alex lge alex acd 19']
['hello ', 'alex', ' bcd alex lge alex acd 19']
['hello ', 'alex', 'ex', ' bcd alex lge alex acd 19']
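Where re.split goes beyond str.split is in splitting on several different delimiters at once. The sketch below splits on any run of commas, semicolons, or whitespace; the sample line is made up for illustration.

```python
import re

line = "a, b;c  d"  # made-up input with mixed delimiters

# One or more commas, semicolons or whitespace characters act as a single separator
fields = re.split(r"[,;\s]+", line)

# maxsplit stops after that many splits and leaves the rest of the string untouched
limited = re.split(r"[,;\s]+", line, maxsplit=2)

print(fields)
print(limited)
```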

7) Common regular expressions

IP address:
^(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}$
Mobile phone number:
^1[3458][0-9]\d{8}$
Email:
[a-zA-Z0-9_-]+@[a-zA-Z0-9_-]+(\.[a-zA-Z0-9_-]+)+
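A minimal sketch of how these patterns might be used for validation. The constant names and the test inputs are assumptions made for this example. The phone pattern is written with the character class [3458], because | inside square brackets is a literal character and not an alternation. The email pattern has no anchors, so fullmatch is used to require that the whole string fits.

```python
import re

# Hypothetical constant names; the patterns are the ones listed above
IP = re.compile(r"^(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}$")
PHONE = re.compile(r"^1[3458][0-9]\d{8}$")
EMAIL = re.compile(r"[a-zA-Z0-9_-]+@[a-zA-Z0-9_-]+(\.[a-zA-Z0-9_-]+)+")

ip_ok = IP.match("192.168.1.1") is not None
ip_bad = IP.match("256.1.1.1") is not None       # 256 is out of range

phone_ok = PHONE.match("13812345678") is not None
phone_bad = PHONE.match("12345678901") is not None  # second digit not in [3458]

email_ok = EMAIL.fullmatch("user_1@example.com") is not None
email_bad = EMAIL.fullmatch("user@example") is not None  # no dot after the host name

print(ip_ok, ip_bad, phone_ok, phone_bad, email_ok, email_bad)
```

These patterns are deliberately simple; the email one in particular accepts only a subset of valid addresses.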

posted @ 2017-10-18 14:46 大川哥

