PythonStudy_Regular Expressions (3)

　　Python's regular expression module is re. Its commonly used functions are re.match(), re.search(), re.findall(), re.finditer(), re.split(), re.sub(), and re.subn().

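　　(1) re.match(): match at the start of the string

　　This function only matches at the start of the string. It returns a match object on success and None otherwise.

　　Given a match object, the group(), groups(), and groupdict() methods return the matched text. If the result is None, calling any of them raises AttributeError.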

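　　　　.group() returns the entire matched string.

　　　　.groups() returns the contents of the capture groups as a tuple, or the empty tuple () when the pattern has no groups.

　　　　.groupdict() returns the named groups as a dict keyed by group name. A named group is written as (?P<name>\d).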

import re
s = 'hello world !'
# Match at the start of the string; returns a match object
# (match reprs are shown as printed by Python 3.7+)
n1 = re.match(r'h\S*', s)  # no group
print(n1)  # <re.Match object; span=(0, 5), match='hello'>
n2 = re.match(r'(h\S*)', s)  # one group
print(n2)  # <re.Match object; span=(0, 5), match='hello'>

# group() returns the whole match
print(n1.group())  # hello
print(n2.group())  # hello
# groups() returns the captured groups
print(n1.groups())  # ()
print(n2.groups())  # ('hello',)
# groupdict() returns the named groups
n3 = re.match(r'(?P<k1>h\S*)', s)
print(n3.group())  # hello
print(n3.groups())  # ('hello',)
print(n3.groupdict())  # {'k1': 'hello'}
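
　　Since re.match() returns None on failure, calling .group() on the result without checking raises AttributeError. The following is a minimal sketch of the usual guard. The helper name first_word is hypothetical and exists only for this illustration.

```python
import re

def first_word(text):
    """Return the leading 'h...' word of text, or None if text does not start with one."""
    m = re.match(r'h\S*', text)  # match object, or None if the start does not match
    if m is None:
        return None
    return m.group()

print(first_word('hello world !'))  # hello
print(first_word('world hello !'))  # None
```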


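　　(2) re.search(): first match anywhere in the string

　　This function scans the whole string and stops at the first match, returning a match object. If nothing matches, it returns None.

　　As with re.match(), the group(), groups(), and groupdict() methods return the matched text.

　　Note: when groups are nested, the tuple returned by .groups() contains the match for every group at every level, ordered by the position of each opening parenthesis.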

import re
s = 'hello world !hallo world !'
n1 = re.search('h(.)l', s)
print(n1)  # match object: <re.Match object; span=(0, 3), match='hel'>
print(n1.group())  # whole match: hel
print(n1.groups())  # captured groups as a tuple: ('e',)
n2 = re.search('(?P<k1>h(.)l)', s)
print(n2)  # match object: <re.Match object; span=(0, 3), match='hel'>
print(n2.group())  # whole match: hel
print(n2.groups())  # outer group first, then the nested one: ('hel', 'e')
print(n2.groupdict())  # named groups as a dict: {'k1': 'hel'}
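
　　The difference between sections (1) and (2) is easiest to see when the pattern occurs somewhere other than the start. The sample string below is made up for this sketch.

```python
import re

s = 'say hello'
at_start = re.match('hello', s)   # 'hello' is not at index 0, so this is None
anywhere = re.search('hello', s)  # search() scans forward and finds it

print(at_start)          # None
print(anywhere.group())  # hello
print(anywhere.span())   # (4, 9)
```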


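　　(3) re.findall(): find all matches, returned as a list

　　This function searches the whole string for every non-overlapping substring that matches the pattern and returns them all in a list. If nothing matches, it returns the empty list [].

　　Note 1: re.findall() honors capture groups, but it returns plain strings rather than match objects, so the group() family of methods is not available. When the pattern contains one group, the list holds that group's text instead of the whole match.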

import re
s = 'hello world !hallo world !'
n = re.findall('h.l', s)  # matches found: all of them are returned in a list
print(n)  # ['hel', 'hal']
n = re.findall('h(.l)', s)  # with a group, only the group's text is returned
print(n)  # ['el', 'al']
n = re.findall('w.l', s)  # no match: returns the empty list []
print(n)  # []
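
　　Two related cases are worth a quick sketch. With more than one group, re.findall() returns a list of tuples, and a non-capturing group (?:...) lets you group without changing what is returned. The patterns below are illustrative and not taken from the original post.

```python
import re

s = 'hello world !hallo world !'

pairs = re.findall(r'(h.l)(lo)', s)  # two groups: one tuple per match
print(pairs)  # [('hel', 'lo'), ('hal', 'lo')]

whole = re.findall(r'h(?:.)l', s)  # non-capturing group: whole matches are returned
print(whole)  # ['hel', 'hal']
```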


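　　Note 2: avoid patterns that can match the empty string ''. An empty pattern matches at every position in the string, including the very end, so the result has len(s) + 1 items.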

import re
s = 'hello world !hallo world !'
n = re.findall('', s)  # an empty pattern matches at every position; avoid this
print(n)
#['', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '']
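
　　The same thing happens, less obviously, with quantifiers such as * that allow zero repetitions. The sketch below checks the length rule from Note 2 and contrasts * with +. The string 'a1b' is a made-up example, and the exact number of empty strings returned for * depends on the Python version, so only their presence is shown.

```python
import re

s = 'hello world !hallo world !'
empties = re.findall('', s)
print(len(s), len(empties))  # 26 27

star = re.findall(r'\d*', 'a1b')  # * may match zero digits, so '' shows up
plus = re.findall(r'\d+', 'a1b')  # + requires at least one digit
print('' in star)  # True
print(plus)        # ['1']
```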


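　　(4) re.finditer(): find all matches, returned as an iterator

　　This function searches the whole string for every substring that matches the pattern and returns an iterator.

　　The iterator's items are created lazily, only as you iterate. Each item is the match object for one match. If nothing matches, the iterator is simply empty (it is not None).

　　Note: every match object yielded by the iterator supports .group(), .groups(), and .groupdict().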

import re
s = 'hello world !hallo world !'
n = re.finditer('h(.)l', s)  # returns an iterator
print(n)  # <callable_iterator object at 0x000000000295A748>
for i in n:
    print(i, i.group(), i.groups())
    # <re.Match object; span=(0, 3), match='hel'> hel ('e',)
    # <re.Match object; span=(13, 16), match='hal'> hal ('a',)

n = re.finditer('w(.)l', s)  # still returns an iterator
print(n)  # <callable_iterator object at 0x000000000291F6A0>
for i in n:
    print(i, i.group(), i.groups())
    # no match: the iterator is empty, so this loop body never runs
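
　　Because each item is a full match object, re.finditer() is handy when you need positions as well as text. The short sketch below collects the text and offsets of each match, and shows that a pattern with no matches yields an empty iterator rather than None.

```python
import re

s = 'hello world !hallo world !'

# Collect (text, start, end) for every match
spans = [(m.group(), m.start(), m.end()) for m in re.finditer('h(.)l', s)]
print(spans)  # [('hel', 0, 3), ('hal', 13, 16)]

# No matches: the iterator is empty
no_hits = list(re.finditer('w(.)l', s))
print(no_hits)  # []
```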


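　　(5) re.split(): split a string

　　This function matches the pattern against the string and splits the string wherever the pattern matches.

　　If the pattern is wrapped in a capture group (), the text captured by the group is kept in the result list. Without a group, the matched separators are discarded.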

import re
s = 'hello world !hallo world!'
# Without a group, the separators are discarded
n = re.split(' !', s)
print(n)  # ['hello world', 'hallo world!']
n = re.split(' ', s)
print(n)  # ['hello', 'world', '!hallo', 'world!']
# With a group, the captured part of each separator is kept
n = re.split('( !)', s)
print(n)  # ['hello world', ' !', 'hallo world!']
n = re.split(' (!)', s)
print(n)  # ['hello world', '!', 'hallo world!']
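
　　re.split() also accepts a maxsplit argument that caps the number of splits, and the separator can be any pattern, not just a literal. The sketch below reuses the string from the example above. The character-class pattern is an illustration added here.

```python
import re

s = 'hello world !hallo world!'

parts = re.split(' ', s, maxsplit=1)  # split at the first space only
print(parts)  # ['hello', 'world !hallo world!']

# Split on any run of spaces or '!'; a separator at the very end
# leaves a trailing empty string in the result
words = re.split(r'[ !]+', s)
print(words)  # ['hello', 'world', 'hallo', 'world', '']
```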


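　　(6) re.sub(): replace in a string

　　This function replaces every match of a pattern and returns the new string. The optional count argument limits how many replacements are made.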

import re
s = 'hello world!123hallo world!456'
n = re.sub(r'\d+', ' ', s)  # replace every run of digits with a space ' '
print(n)  # hello world! hallo world!
n = re.sub(r'\d+', ' ', s, count=1)  # replace only the first match
print(n)  # hello world! hallo world!456
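
　　For more involved replacements, the replacement string can refer back to captured groups with \1, \2, and so on, or the replacement can be a function that receives each match object. The following is a small sketch of both. The function name double is hypothetical.

```python
import re

s = 'hello world!123hallo world!456'

# \1 in the replacement refers to the text captured by the first group
wrapped = re.sub(r'(\d+)', r'<\1>', s)
print(wrapped)  # hello world!<123>hallo world!<456>

def double(m):
    """Replacement function: called once per match, must return a string."""
    return str(int(m.group()) * 2)

doubled = re.sub(r'\d+', double, s)
print(doubled)  # hello world!246hallo world!912
```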


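　　(7) re.subn(): replace in a string and report the number of replacements

　　This function performs the same replacement as re.sub() but returns a tuple of the new string and the number of replacements made.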

import re
s = 'hello world!123hallo world!456'
n = re.subn(r'\d+', ' ', s)  # replace every run of digits with a space ' '
print(n)  # ('hello world! hallo world! ', 2)
n = re.subn(r'\d+', ' ', s, count=1)  # replace only the first match
print(n)  # ('hello world! hallo world!456', 1)
n = re.subn(r'\d+', ' ', s, count=4)  # at most 4 replacements; only 2 matches exist
print(n)  # ('hello world! hallo world! ', 2)
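
　　One practical use of the returned count is to tell whether anything was changed at all, without comparing the two strings. This is a minimal sketch, and the helper name strip_digits is hypothetical.

```python
import re

def strip_digits(text):
    """Remove every run of digits; also report whether anything was removed."""
    new_text, count = re.subn(r'\d+', '', text)
    return new_text, count > 0

print(strip_digits('hello world!123'))  # ('hello world!', True)
print(strip_digits('hello world!'))     # ('hello world!', False)
```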


Posted 2017-04-17 17:13 by 笨笨是个名字
