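Python Regular Expressions

1. Common metacharacters

　　1. .　　Matches any character except the newline \n

　　　　re.search(r'[a-z].*', 'python\n123@11.com')　　python　　[a-z] matches a lowercase letter, . matches any character except a newline, and * repeats it zero or more times, so the match stops at the end of the first line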

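　　2. *　　Matches the preceding item zero or more times (greedy)

　　　　re.search(r'@[0-9]*', 'pyth@on.com')　　　　@　　　　* allows zero repetitions, so the pattern still matches even though no digit follows the @

　　3. +　　Matches the preceding item one or more times (greedy)

　　　　re.search(r'@[0-9]+', 'pyth@on.com')　　　　None　　+ requires at least one repetition; no digit follows the @, so there is no match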

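　　4. ?　　Matches the preceding item zero or one time (greedy). Placed after another quantifier (*?, +?, ??, {m,n}?), it makes that quantifier non-greedy

　　　　re.findall(r'\d+[a-z]?', '12php123@11.com')　　['12p', '123', '11']　　\d+ matches one or more digits, optionally followed by one lowercase letter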

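　　5. ^　　Matches at the start of the string; with re.M (multiline mode) it also matches at the start of each line

　　　　re.search('^[q]+', 'php css\nqq. com', re.M)　　qq　　　　　in multiline mode, the qq at the start of the second line matches

　　　　re.search('^[q]+', 'php css\nqq. com')　　　　 None　　　　without multiline mode, ^ matches only at the start of the string

　　　　re.search('^[q]+', 'php 11qq. com')　　　　　　 None　　　　qq is not at the start of a line, so there is no match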

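　　6. $　　Matches at the end of the string (or just before a trailing newline); with re.M it also matches at the end of each line. It is the counterpart of ^

　　　　re.search('[0-9]+$', 'php 11\nqq. com', re.M)　　 11　　　　in multiline mode, the 11 at the end of the first line matches

　　　　re.search('[0-9]+$', 'php 11\nqq. com')　　　　 None　　　without multiline mode, $ matches only at the end of the string

　　　　re.search('[0-9]+$', 'php 11qq. com')　　　　　None　　　11 is not at the end of a line, so there is no match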

　　7. |　　Alternation: matches if any one of the alternatives matches

　　　　re.search(r'^[a-z]+|^[0-9]+', '12php123@11.com')　　12　　matches a leading run of lowercase letters or a leading run of digits

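　　8. ( )　Groups part of the pattern and captures the matched text, which is retrieved with group()

　　　　re.search(r'([0-9]*)@([0-9]*)', 'php123@11.com').group(0)　　123@11　　the two groups sit on either side of the @; group(0) returns the entire match

　　　　re.search(r'([0-9]*)@([0-9]*)', 'php123@11.com').group(1)　　123　　　　group(1) returns the text captured by the first group

　　　　re.search(r'([0-9]*)@([0-9]*)', 'php123@11.com').group(2)　　11　　　　group(2) returns the text captured by the second group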

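　　9. { }　　Matches the preceding item a given number of times: {n} means exactly n times, {m,n} means between m and n times

　　　　re.findall(r'q{2}', 'pythonq@qq.comqqq')　　　　['qq', 'qq']　　　　　　with one number, exactly that many repetitions are matched

　　　　re.findall(r'q{2,4}', 'pythonqqq@qq.comqqqq')　　 ['qqq', 'qq', 'qqqq']　　with two numbers, any count in the range matches, here 2 to 4 q's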

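　　10. [ ]　　Character class: matches any single character listed inside the brackets

　　　　re.findall(r'[0-9@]', 'python23@4.com')　　['2', '3', '@', '4']　　matches digits and the @ sign; members are not separated by commas, since a comma inside the brackets would itself be matched literally

　　11. [^ ]　Negated character class: matches any single character not listed inside the brackets

　　　　re.findall(r'[^a-z@]', 'python23@4.com')　　['2', '3', '4', '.']　　matches every character other than lowercase letters and @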

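　　12. \　　Escape character: removes the special meaning of the metacharacter that follows it

　　　　re.findall(r'\.\w', 'python23@4.com')　　　　['.c']　　. alone matches any character, while \. matches only a literal .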
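
The ? metacharacter plays two roles, which are easy to mix up. The sketch below is a minimal illustration, assuming a made-up sample string: on its own ? marks an item as optional, while after another quantifier it switches that quantifier from greedy to non-greedy.

```python
import re

html = '<b>bold</b> and <i>italic</i>'  # sample text, made up for illustration

# Greedy: .* takes as much as possible, so the match runs to the last '>'.
greedy = re.search(r'<.*>', html).group()

# Non-greedy: .*? takes as little as possible, so the match stops at the first '>'.
lazy = re.search(r'<.*?>', html).group()

# A lone ? (not following another quantifier) simply makes the 'u' optional.
optional = re.findall(r'colou?r', 'color colour')

print(greedy)    # <b>bold</b> and <i>italic</i>
print(lazy)      # <b>
print(optional)  # ['color', 'colour']
```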

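2. Letter escape sequences

　　1. \A　　Matches only at the start of the text, regardless of multiline mode

　　　　re.search(r'\Aqq', 'php css\nqq. com', re.M)　　　　None　　　　\A matches only at the very start of the text, not at the start of a line

　　2. \Z　　Matches only at the end of the text, regardless of multiline mode

　　　　re.search(r'com\Z', 'python com\nqq.com', re.M).span()　　(14, 17)　　 \Z matches only at the very end of the text, not at the end of a line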

　　3. \w　　Matches any letter, digit, or underscore

　　　　re.findall(r'\w+', 'Python?_#123@.com')　　['Python', '_', '123', 'com']

　　4. \W　　Matches any character that is not a letter, digit, or underscore; the opposite of \w

　　　　re.findall(r'\W+', 'Python?_#123@.com')　　 ['?', '#', '@.']

　　5. \s　　Matches any whitespace character: space, tab, newline, and so on

　　　　re.findall(r'\s+', 'python @ 136.co m')　　　 [' ', ' ', ' ']

　　6. \S　　Matches any non-whitespace character

　　　　re.findall(r'\S+', 'python @ 136.co m')　　　　['python', '@', '136.co', 'm']

　　7. \d　　Matches any digit

　　　　re.findall(r'\d+', 'python12 @ 136.co m')　　　['12', '136']

　　8. \D　　Matches any non-digit character

　　　　re.findall(r'\D+', 'python12 @ 136.co m')　　　['python', ' @ ', '.co m']

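　　9. \b　　Matches the empty string at a word boundary, so it can be used to match whole words only

　　　　re.findall(r'\bthon\b', 'hello world and thon is python')　　['thon']　　only the standalone word matches, not the thon inside python

　　10. \B　　Matches the empty string where there is no word boundary, so it can be used to match text inside a word

　　　　re.search(r'\Btho\B', 'hello tho is python').span()　　　　　　(15, 18)　the tho inside python matches; the standalone word tho does not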
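
To see how \A and \Z differ from ^ and $ once re.M is set, the following sketch runs both pairs over the same two-line string. The sample strings are made up for illustration.

```python
import re

text = 'first line\nsecond line'  # sample text, made up for illustration

# With re.M, ^ and $ match at every line; \A and \Z still match only at the text edges.
line_starts = re.findall(r'^\w+', text, re.M)
text_start = re.findall(r'\A\w+', text, re.M)
line_ends = re.findall(r'\w+$', text, re.M)
text_end = re.findall(r'\w+\Z', text, re.M)

# \b keeps the match to whole words: 'concat' and 'cats' are rejected.
whole_words = re.findall(r'\bcat\b', 'cat concat cats')

print(line_starts)  # ['first', 'second']
print(text_start)   # ['first']
print(line_ends)    # ['line', 'line']
print(text_end)     # ['line']
print(whole_words)  # ['cat']
```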

3. The flags parameter

　　1. re.I　　Case-insensitive matching

　　　　re.findall(r'[a-z]+', 'hello THE is Python')　　　　['hello', 'is', 'ython']

　　　　re.findall(r'[a-z]+', 'hello THE is Python', re.I)　　['hello', 'THE', 'is', 'Python']

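　　2. re.L　　Locale-dependent matching: \w, \W, \b, \B and case-insensitive matching follow the current locale. In Python 3 this flag is accepted only with bytes patterns; str patterns match Unicode by default, so \w already matches characters such as "é" or "ç" without it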

　　3. re.M　　Multiline mode: ^ and $ also match at the start and end of each line

　　　　re.findall(r'^\w+', 'hello world \npython is good')　　　　　　 ['hello']

　　　　re.findall(r'^\w+', 'hello world \npython is good', re.M)　　　　['hello', 'python']

　　4. re.S　　In this mode '.' matches any character at all, including the newline

　　　　re.findall(r'.+', 'hello world \ngood is python')　　　　 ['hello world ', 'good is python']

　　　　re.findall(r'.+', 'hello world \ngood is python', re.S)　　['hello world \ngood is python']

　　5. re.X　　Verbose mode: whitespace in the pattern is ignored and # starts a comment

　　　　re.findall(r'\d+ # match digits', 'hello world12 good is python666')　　　　　　 []

　　　　re.findall(r'\d+ # match digits', 'hello world12 good is python666', re.X)　　　　['12', '666']
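
Flags can be combined with the | operator, and re.X is most useful for spreading a longer pattern over several commented lines. The sketch below assumes a hypothetical "key = value" line format; the pattern name KEY_VALUE and the sample text are made up for illustration.

```python
import re

# Hypothetical pattern: one "key = value" pair per line.
KEY_VALUE = re.compile(r'''
    ^\s*
    (?P<key>[a-z_]+)     # the key: letters or underscores
    \s*=\s*              # an equals sign with optional spaces around it
    (?P<value>\d+)       # the value: one or more digits
    \s*$
''', re.X | re.I | re.M)  # verbose + case-insensitive + multiline

config = 'Width = 20\nHEIGHT=10\n# comment line'
pairs = [(m.group('key'), m.group('value')) for m in KEY_VALUE.finditer(config)]

print(pairs)  # [('Width', '20'), ('HEIGHT', '10')]
```

Without re.I the uppercase keys would be rejected, and without re.M only the first line could match, since ^ and $ would then anchor to the whole text.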

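4. Non-capturing groups and lookaround

　　1. ?:　　Groups the enclosed pattern without capturing it; typically combined with the | operator

　　　　"com(?:puter|pare)": matches "computer" and "compare", but not other strings such as "complete"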

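　　2. ?=　　Positive lookahead: asserts that the given pattern follows, without consuming it

　　　　"app(?=le|lication)": in "apple" or "application" it matches "app"; in "appear" there is no match

　　3. ?!　　Negative lookahead: asserts that the given pattern does not follow; the opposite of ?=

　　　　"app(?!le|lication)": in "apple" or "application" there is no match; in other text such as "appear" it matches "app"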

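　　4. ?<=　　Positive lookbehind: asserts that the given pattern comes immediately before the match

　　　　"(?<=w|t)here": in "where" or "there" it matches "here"; in "inhere" there is no match

　　5. ?<!　　Negative lookbehind: asserts that the given pattern does not come immediately before the match

　　　　"(?<!w|t)here": in "where" or "there" there is no match; in "inhere" it matches "here"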

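　　For example, a strong-password check: the password must contain at least one letter, one digit, and one of the special characters # or $, and be 8 to 10 characters long

　　　　^(?=.*[A-Za-z])(?=.*\d)(?=.*[#$]).{8,10}$　　The three lookaheads are independent conditions, each checked from the start of the string, so the required characters may appear in any order. [A-Za-z] is used for the letter check because \w would also be satisfied by a digit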
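
The group and lookaround forms above can be tried out as follows. This is only a sketch: the sample strings are made up, and the password policy (8 to 10 characters with at least one letter, one digit, and one of # or $) is an assumed reading of the rule stated above, with STRONG and is_strong as hypothetical names.

```python
import re

# Non-capturing vs. capturing: findall returns the group text once a group captures.
non_capturing = re.findall(r'com(?:puter|pare)', 'computer compare complete')
capturing = re.findall(r'com(puter|pare)', 'computer compare complete')

# Lookahead: only 'app' is consumed; the start index shows which word matched.
words = 'apple application appear'
followed = [m.start() for m in re.finditer(r'app(?=le|lication)', words)]
not_followed = [m.start() for m in re.finditer(r'app(?!le|lication)', words)]

# Lookbehind: 'here' matches depending on the character just before it.
text = 'where there inhere'
preceded = [m.start() for m in re.finditer(r'(?<=w|t)here', text)]
not_preceded = [m.start() for m in re.finditer(r'(?<!w|t)here', text)]

# Assumed policy: at least one letter, one digit, one of # or $, length 8 to 10.
STRONG = re.compile(r'(?=.*[A-Za-z])(?=.*\d)(?=.*[#$]).{8,10}')

def is_strong(password):
    """Return True if the whole password satisfies the assumed policy."""
    return STRONG.fullmatch(password) is not None

print(non_capturing)               # ['computer', 'compare']
print(capturing)                   # ['puter', 'pare']
print(followed, not_followed)      # [0, 6] [18]
print(preceded, not_preceded)      # [1, 7] [14]
print(is_strong('abc123#x'), is_strong('abcdefgh'))  # True False
```

fullmatch is used instead of ^...$ so that a password with a trailing newline is rejected as well.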

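5. Common regular expression functions

　　1. re.match(pattern, string, flags)　　Matches only at the beginning of the string

import re

# re.match only matches at the start of the string
s = 'www.run123oob.com'
re.match('ww', s, re.I).span()     # (0, 2)  span() gives the start and end positions of the match
re.match('run', s)                 # None    returns None when there is no match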

# . matches any character except a newline, * repeats it, and a ? after a quantifier makes it non-greedy
matchObj = re.match(r'(.*) are (.*)', 'cat are Cats are smarter than dogs')
if matchObj:
    matchObj.group()     # cat are Cats are smarter than dogs
    matchObj.group(1)    # cat are Cats
    matchObj.group(2)    # smarter than dogs

# The trailing .* after the last group adds nothing: the greedy (.*) has already consumed the rest of the string
matchObj1 = re.match(r'(.*) are (.*).*', 'cat are Cats are smarter than dogs')
if matchObj1:
    matchObj1.group()     # cat are Cats are smarter than dogs
    matchObj1.group(1)    # cat are Cats
    matchObj1.group(2)    # smarter than dogs

# Each " .*" after the group must match a space plus the text after it, so the last two words are left out of group 2
matchObj2 = re.match(r'(.*) are (.*) .* .*', 'cat are Cats are smarter than dogs')
if matchObj2:
    matchObj2.group()     # cat are Cats are smarter than dogs
    matchObj2.group(1)    # cat are Cats
    matchObj2.group(2)    # smarter

# (.*?) is non-greedy: it matches as little as possible, so group 1 stops at the first " are "
matchObj3 = re.match(r'(.*?) are (.*) .* .*', 'cat are Cats are smarter than dogs')
if matchObj3:
    matchObj3.group()     # cat are Cats are smarter than dogs
    matchObj3.group(1)    # cat
    matchObj3.group(2)    # Cats are smarter

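# For comparison, move the ? to the last group: a non-greedy (.*?) at the end of the pattern matches the empty string
matchObj4 = re.match(r'(.*) are (.*?)', 'cat are Cats are smarter than dogs')
if matchObj4:
    matchObj4.group()     # 'cat are Cats are ' (including the trailing space)
    matchObj4.group(1)    # 'cat are Cats'
    matchObj4.group(2)    # '' (empty string)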

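　　2. re.search(pattern, string, flags)　　Scans the whole string and returns a match object for the first match, or None if there is none

# Methods of the returned match object
#     1. span()     returns the (start, end) index pair of the match
#     2. group()    returns the matched text
#     3. start()    returns the start index of the match
#     4. end()      returns the end index of the match

re.search('www', 'wWw.run123oob.com', re.I).span()     # can match at the start: (0, 3)
re.search('un1', 'wWw.run123oob.com', re.I).span()     # can match in the middle: (5, 8)
re.search('un1', 'wWw.run123oob.com', re.I).group()    # the matched text: un1
re.search('un1', 'wWw.run123oob.com', re.I).start()    # the start index of the match: 5
re.search('un1', 'wWw.run123oob.com', re.I).end()      # the end index of the match: 8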
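
Since re.match and re.search both return None on failure, calling .span() or .group() directly on the result raises an AttributeError when nothing matches. The sketch below contrasts re.match, re.search, and re.fullmatch on one string and guards against None; find_span is a hypothetical helper written for this illustration.

```python
import re

s = 'www.run123oob.com'

at_start = re.match(r'run', s)        # None: match() only tries position 0
anywhere = re.search(r'run', s)       # finds the first occurrence anywhere
whole = re.fullmatch(r'[\w.]+', s)    # succeeds only if the entire string matches

def find_span(pattern, text):
    """Hypothetical helper: span of the first match, or None if there is none."""
    m = re.search(pattern, text)
    return m.span() if m else None

print(at_start)               # None
print(anywhere.span())        # (4, 7)
print(whole.group())          # www.run123oob.com
print(find_span(r'\d+', s))   # (7, 10)
print(find_span(r'xyz', s))   # None
```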

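　　3. re.findall(pattern, string, flags)　　Returns all non-overlapping matches as a list; if the pattern contains more than one group, it returns a list of tuples. The findall(string, pos, endpos) method of a compiled pattern can additionally restrict the search to a range

# A pattern without groups returns a list of strings
patterns = re.compile(r'\d+')
result1 = patterns.findall('runoob 123 google 456')          # ['123', '456']
result2 = patterns.findall('run383fshf84hd7sfh89', 4, 15)    # ['83', '84', '7']  searches only indices 4 to 14

# A pattern with several groups returns a list of tuples
re.findall(r'(\w+)=(\d+)', 'set width=20 and height=10')     # [('width', '20'), ('height', '10')]

# Search from the start of the string
re.compile(r'\W+').findall('one@234.two 234f**lf22')         # ['@', '.', ' ', '**']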
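
The shape of the list returned by findall depends on how many capturing groups the pattern has. The sketch below reuses the width/height string from above to show the three cases side by side; the variable names are made up for illustration.

```python
import re

text = 'set width=20 and height=10'

no_group = re.findall(r'\w+=\d+', text)         # whole matches
one_group = re.findall(r'\w+=(\d+)', text)      # only the text of the single group
two_groups = re.findall(r'(\w+)=(\d+)', text)   # one tuple per match

# A list of 2-tuples converts directly into a dict.
as_dict = dict(two_groups)

print(no_group)    # ['width=20', 'height=10']
print(one_group)   # ['20', '10']
print(two_groups)  # [('width', '20'), ('height', '10')]
print(as_dict)     # {'width': '20', 'height': '10'}
```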

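　　4. re.sub(pattern, repl, string, count, flags)　　Replaces matches in a string

　　　　pattern: the regular expression to match　　repl: the replacement string (or a function)　　string: the string to search　　count: the maximum number of replacements; the default 0 replaces all matches

phone = '188-3322-7899 # this is a phone number'
# Remove the trailing comment
num = re.sub(r'\s*#.*$', '', phone)    # 188-3322-7899  removes the # and everything after it

# Remove the non-digit characters
re.sub(r'\D', '', num)    # 18833227899  \D matches any non-digit character

# Replace characters
re.sub(r'2', '0', num)    # 188-3300-7899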

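# When repl is a function
def double(mat):
    value = int(mat.group('value'))
    return str(value * 2)

s = 'A23Df22ef33DD32'    # not named str, which would shadow the built-in used inside double
# (?P<value>\d+) is a named group: it captures one or more digits under the name value.
# For each match, re.sub passes the match object to double and inserts its return value.
re.sub(r'(?P<value>\d+)', double, s)    # A46Df44ef66DD64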
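
Besides a function, repl can be a string that refers back to captured groups with \1 or \g<name>. The sketch below is a small illustration on a made-up date string; it also shows the count argument and re.subn, which returns the number of replacements alongside the result.

```python
import re

date = '2023-02-25'  # sample value, made up for illustration

# \g<name> in the replacement string inserts the text of a named group.
swapped = re.sub(r'(?P<y>\d{4})-(?P<m>\d{2})-(?P<d>\d{2})',
                 r'\g<d>/\g<m>/\g<y>', date)

# count limits how many matches are replaced.
first_only = re.sub(r'-', '/', date, count=1)

# subn returns a (new_string, number_of_replacements) tuple.
result, n = re.subn(r'-', '/', date)

print(swapped)      # 25/02/2023
print(first_only)   # 2023/02-25
print(result, n)    # 2023/02/25 2
```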

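　　5. re.compile(pattern, [flags])　　Compiles a regular expression and returns a pattern object whose methods (match, search, findall, and so on) can be reused

patterns = re.compile(r'\d+')    # matches digits
# Search from the start of the string
patterns.match('bin123gol333del')   # None

# Search from a given position
pat = patterns.match('bin123gol333del', 3, 10)   # match within indices 3 to 9
pat.span()     # (3, 6)
pat.group()    # 123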

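# Using groups
pattern1 = re.compile(r'([a-z]+) ([a-z].*) ([a-z])', re.I)  # + means one or more letters; without it, [a-z] matches a single letter
m2 = pattern1.match('Hello World Wide Deb')
m2.group()     # Hello World Wide D  the entire match; the last group takes only one letter, so eb is left out
m2.group(1)    # Hello  text captured by the first group
m2.group(2)    # World Wide  text captured by the second group; the greedy .* extends to the last space
m2.group(3)    # D  the last group has no +, so it captures a single letter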
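
When a compiled pattern has several groups, naming them can make the result easier to read than numeric indices. The sketch below is only an illustration; the pattern PAIR and its group names first and second are hypothetical.

```python
import re

# Hypothetical pattern: two words separated by a space, with named groups.
PAIR = re.compile(r'(?P<first>[a-z]+) (?P<second>[a-z]+)', re.I)

m = PAIR.match('Hello World Wide Web')

print(m.group())             # Hello World
print(m.groups())            # ('Hello', 'World')
print(m.groupdict())         # {'first': 'Hello', 'second': 'World'}
print(m.group('second'))     # World
print(m.start(2), m.end(2))  # 6 11
```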

　　6. re.finditer(pattern, string, flags=0)　　Returns an iterator over the match objects of all matches

it = re.finditer(r'\d+', '12ads45pdf88a90go19da')
print(it)    # <callable_iterator object at 0x0000029A2D14E340>

# An iterator can be consumed with a for loop; each item is a match object
for i in it:
    print(i.group())    # 12, 45, 88, 90, 19
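
Each item produced by finditer is a match object rather than a string, so positions are available along with the text. The sketch below collects both, and also shows that the iterator is exhausted after one pass.

```python
import re

text = '12ads45pdf88a90go19da'

# Collect the matched text together with its (start, end) span.
matches = [(m.group(), m.span()) for m in re.finditer(r'\d+', text)]

# An iterator can be consumed only once: the second pass yields nothing.
it = re.finditer(r'\d+', text)
first_pass = [m.group() for m in it]
second_pass = [m.group() for m in it]

print(matches)      # [('12', (0, 2)), ('45', (5, 7)), ('88', (10, 12)), ('90', (13, 15)), ('19', (17, 19))]
print(first_pass)   # ['12', '45', '88', '90', '19']
print(second_pass)  # []
```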

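　　7. re.split(pattern, string, [maxsplit=0, flags=0])　　Splits the string at each match and returns a list; maxsplit is the maximum number of splits, and the default 0 means no limit

# \W+ matches characters that are not letters, digits, or underscores; if the string ends with a match, the last element is an empty string
re.split(r'\W+', '12web@qq.com.')   # ['12web', 'qq', 'com', '']

# If the pattern is wrapped in parentheses, the matched separators are included in the result
re.split(r'(\W+)', '12web@qq.com.')  # ['12web', '@', 'qq', '.', 'com', '.', '']

# Limit the number of splits
re.split(r'\W+', '12web@qq.com.', maxsplit=2)  # ['12web', 'qq', 'com.']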
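
The trailing empty string shown above is often unwanted. The sketch below filters it out, limits the number of splits with the maxsplit keyword, and splits on a separator surrounded by optional whitespace; the sample strings are made up for illustration.

```python
import re

parts = re.split(r'\W+', '12web@qq.com.')

# Drop empty strings produced by a match at the start or end of the input.
non_empty = [p for p in parts if p]

# Passing maxsplit by keyword keeps the call unambiguous.
limited = re.split(r'\W+', '12web@qq.com.', maxsplit=1)

# Split on commas or semicolons, swallowing the whitespace around them.
fields = re.split(r'\s*[,;]\s*', 'a, b;c ,d')

print(parts)      # ['12web', 'qq', 'com', '']
print(non_empty)  # ['12web', 'qq', 'com']
print(limited)    # ['12web', 'qq.com.']
print(fields)     # ['a', 'b', 'c', 'd']
```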

posted @ 2023-02-25 23:20 无敌小豆包
