Using regular expressions in Python

The re regex module

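Installation

No installation is needed: re is part of the Python standard library, so pip is not involved. Just import it with import re.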

Usage

re.match()

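re.match() tries to match a pattern at the beginning of a string. Its usage is as follows:

  • re.match('pattern',string,flags=0)

  • pattern is the regular expression to match, string is the string to match against, and flags is an optional flags value that modifies matching. On success it returns a match object.

  • Call group() on the match object to get the matched text.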

    In [1]: import re
    
    In [2]: re.match('www','www.baidu.com')
    Out[2]: <re.Match object; span=(0, 3), match='www'>
    
    In [3]: s = re.match('www','www.baidu.com')
    
    In [4]: print(s)
    <re.Match object; span=(0, 3), match='www'>
    
    In [5]: print(s.group())
    www
    
  • re.match() only matches from the beginning of the string. If the match fails at the first position, it returns None.

      In [3]: s = re.match('com','www.google.com')
      
      In [4]: print(s)
      None
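
Because re.match() returns None on failure, calling group() on the result without a check raises AttributeError. The following is a minimal sketch of a guard; the helper name first_match is made up for illustration and is not part of re.

```python
import re


def first_match(pattern, text):
    """Return the text matched at the start of `text`, or None if there is no match."""
    m = re.match(pattern, text)
    # re.match() returns None on failure, so check before calling group().
    return m.group() if m is not None else None


print(first_match('www', 'www.baidu.com'))   # www
print(first_match('com', 'www.google.com'))  # None
```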
    

re.search()

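Unlike re.match(), re.search() scans the whole string and returns the first match.

Its usage is as follows:

  • re.search('pattern',string,flags=0)

  • pattern is the regular expression to match, string is the string to search, and flags is the optional flags value. It returns a match object, or None if nothing matches.

  • Call group() on the match object to get the matched text.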

    In [5]: import re
    
    In [6]: l = re.search('com','www.google.com')
    
    In [7]: l
    Out[7]: <re.Match object; span=(11, 14), match='com'>
    
    In [8]: l.group()
    Out[8]: 'com'
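
To make the difference concrete, here is a small sketch that runs both functions on the same string, followed by one use of the flags argument. re.IGNORECASE is a standard flag that this post has not introduced; it is used only to show what flags is for.

```python
import re

text = 'www.google.com'

# match() only tries position 0; search() scans forward for the first hit.
at_start = re.match('com', text)
anywhere = re.search('com', text)

print(at_start)          # None
print(anywhere.span())   # (11, 14)

# flags changes how the pattern is interpreted,
# e.g. re.IGNORECASE makes matching case-insensitive.
upper = re.search('COM', text, flags=re.IGNORECASE)
print(upper.group())     # com
```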
    

Regular expression syntax in re

Characters

.

  • Matches any single character except the newline \n

    In [13]: s = re.match('.','abc')
    
    In [14]: s
    Out[14]: <re.Match object; span=(0, 1), match='a'>
    
    
    In [16]: s = re.match('.',r'\sc')
    In [17]: s.group()
    Out[17]: '\\'
        
        
    In [27]: s = re.match('a.c','abc')
    In [28]: print(s)
    <re.Match object; span=(0, 3), match='abc'>
        
        
    
  • It does not match a newline, so the result is None and calling group() on it fails

    In [18]: s = re.match('.','\n')
    
    In [19]: s
    
    In [20]: s.group()
    ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
    <ipython-input-20-e3a70314fc21> in <module>
    ----> 1 s.group()
    
    AttributeError: 'NoneType' object has no attribute 'group'
    
    In [21]: print(s)
    None
    
  • In DOTALL mode it also matches the newline \n

    In [24]: s = re.match('.','\n',flags=re.DOTALL)
    
    In [25]: print(s)
    <re.Match object; span=(0, 1), match='\n'>
    
    In [26]: s.group()
    Out[26]: '\n'
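
The same behaviour shows up when the dot sits in the middle of a longer pattern. A minimal sketch, using a made-up two-line string:

```python
import re

text = 'line1\nline2'

# Without DOTALL, '.' cannot stand in for the newline between the two words.
plain = re.match('line1.line2', text)
# With DOTALL, '.' also matches '\n'.
dotall = re.match('line1.line2', text, flags=re.DOTALL)

print(plain)                   # None
print(repr(dotall.group()))    # 'line1\nline2'
```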
    

\

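  • \ is the escape character: it removes the special meaning of the character after it, so that character is matched literally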

    In [2]: import re
    
    In [3]: s = re.match(r'a\\c',r'a\c')
    
    In [4]: s
    Out[4]: <re.Match object; span=(0, 3), match='a\\c'>
    
    In [5]: s.group()
    Out[5]: 'a\\c'
    
    In [6]: s = re.match(r'a\*c','a*c')
    
    In [7]: s
    Out[7]: <re.Match object; span=(0, 3), match='a*c'>
    
    In [8]: s.group()
    Out[8]: 'a*c'
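
When the text to match literally comes from a variable, escaping by hand is error-prone. The sketch below uses re.escape(), a standard-library helper not covered in this post, to add the backslashes automatically.

```python
import re

literal = 'a.c'

# Unescaped, '.' is a wildcard, so 'abc' matches too.
print(re.match(literal, 'abc') is not None)   # True

# re.escape() backslash-escapes the special characters.
escaped = re.escape(literal)
print(escaped)                                # a\.c
print(re.match(escaped, 'abc') is not None)   # False
print(re.match(escaped, 'a.c') is not None)   # True
```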
    

[...]

  • A character set: this position matches any one of the characters listed in the set

    In [9]: s = re.match(r'a[bcd]e','abe')
    
    In [10]: s
    Out[10]: <re.Match object; span=(0, 3), match='abe'>
    
    In [11]: s.group()
    Out[11]: 'abe'
    
    In [12]: s = re.match(r'a[bcd]e','ade')
    
    In [13]: s.group()
    Out[13]: 'ade'
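
A set can also contain ranges such as 0-9 or a-f, as the email pattern at the end of this post does. A small sketch, with a hexadecimal-digit set chosen purely as an illustration:

```python
import re

# A range such as 0-9 or a-f is shorthand for listing every character in it.
hex_digit = r'[0-9a-f]'

results = {}
for candidate in ['7', 'c', 'g']:
    m = re.match(hex_digit, candidate)
    results[candidate] = m.group() if m else None
    print(candidate, '->', results[candidate])
```

Here '7' and 'c' match, while 'g' falls outside both ranges and gives None.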
    

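Predefined character classes

\d

  • Matches a digit, i.e. [0-9] (for str patterns it also matches other Unicode decimal digits unless the re.ASCII flag is set)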

    In [15]: import re
    
    In [16]: re.match(r'a\dc','a1c')
    Out[16]: <re.Match object; span=(0, 3), match='a1c'>
    
    In [17]: s = re.match(r'a\dc','a1c')
    
    In [18]: s.group()
    Out[18]: 'a1c'
    

\D

  • Matches any character that is not a digit

    In [19]: s = re.match(r'a\Dc','a1c')
    
    In [20]: s
    
    In [21]: print(s)
    None
    
    In [22]: s = re.match(r'a\Dc','afc')
    
    In [23]: print(s)
    <re.Match object; span=(0, 3), match='afc'>
    

\s

  • Matches a whitespace character

    In [24]: s = re.match(r'a\sc','a c')
    
    In [25]: print(s)
    <re.Match object; span=(0, 3), match='a c'>
    
    In [26]: s.group()
    Out[26]: 'a c'
    

\S

  • Matches any character that is not whitespace

    In [27]: s = re.match(r'a\Sc','asc')
    
    In [28]: s
    Out[28]: <re.Match object; span=(0, 3), match='asc'>
    
    In [29]: s.group()
    Out[29]: 'asc'
    

\w

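  • Matches a word character, i.e. [A-Za-z0-9_] (for str patterns it also matches non-ASCII letters and digits unless the re.ASCII flag is set)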

    
    In [30]: s = re.match(r'a\wc','asc')
    
    In [31]: s.group()
    Out[31]: 'asc'
    
    

\W

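  • Matches a non-word character, such as a space or the newline \n

    In [32]: s = re.match(r'a\Wc','a c')
    
    In [33]: s
    Out[33]: <re.Match object; span=(0, 3), match='a c'>
    
    
    In [35]: s = re.match(r'a\Wc','a\nc')
    
    In [36]: print(s)
    <re.Match object; span=(0, 3), match='a\nc'>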
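
The six classes can be compared side by side by running each one against the same handful of characters. This is a minimal sketch; the sample characters are arbitrary.

```python
import re

samples = ['5', 'x', '_', ' ', '\n']
classes = [r'\d', r'\D', r'\s', r'\S', r'\w', r'\W']

# For each class, collect the sample characters it matches.
matches = {
    cls: [ch for ch in samples if re.match(cls, ch)]
    for cls in classes
}

for cls, hits in matches.items():
    print(cls, hits)

# For str patterns, \w also accepts non-ASCII letters unless re.ASCII is set.
e_acute = '\u00e9'  # the letter e with an acute accent
unicode_hit = re.match(r'\w', e_acute) is not None
ascii_hit = re.match(r'\w', e_acute, flags=re.ASCII) is not None
print(unicode_hit, ascii_hit)   # True False
```

Each upper-case class matches exactly the samples that its lower-case counterpart rejects.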
    

Quantifiers

*

  • Matches the preceding character zero or more times

    In [1]: import re
    
    In [2]: re.match('abc*','ab')
    Out[2]: <re.Match object; span=(0, 2), match='ab'>
    
    In [3]: s  = re.match('abc*','ab')
    
    In [4]: print(s)
    <re.Match object; span=(0, 2), match='ab'>
    
    In [5]: s.group()
    Out[5]: 'ab'
    
    In [6]: s  = re.match('abc*','abcccc')
    
    In [7]: print(s)
    <re.Match object; span=(0, 6), match='abcccc'>
    
    In [8]: s.group()
    Out[8]: 'abcccc'
    

+

  • Matches the preceding character one or more times

    In [9]: s  = re.match('abc+','ab')
    
    In [10]: s.group()
    ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
    <ipython-input-10-e3a70314fc21> in <module>
    ----> 1 s.group()
    
    AttributeError: 'NoneType' object has no attribute 'group'
    
    In [12]: print(s)
    None
    
    
    In [13]: s  = re.match('abc+','abc')
    
    In [14]: s.group()
    Out[14]: 'abc'
      
    

?

  • Matches the preceding character zero or one time

    In [27]:  s  = re.match('abc?','abc')
    
    In [28]: print(s)
    <re.Match object; span=(0, 3), match='abc'>
    
    In [29]:  s  = re.match('abc?','abcc')
    
    In [30]: print(s)
    <re.Match object; span=(0, 3), match='abc'>
    
    In [31]:  s  = re.match('abc?','ab')
    
    In [32]: print(s)
    <re.Match object; span=(0, 2), match='ab'>
    

{m}

  • Matches the preceding character exactly m times

    In [34]:  s  = re.match('ab{2}c','abbc')
    
    In [35]: s.group()
    Out[35]: 'abbc'
    
    
    In [36]:  s  = re.match('ab{2}c','abc')
    In [38]: print(s)
    None
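
The four quantifiers above can be compared by applying each to the same inputs. A minimal sketch; the patterns and inputs are chosen only for illustration.

```python
import re

patterns = ['abc*', 'abc+', 'abc?', 'abc{2}']
inputs = ['ab', 'abc', 'abcc', 'abccc']

# results[pattern][text] is the matched text, or None when there is no match.
results = {}
for p in patterns:
    results[p] = {}
    for s in inputs:
        m = re.match(p, s)
        results[p][s] = m.group() if m else None
    print(p, results[p])
```

Note that re.match() does not require the whole string to be consumed, so 'abc?' still matches the first three characters of 'abccc'.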
    

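Combining them

In practice, a regular expression in Python usually combines literal characters, predefined character classes, and quantifiers into a single matching rule.

Example: a regular expression that matches an email address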

    In [39]: email = "king-nova@gmail.com"
    
    In [40]: s = re.match(r'^[\.A-Za-z0-9_-]+@[A-Za-z0-9_-]+(\.[A-Za-z0-9_-]+)+$', email)
    
    In [41]: s
    Out[41]: <re.Match object; span=(0, 19), match='king-nova@gmail.com'>
    
    In [42]: s.group()
    Out[42]: 'king-nova@gmail.com'
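
If the pattern is used repeatedly, it can be compiled once and wrapped in a function. The sketch below reuses the pattern from the session above; the names EMAIL_RE and looks_like_email are made up for illustration. The pattern is deliberately simple and is not a full validator for every valid address.

```python
import re

# Same pattern as in the session above, compiled once for reuse.
EMAIL_RE = re.compile(r'^[\.A-Za-z0-9_-]+@[A-Za-z0-9_-]+(\.[A-Za-z0-9_-]+)+$')


def looks_like_email(text):
    """Return True if `text` matches the simple email pattern above."""
    return EMAIL_RE.match(text) is not None


for candidate in ['king-nova@gmail.com', 'king-nova@gmail', 'no-at-sign.com']:
    print(candidate, looks_like_email(candidate))
```

Only the first candidate matches: the second has no dot-separated domain part after the @, and the third has no @ at all.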
posted @ 2020-09-22 14:55  郁文