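Regular expression usage in Python
The re regular expression module
Installation
No installation is needed. re is part of the Python standard library, so there is nothing to install with pip; just import it with import re.
Usage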
re.match()
Matching follows the rules below:
- re.match('pattern', string, flags=0)
- pattern is the regular expression to match, string is the string to match against, and flags is an optional set of flags. The call returns a match object, or None when there is no match.
- Call group() on the match object to get the matched text.
In [1]: import re
In [2]: re.match('www','www.baidu.com')
Out[2]: <re.Match object; span=(0, 3), match='www'>
In [3]: s = re.match('www','www.baidu.com')
In [4]: print(s)
<re.Match object; span=(0, 3), match='www'>
In [5]: print(s.group())
www
- re.match() only matches at the start of the string. If the pattern does not match there, it returns None.
In [3]: s = re.match('com','www.google.com')
In [4]: print(s)
None
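Because re.match() returns None on failure, calling group() directly on the result can raise AttributeError. The sketch below wraps the check in a small helper; the name match_prefix is made up for this illustration and is not part of re.

```python
import re

def match_prefix(pattern, text):
    """Return the text matched at the start of `text`, or None if nothing matches."""
    m = re.match(pattern, text)
    # re.match returns None on failure, so check before calling group()
    return m.group() if m is not None else None

print(match_prefix('www', 'www.baidu.com'))   # www
print(match_prefix('com', 'www.google.com'))  # None
```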
re.search()
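Unlike re.match(), re.search() scans the whole string and returns the first match.
The rules are as follows:
- re.search('pattern', string, flags=0)
- pattern is the regular expression to match, string is the string to search, and flags is an optional set of flags. The call returns a match object, or None when there is no match.
- Call group() on the match object to get the matched text.
In [5]: import re
In [6]: l = re.search('com','www.google.com')
In [7]: l
Out[7]: <re.Match object; span=(11, 14), match='com'>
In [8]: l.group()
Out[8]: 'com'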
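To make the difference concrete, here is a minimal side-by-side run of the two functions on the same input. The variable names at_start and anywhere are chosen only for this example.

```python
import re

text = 'www.google.com'

at_start = re.match('com', text)    # only tries to match at position 0
anywhere = re.search('com', text)   # scans forward for the first match

print(at_start)          # None
print(anywhere.span())   # (11, 14)
print(anywhere.group())  # com
```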
Regular expression rules in re
Characters
.
- Matches any single character except the newline \n.
In [13]: s = re.match('.','abc')
In [14]: s
Out[14]: <re.Match object; span=(0, 1), match='a'>
In [16]: s = re.match('.','\sc')
In [17]: s.group()
Out[17]: '\\'
In [27]: s = re.match('a.c','abc')
In [28]: print(s)
<re.Match object; span=(0, 3), match='abc'>
- It does not match a newline, so the result is None and calling group() on it fails.
In [18]: s = re.match('.','\n')
In [19]: s
In [20]: s.group()
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-20-e3a70314fc21> in <module>
----> 1 s.group()
AttributeError: 'NoneType' object has no attribute 'group'
In [21]: print(s)
None
- In DOTALL mode it matches the newline \n as well.
In [24]: s = re.match('.','\n',flags=re.DOTALL)
In [25]: print(s)
<re.Match object; span=(0, 1), match='\n'>
In [26]: s.group()
Out[26]: '\n'
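The three cases above can be condensed into one small check. This is only a sketch; dot_matches is a hypothetical helper name, not something re provides.

```python
import re

def dot_matches(ch, flags=0):
    """Report whether '.' matches the first character of `ch` under the given flags."""
    return re.match('.', ch, flags=flags) is not None

print(dot_matches('a'))                    # True
print(dot_matches('\n'))                   # False
print(dot_matches('\n', flags=re.DOTALL))  # True
```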
\
- \ is the escape character. Placed before a special character, it makes that character match literally instead of acting as regex syntax.
In [2]: import re
In [3]: s = re.match(r'a\\c','a\c')
In [4]: s
Out[4]: <re.Match object; span=(0, 3), match='a\\c'>
In [5]: s.group()
Out[5]: 'a\\c'
In [6]: s = re.match(r'a\*c','a*c')
In [7]: s
Out[7]: <re.Match object; span=(0, 3), match='a*c'>
In [8]: s.group()
Out[8]: 'a*c'
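As a short illustration of why escaping matters, the sketch below compares a hand-escaped pattern, the standard-library helper re.escape() (not used in the original post, brought in here as an alternative), and the unescaped pattern.

```python
import re

literal = 'a*c'

# Hand-written escape: the raw string keeps the backslash for the regex engine
print(re.match(r'a\*c', literal).group())  # a*c

# re.escape() builds an escaped pattern from arbitrary literal text
escaped = re.escape(literal)
print(re.match(escaped, literal).group())  # a*c

# Without escaping, * is a quantifier: 'a*c' means "zero or more a, then c"
print(re.match('a*c', literal))            # None
```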
[...]
- A character set. The position it occupies matches any one character listed in the set.
In [9]: s = re.match(r'a[bcd]e','abe')
In [10]: s
Out[10]: <re.Match object; span=(0, 3), match='abe'>
In [11]: s.group()
Out[11]: 'abe'
In [12]: s = re.match(r'a[bcd]e','ade')
In [13]: s.group()
Out[13]: 'ade'
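A set stands for exactly one character, which is easy to see by running the same pattern over a few inputs. The candidate strings below are made up for the illustration.

```python
import re

pattern = r'a[bcd]e'
candidates = ['abe', 'ace', 'ade', 'afe', 'abce']

# Map each candidate to whether the pattern matches at its start
results = {c: re.match(pattern, c) is not None for c in candidates}

for candidate, ok in results.items():
    print(candidate, '->', ok)
# 'afe' fails because f is not in the set;
# 'abce' fails because the set consumes only one character.
```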
Predefined character classes
\d
- Matches a digit, equivalent to [0-9] for ASCII text.
In [15]: import re
In [16]: re.match(r'a\dc','a1c')
Out[16]: <re.Match object; span=(0, 3), match='a1c'>
In [17]: s = re.match(r'a\dc','a1c')
In [18]: s.group()
Out[18]: 'a1c'
\D
- Matches any character that is not a digit.
In [19]: s = re.match(r'a\Dc','a1c')
In [20]: s
In [21]: print(s)
None
In [22]: s = re.match(r'a\Dc','afc')
In [23]: print(s)
<re.Match object; span=(0, 3), match='afc'>
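\d and \D are complements, so each input matches exactly one of the two patterns. A minimal sketch with made-up sample strings:

```python
import re

samples = ['a1c', 'afc', 'a c']

# Keep the samples whose middle character is a digit / is not a digit
digit_hits = [s for s in samples if re.match(r'a\dc', s)]
non_digit_hits = [s for s in samples if re.match(r'a\Dc', s)]

print(digit_hits)      # ['a1c']
print(non_digit_hits)  # ['afc', 'a c']
```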
\s
- Matches a whitespace character, such as a space, tab, or newline.
In [24]: s = re.match(r'a\sc','a c')
In [25]: print(s)
<re.Match object; span=(0, 3), match='a c'>
In [26]: s.group()
Out[26]: 'a c'
\S
- Matches any character that is not whitespace.
In [27]: s = re.match(r'a\Sc','asc')
In [28]: s
Out[28]: <re.Match object; span=(0, 3), match='asc'>
In [29]: s.group()
Out[29]: 'asc'
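The same complement relationship holds for \s and \S. The sketch below uses a few hypothetical inputs, including a tab and a newline, to show which pattern accepts each one.

```python
import re

samples = ['a c', 'a\tc', 'a\nc', 'asc']

# Space, tab and newline are all whitespace; the letter s is not
space_hits = [s for s in samples if re.match(r'a\sc', s)]
non_space_hits = [s for s in samples if re.match(r'a\Sc', s)]

print(space_hits)      # ['a c', 'a\tc', 'a\nc']
print(non_space_hits)  # ['asc']
```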
\w
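- Matches a word character, equivalent to [A-Za-z0-9_] for ASCII text. Every character inside the brackets counts as a word character.
In [30]: s = re.match(r'a\wc','asc')
In [31]: s.group()
Out[31]: 'asc'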
\W
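- Matches a non-word character, such as a space or the newline \n. The session below uses the lowercase \w, which fails on exactly those characters; \W is its complement and matches them.
In [32]: s = re.match(r'a\wc','a c')
In [33]: s
In [34]: print(s)
None
In [35]: s = re.match(r'a\wc','a\nc')
In [36]: print(s)
None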
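To see \W itself succeed, the sketch below runs both \w and \W over the same inputs. The sample strings are chosen for this example only.

```python
import re

samples = ['asc', 'a_c', 'a1c', 'a c', 'a\nc']

# Letters, digits and underscore are word characters; space and newline are not
word_hits = [s for s in samples if re.match(r'a\wc', s)]
non_word_hits = [s for s in samples if re.match(r'a\Wc', s)]

print(word_hits)      # ['asc', 'a_c', 'a1c']
print(non_word_hits)  # ['a c', 'a\nc']
```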
Quantifiers
*
- Matches the preceding character zero or more times.
In [1]: import re
In [2]: re.match('abc*','ab')
Out[2]: <re.Match object; span=(0, 2), match='ab'>
In [3]: s = re.match('abc*','ab')
In [4]: print(s)
<re.Match object; span=(0, 2), match='ab'>
In [5]: s.group()
Out[5]: 'ab'
In [6]: s = re.match('abc*','abcccc')
In [7]: print(s)
<re.Match object; span=(0, 6), match='abcccc'>
In [8]: s.group()
Out[8]: 'abcccc'
+
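- Matches the preceding character one or more times. With no c present the match fails, so s is None and s.group() raises AttributeError.
In [9]: s = re.match('abc+','ab')
In [10]: s.group()
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-10-e3a70314fc21> in <module>
----> 1 s.group()
AttributeError: 'NoneType' object has no attribute 'group'
In [12]: print(s)
None
In [13]: s = re.match('abc+','abc')
In [14]: s.group()
Out[14]: 'abc'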
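The only difference between * and + is whether zero occurrences are allowed. A small comparison, using a hypothetical helper named matched that returns the matched text or None:

```python
import re

def matched(pattern, text):
    """Return the text matched at the start of `text`, or None."""
    m = re.match(pattern, text)
    return m.group() if m else None

for text in ['ab', 'abc', 'abcccc']:
    print(text, '|', matched('abc*', text), '|', matched('abc+', text))
# ab | ab | None
# abc | abc | abc
# abcccc | abcccc | abcccc
```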
?
- Matches the preceding character zero or one time. In 'abcc' only the first c is consumed, so the match is 'abc'.
In [27]: s = re.match('abc?','abc')
In [28]: print(s)
<re.Match object; span=(0, 3), match='abc'>
In [29]: s = re.match('abc?','abcc')
In [30]: print(s)
<re.Match object; span=(0, 3), match='abc'>
In [31]: s = re.match('abc?','ab')
In [32]: print(s)
<re.Match object; span=(0, 2), match='ab'>
{m}
- Matches the preceding character exactly m times.
In [34]: s = re.match('ab{2}c','abbc')
In [35]: s.group()
Out[35]: 'abbc'
In [36]: s = re.match('ab{2}c','abc')
In [38]: print(s)
None
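A quick sketch of ? and {m} side by side. The helper name matched and the extra input 'abbbc' are assumptions added for illustration.

```python
import re

def matched(pattern, text):
    """Return the text matched at the start of `text`, or None."""
    m = re.match(pattern, text)
    return m.group() if m else None

# ? allows at most one c, so the second c in 'abcc' is left unmatched
print(matched('abc?', 'abcc'))     # abc

# {2} requires exactly two b characters before the c
print(matched('ab{2}c', 'abbc'))   # abbc
print(matched('ab{2}c', 'abc'))    # None
print(matched('ab{2}c', 'abbbc'))  # None
```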
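Combining the pieces
In practice, re patterns combine characters, predefined character classes, and quantifiers to build the matching rule.
Example: a regular expression that matches an email address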
In [39]: email = "king-nova@gmail.com"
In [40]: s = re.match(r'^[\.A-Za-z0-9_-]+@[A-Za-z0-9_-]+(\.[A-Za-z0-9_-]+)+$', email)
In [41]: s
Out[41]: <re.Match object; span=(0, 19), match='king-nova@gmail.com'>
In [42]: s.group()
Out[42]: 'king-nova@gmail.com'
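When the same pattern is reused, it can be compiled once with re.compile(). The sketch below wraps the pattern from the session above in a helper; the names EMAIL_RE and looks_like_email are made up for this example, and the check is only a rough shape test, not full address validation.

```python
import re

# Same pattern as in the session above, compiled once for reuse
EMAIL_RE = re.compile(r'^[\.A-Za-z0-9_-]+@[A-Za-z0-9_-]+(\.[A-Za-z0-9_-]+)+$')

def looks_like_email(text):
    """Return True if `text` has the rough shape local@domain.tld."""
    return EMAIL_RE.match(text) is not None

print(looks_like_email('king-nova@gmail.com'))  # True
print(looks_like_email('king-nova@gmail'))      # False, no dot-separated suffix
print(looks_like_email('@gmail.com'))           # False, empty local part
```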
