Python 6th Day
Regular Expressions
Metacharacters
. ^ $ * + ? { } [ ] \ | ( )
[] specifies a character class, a set of characters to match. Characters can be listed individually or given as a range. For example,
[abc] will match any of the characters a, b, or c; this is the same as [a-c], which uses a range to express the same set of characters. If you wanted to match only lowercase letters, your RE would be [a-z].
Inside [], most metacharacters lose their special meaning. For example, [akm$] will match any of the characters 'a', 'k', 'm', or '$'; '$' is usually a metacharacter, but inside a character class it is stripped of its special nature.
A '^' as the first character inside [] negates the class. For example, [^5] will match any character except '5'.
Use \ (backslash) to escape metacharacters: if you need to match a [ or \, you can precede them with a backslash to remove their special meaning: \[ or \\.
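The character class rules above can be checked with a short script. This is a minimal sketch; the sample string and the variable names are made up for illustration.

```python
import re

text = 'cabbage $5 [ok]'  # hypothetical sample text

# [abc] and [a-c] describe the same set of characters.
same_set = re.findall(r'[abc]', text) == re.findall(r'[a-c]', text)

# '$' is an ordinary character inside a character class.
dollars = re.findall(r'[akm$]', text)

# A leading '^' negates the class: match everything except '5'.
not_five = re.findall(r'[^5]', '1525')

# A backslash removes the special meaning of '[' and '\'.
brackets = re.findall(r'\[', text)
backslashes = re.findall(r'\\', r'a\b')

print(same_set, dollars, not_five, brackets, backslashes)
```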
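Predefined sets of characters:
\d == [0-9]
\D == [^0-9]
\s == [ \t\n\r\f\v] # any whitespace character
\S == [^ \t\n\r\f\v] # any non-whitespace character
\w == [a-zA-Z0-9_]
\W == [^a-zA-Z0-9_]
These equivalences hold for ASCII matching; for str patterns in Python 3, \d, \s, and \w also match Unicode digits, whitespace, and word characters unless the re.ASCII flag is set.
These sequences can also be used inside a character class. For example, [\s,.] is a character class that will match any whitespace character, or ',' or '.'.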
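As a rough illustration of the predefined classes, the sketch below runs a few of them against a made-up sample string. The last two lines assume Python 3 str patterns, where \d is Unicode-aware unless re.ASCII is passed.

```python
import re

sample = 'Room 42, floor_3.\tOK'  # hypothetical sample text

digits = re.findall(r'\d', sample)      # digit characters
non_word = re.findall(r'\W', sample)    # anything outside [a-zA-Z0-9_]

# A predefined class used inside a character class:
# split on runs of whitespace, ',' or '.'.
pieces = re.split(r'[\s,.]+', sample)

# U+FF13 is a full-width digit three: matched by \d in Unicode mode,
# but not when the re.ASCII flag restricts \d to [0-9].
unicode_digit = re.fullmatch(r'\d', '\uff13') is not None
ascii_digit = re.fullmatch(r'\d', '\uff13', re.ASCII) is not None

print(digits, non_word, pieces, unicode_digit, ascii_digit)
```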
'.' matches any character except a newline.
'*' specifies that the previous character can be matched zero or more times.
'+' matches the previous character one or more times.
'?' matches the previous character either once or zero times. For example, home-?brew matches either homebrew or home-brew.
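To make the difference between these metacharacters concrete, here is a small sketch that tests each one against a few candidate strings. The patterns ca*t, ca+t, and a.c are illustrative choices; home-?brew is the example from the notes above.

```python
import re

def matches(pattern, candidates):
    """Return, for each candidate, whether the whole string matches."""
    return [re.fullmatch(pattern, s) is not None for s in candidates]

# '*': zero or more of the preceding character
star = matches(r'ca*t', ('ct', 'cat', 'caaat'))

# '+': one or more of the preceding character
plus = matches(r'ca+t', ('ct', 'cat', 'caaat'))

# '?': zero or one of the preceding character
optional = matches(r'home-?brew', ('homebrew', 'home-brew', 'home--brew'))

# '.': any single character except a newline
dot = matches(r'a.c', ('abc', 'a c', 'a\nc'))

print(star, plus, optional, dot)
```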
Compiling Regular Expressions
A regular expression is compiled into a pattern object, which provides methods for matching and other operations.
>>> import re
>>> p = re.compile('ab*')
>>> p
<_sre.SRE_Pattern object at 0x...>
Use raw string notation (the r prefix) so that backslashes in the pattern do not have to be doubled:
| Regular String | Raw string |
| "ab*" | r"ab*" |
| "\\\\section" | r"\\section" |
| "\\w+\\s+\\1" | r"\w+\s+\1" |
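The equivalences in the table can be verified directly. The sketch below compares each regular string with its raw counterpart and then uses the raw form to find a literal \section in a made-up piece of text.

```python
import re

# Each pair from the table denotes the same string.
pairs = [
    ("ab*", r"ab*"),
    ("\\\\section", r"\\section"),
    ("\\w+\\s+\\1", r"\w+\s+\1"),
]
all_equal = all(regular == raw for regular, raw in pairs)

# The pattern \\section matches a literal backslash followed by 'section'.
pattern = re.compile(r"\\section")
found = pattern.search(r"see \section{intro}")  # hypothetical input
matched = found.group() if found else None

print(all_equal, matched)
```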
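Matching Strings
match() matches only at the beginning of the string; it returns a match object on success, otherwise None.
search() scans through the string for the first location where the RE matches; it returns a match object on success, or None if there is no match.
findall() finds all non-overlapping matches in the string and returns them as a list.
finditer() finds all non-overlapping matches in the string and returns an iterator of match objects.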
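The four methods can be compared side by side on one input. This is a minimal sketch; the pattern and the sample text are chosen only for illustration.

```python
import re

p = re.compile(r'[a-z]+')
text = '::: tempo 120 allegro'  # hypothetical sample text

# match() only tries at position 0, where ':' is not a lowercase letter.
at_start = p.match(text)

# search() scans forward and returns the first match it finds.
first = p.search(text)
first_word = first.group() if first else None

# findall() returns every non-overlapping match as a list of strings.
all_words = p.findall(text)

# finditer() yields match objects, so positions are available too.
spans = [m.span() for m in p.finditer(text)]

print(at_start, first_word, all_words, spans)
```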
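A match object provides these important methods:
group() returns the matched substring as a string
start() returns the starting position of the match
end() returns the ending position of the match
span() returns a tuple containing the (start, end) positions of the match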
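A short sketch of these methods in use follows. Because search() returns None when nothing matches, the result is checked before any method is called; the input string is made up.

```python
import re

text = 'order 2048 shipped'  # hypothetical sample text
m = re.search(r'\d+', text)

if m is not None:
    matched = m.group()              # the matched substring
    start, end = m.start(), m.end()  # positions in the original string
    span = m.span()                  # the same positions as a tuple
    # Slicing with start and end recovers the matched text.
    sliced = text[start:end]
    print(matched, start, end, span, sliced)
else:
    print('no match')
```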
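Grouping
Groups are marked with (). Group 0 is always present and is the entire match; the groups marked by () are numbered from 1. The match object methods take a group number as an argument, and it defaults to 0.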
>>> p = re.compile('(a)b')
>>> m = p.match('ab')
>>> m.group()
'ab'
>>> m.group(0)
'ab'
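Groups can be nested; a group's number is determined by counting its opening parenthesis from left to right.
>>> p = re.compile('(a(b)c)d')
>>> m = p.match('abcd')
>>> m.group(0)
'abcd'
>>> m.group(1)
'abc'
>>> m.group(2)
'b'
group() can take several group numbers at once, in which case it returns a tuple.
>>> m.group(2,1,2)
('b', 'abc', 'b')
groups() returns a tuple of all the subgroups, starting from group 1.
>>> m.groups()
('abc', 'b')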
Modifying Strings
Splitting Strings
>>> p = re.compile(r'\W+')
>>> p.split('This is a test, short and sweet, of split().')
['This', 'is', 'a', 'test', 'short', 'and', 'sweet', 'of', 'split', '']
>>> p.split('This is a test, short and sweet, of split().', 3)
['This', 'is', 'a', 'test, short and sweet, of split().']
