Python Regular Expressions (Part 2)

Posted on 2014-11-21 06:45 by 星际海盗

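Using regular expressions

The compile method

Compiling a regular expression:

If a regular expression is used frequently, you can compile it into a pattern object and reuse that object. This avoids re-processing the pattern string on every call and can speed up matching.

Compilation is done with compile() in the re module. help(re.compile) shows the signature compile(pattern, flags=0). For the other functions in the re module, see the manual, or run help(re) or dir(re).

Example: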

>>> import re
>>> r2 = r'\d{3,4}-?\d{8}'
>>> re.findall(r2,'1234-1345321')
[]
>>> re.findall(r2,'1234-13453212')
['1234-13453212']
>>> re.findall(r2,'123-13453212')
['123-13453212']
>>> re.findall(r2,'123-13456456')
['123-13456456']
>>> by = re.compile(r2)    # compile the pattern r2 into a pattern object
>>> by
<_sre.SRE_Pattern object at 0xb758b180>
>>> by.findall('2131-12345678')    # match using by directly
['2131-12345678']

>>> db = re.compile(r'abcd', re.I)    # re.I is a flag passed to compile; it makes matching case-insensitive
>>> db.findall('Abcd')
['Abcd']
>>> db.findall('aBCd')
['aBCd']
>>> db.findall('aBcd')
['aBcd']
>>> db.findall('aBcD')
['aBcD']
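As a rough sketch of the compile-once, reuse-many-times idea, the snippet below runs the phone-number pattern from the session above over a few inputs and also combines two flags with |. The sample inputs and the variable names (phone, candidates, word) are made up for illustration; re.M is a standard flag that was not used in the original session.

```python
import re

# Compile once, then reuse the pattern object for every input.
phone = re.compile(r'\d{3,4}-?\d{8}')

candidates = ['1234-1345321', '1234-13453212', '123-13453212']  # illustrative inputs
matched = [c for c in candidates if phone.findall(c)]
print(matched)  # ['1234-13453212', '123-13453212']

# Flags are combined with |. re.I ignores case; re.M lets ^ and $ match at line boundaries.
word = re.compile(r'^abcd$', re.I | re.M)
lines = word.findall('ABCD\nabcd\nxabcd')
print(lines)  # ['ABCD', 'abcd']
```

The first input fails because only seven digits follow the hyphen, which matches the empty list seen in the interactive session.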

########################

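match method:

A match object has four commonly used methods:

1. group()  returns the string that the regular expression matched
2. start()  returns the position where the match starts
3. end()    returns the position where the match ends
4. span()   returns the start and end positions as a tuple

>>> db = re.compile('abcd',re.I)    # compile a pattern
>>> db.match('abc')    # no match: the result is None, so nothing is printed
>>> db.match('abcd')
<_sre.SRE_Match object at 0xb7406de8>    # this is a match object

>>> a=db.match('abcd')
>>> a
<_sre.SRE_Match object at 0xb7413790>
>>> a.group()    # shows the matched text
'abcd'

>>> db.match('ad abcd')    # match() succeeds only when the pattern matches at the start of the string; otherwise it returns None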
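The four match-object methods listed above can be tried together in a short script. This is only a sketch: the input strings and the names m, info and no_match are chosen here for illustration.

```python
import re

db = re.compile('abcd', re.I)

m = db.match('ABcd efg')          # the pattern matches at the start of the string
info = (m.group(), m.start(), m.end(), m.span())
print(info)                       # ('ABcd', 0, 4, (0, 4))

no_match = db.match('ad abcd')    # 'abcd' is present, but not at position 0
print(no_match is None)           # True
```

Because match() returns None on failure, it is safer to check the result before calling group() on it.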

search method: looks for a match anywhere in the string

>>> db.search('ad abcd')    # search() scans the string and matches at any position
<_sre.SRE_Match object at 0xb74fd100>

findall method: returns all matches as a list

>>> db.findall('adc ab abcd')
['abcd']
>>> db.findall('adc ab abcd 134 abcd 34234 abcd')
['abcd', 'abcd', 'abcd']
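To see how match, search and findall differ on the same input, a side-by-side sketch may help. The sample text and the variable names are assumptions made for this example.

```python
import re

db = re.compile('abcd', re.I)
text = 'adc ab ABCD 134 abcd'      # illustrative text

at_start = db.match(text)          # None: the text does not begin with 'abcd'
anywhere = db.search(text)         # match object for the first occurrence only
everything = db.findall(text)      # plain list of every matched string

print(at_start)                    # None
print(anywhere.group())            # ABCD
print(anywhere.span())             # (7, 11)
print(everything)                  # ['ABCD', 'abcd']
```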

finditer method: returns an iterator over match objects

>>> db.finditer('adc ab abcd 134 abcd 34234 abcd')
<callable-iterator object at 0xb73ea6ec>
>>> a = db.finditer('adc ab abcd 134 abcd 34234 abcd')
>>> a
<callable-iterator object at 0xb74f16cc>

>>> a.next()    # Python 2 syntax; next(a) works on Python 2.6+ and Python 3
<_sre.SRE_Match object at 0xb7406de8>    # each item is again a match object
>>> a
<callable-iterator object at 0xb74f16cc>
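In practice the iterator is usually consumed with a for loop or a comprehension rather than by calling next by hand. A minimal sketch, reusing the text from the session above (the name found is made up here):

```python
import re

db = re.compile('abcd', re.I)
text = 'adc ab abcd 134 abcd 34234 abcd'

# Each item produced by finditer is a match object, so positions are available too.
found = [(m.group(), m.span()) for m in db.finditer(text)]
print(found)  # [('abcd', (7, 11)), ('abcd', (16, 20)), ('abcd', (27, 31))]
```

Unlike findall, which gives only the matched strings, this keeps the position of every match.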

######################################

sub method

Replaces the parts of a string that match a regular expression.

sub(pattern, repl, string, count=0, flags=0)

Example:

>>> g='xiong chu mo'
>>> g.replace('xiong','ren')    # the string method replace swaps xiong for ren
'ren chu mo'
>>> gg = r'x...g'    # define a regular expression
>>> g
'xiong chu mo'
>>> g.replace(gg,'sha')
'xiong chu mo'    # nothing is replaced: str.replace treats gg as literal text, not as a pattern
>>> re.sub(gg, 'fei', 'xabcg xaaag xcccg xxxxg aaa')
'fei fei fei fei aaa'    # every substring that matches gg is replaced
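The signature above also has a count parameter, and repl may be a function instead of a string. The following sketch shows both, under the assumption that the same sample text is used; first_two and shouted are illustrative names.

```python
import re

gg = r'x...g'
text = 'xabcg xaaag xcccg xxxxg aaa'

# count limits how many matches are replaced, working left to right.
first_two = re.sub(gg, 'fei', text, count=2)
print(first_two)  # fei fei xcccg xxxxg aaa

# repl can be a function that receives each match object and returns the replacement.
shouted = re.sub(gg, lambda m: m.group().upper(), text)
print(shouted)    # XABCG XAAAG XCCCG XXXXG aaa
```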

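split() method:

Purpose: split a string wherever a delimiter of your choice occurs. The string method split() splits on whitespace when called with no argument; re.split() always takes a pattern.

Note: the delimiter must actually occur in the string; otherwise the result is a one-element list containing the whole string.

split(pattern, string, maxsplit=0, flags=0)

>>> ip = '1.1.1.1'    # note: ip is a string here
>>> ip.split('x')
['1.1.1.1']    # 'x' does not occur in ip, so nothing is split; the result is still a list
>>> ip.split('.')    # split the string on '.'
['1', '1', '1', '1']    # the result after splitting

>>> s = 'a href="/offensive-security-solutions/virtual-penetration-testing-labs/'

>>> re.split(r'/',s)
['a href="', 'offensive-security-solutions', 'virtual-penetration-testing-labs', '']    # the string was split at each '/'

>>> re.split(r'[/\-]',s)    # between two characters in a character class, - denotes a range, so it is escaped as \- to make it literal
['a href="', 'offensive', 'security', 'solutions', 'virtual', 'penetration', 'testing', 'labs', '']

This works well enough for splitting data pulled from the web.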
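Two small refinements are often useful with re.split: dropping the empty strings that appear when the text ends with a delimiter, and limiting the number of splits with maxsplit. A sketch using the same string s (parts and head_tail are names invented for this example):

```python
import re

s = 'a href="/offensive-security-solutions/virtual-penetration-testing-labs/'

# Filter out empty strings, such as the one produced by the trailing '/'.
parts = [p for p in re.split(r'[/\-]', s) if p]
print(parts)

# maxsplit=1 splits only at the first '/', leaving the rest untouched.
head_tail = re.split(r'/', s, maxsplit=1)
print(head_tail)
```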

############################ Grouping ###############################

Use parentheses () to create groups.

>>> email = r'\w{3}@\w+(\.com|\.cn)'    # () groups \.com and \.cn; inside a group, | separates alternatives
>>> re.match(email,'bai@bai.com')
<_sre.SRE_Match object at 0xb74a28a0>

>>> re.findall(email,'bai@bai.com')
['.com']    # note: when the pattern contains a group, findall returns the text matched by the group

>>> s = '''
... jinttina fkjdf 123+x n jfdkj
... 123+5 n fjkdjgfkjdgjdfgj;da fjd fjdskf fksd fjdsk gksd
... jdkjfkjdkf f djfk dsafd saf d fd sf 123+1 n jfkdfj
... '''
>>> rx = r'123.. n'
>>> re.findall(rx,s)
['123+x n', '123+5 n', '123+1 n']
>>> rx = r'123(..) n'    # add a group
>>> re.findall(rx,s)
['+x', '+5', '+1']    # only the grouped text is returned
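If the whole match is wanted even though the pattern needs parentheses for alternation, a non-capturing group (?:...) is one option. The sketch below contrasts the three cases; the second address and the variable names are made up for illustration.

```python
import re

text = 'bai@bai.com tom@site.cn'   # second address is illustrative

# One capturing group: findall returns only what the group matched.
suffixes = re.findall(r'\w{3}@\w+(\.com|\.cn)', text)
print(suffixes)  # ['.com', '.cn']

# Non-capturing group (?:...): findall returns the whole match.
whole = re.findall(r'\w{3}@\w+(?:\.com|\.cn)', text)
print(whole)     # ['bai@bai.com', 'tom@site.cn']

# Several capturing groups: findall returns one tuple per match.
pieces = re.findall(r'(\w{3})@(\w+)(?:\.com|\.cn)', text)
print(pieces)    # [('bai', 'bai'), ('tom', 'site')]
```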
