Python Regular Expressions (Part 2)
Lookahead and lookbehind assertions, positive and negative
(?=...) Matches if ... matches next, but doesn’t consume any of the string. This is called a lookahead assertion. For example, Isaac (?=Asimov) will match 'Isaac ' only if it’s followed by 'Asimov'.
(?!...) Matches if ... doesn’t match next. This is a negative lookahead assertion. For example, Isaac (?!Asimov) will match 'Isaac ' only if it’s not followed by 'Asimov'.
(?<=...) Matches if the current position in the string is preceded by a match for ... that ends at the current position. This is called a positive lookbehind assertion. (?<=abc)def will find a match in abcdef, since the lookbehind will back up 3 characters and check if the contained pattern matches. The contained pattern must only match strings of some fixed length, meaning that abc or a|b are allowed, but a* and a{3,4} are not.
Note that patterns which start with positive lookbehind assertions will never match at the beginning of the string being searched; you will most likely want to use the search() function rather than the match() function:
>>> import re
>>> m = re.search('(?<=abc)def', 'abcdef')
>>> m.group(0)
'def'
This example looks for a word following a hyphen:
>>> m = re.search(r'(?<=-)\w+', 'spam-egg')
>>> m.group(0)
'egg'
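To make the two restrictions above concrete, here is a minimal sketch. The variable names are only for illustration. It compares match() with search() for a pattern that starts with a positive lookbehind, and it shows that a variable-width lookbehind is rejected when the pattern is compiled.

```python
import re

# A pattern that starts with a positive lookbehind needs characters before
# the match, so it cannot succeed at position 0 of the string.
pattern = re.compile(r'(?<=abc)def')

at_start = pattern.match('abcdef')    # match() only tries position 0
anywhere = pattern.search('abcdef')   # search() tries every position

print(at_start)            # None
print(anywhere.group(0))   # def

# A lookbehind whose width is not fixed (here a*) raises re.error.
try:
    re.compile(r'(?<=a*)def')
    fixed_width_error = None
except re.error as exc:
    fixed_width_error = str(exc)

print(fixed_width_error)
```

The exact error text may differ between Python versions, so the sketch only prints it.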
(?<!...) Matches if the current position in the string is not preceded by a match for .... This is called a negative lookbehind assertion. Similar to positive lookbehind assertions, the contained pattern must only match strings of some fixed length. Patterns which start with negative lookbehind assertions may match at the beginning of the string being searched.
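The lookahead forms and the negative lookbehind have no example of their own in the text above, so here is a short sketch. It reuses the 'Isaac' and 'spam-egg' strings from the documentation; the string 'Isaac Newton' and the extra word 'ham' are made up for illustration.

```python
import re

# Positive lookahead: 'Isaac ' matches only when 'Asimov' follows,
# and 'Asimov' itself is not part of the match.
m = re.search(r'Isaac (?=Asimov)', 'Isaac Asimov')
print(repr(m.group(0)))   # 'Isaac '

# Negative lookahead: 'Isaac ' matches only when 'Asimov' does not follow.
no_match = re.search(r'Isaac (?!Asimov)', 'Isaac Asimov')
other = re.search(r'Isaac (?!Asimov)', 'Isaac Newton')
print(no_match)                # None
print(repr(other.group(0)))    # 'Isaac '

# Negative lookbehind: words not preceded by a hyphen or a word character.
# Unlike a positive lookbehind, it can succeed at the start of the string.
words = re.findall(r'(?<![-\w])\w+', 'spam-egg ham')
print(words)   # ['spam', 'ham']
```

The character class in the last pattern also excludes a preceding word character. Without it, the regex would match 'gg' inside 'egg', because the second letter is not preceded by a hyphen.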
Example code for lookahead and lookbehind, positive and negative:
import re

def testPrevPostMatch():
    # positive lookahead  (must be followed by xxx):      (?=xxx)
    # negative lookahead  (must NOT be followed by xxx):  (?!xxx)
    # positive lookbehind (must be preceded by xxx):      (?<=xxx)
    # negative lookbehind (must NOT be preceded by xxx):  (?<!xxx)

    # Note that the input string is:
    # src=\"http://b101.photo.store.qq.com/psb?/V10ppwxs00XiXU/5dbOIlYaLYVPWOz*1nHYeSFq09Z5rys72RIJszCsWV8!/b/YYUOOzy3HQAAYqsTPjz7HQAA\"
    qqPicUrlStr = 'src=\\"http://b101.photo.store.qq.com/psb?/V10ppwxs00XiXU/5dbOIlYaLYVPWOz*1nHYeSFq09Z5rys72RIJszCsWV8!/b/YYUOOzy3HQAAYqsTPjz7HQAA\\"'
    # Same URL, but without the leading src=\" that the lookbehind requires
    qqPicUrlInvalidPrevStr = '1234567http://b101.photo.store.qq.com/psb?/V10ppwxs00XiXU/5dbOIlYaLYVPWOz*1nHYeSFq09Z5rys72RIJszCsWV8!/b/YYUOOzy3HQAAYqsTPjz7HQAA\\"'
    # Same URL, but without the trailing \" that the lookahead requires
    qqPicUrlInvalidPostStr = 'src=\\"http://b101.photo.store.qq.com/psb?/V10ppwxs00XiXU/5dbOIlYaLYVPWOz*1nHYeSFq09Z5rys72RIJszCsWV8!/b/YYUOOzy3HQAAYqsTPjz7HQAA123'
    canFindPrevPostP = r'(?<=src=\\")(?P<qqPicUrl>http://.+?\.qq\.com.+?)(?=\\")'
    qqPicUrl = ""

    foundPrevPost = re.search(canFindPrevPostP, qqPicUrlStr)
    print("foundPrevPost=", foundPrevPost)  # a match object
    if foundPrevPost:
        qqPicUrl = foundPrevPost.group("qqPicUrl")
        print("qqPicUrl=", qqPicUrl)  # qqPicUrl= http://b101.photo.store.qq.com/psb?/V10ppwxs00XiXU/5dbOIlYaLYVPWOz*1nHYeSFq09Z5rys72RIJszCsWV8!/b/YYUOOzy3HQAAYqsTPjz7HQAA
        print("can find qqPicUrl here")

    foundInvalidPrev = re.search(canFindPrevPostP, qqPicUrlInvalidPrevStr)
    print("foundInvalidPrev=", foundInvalidPrev)  # foundInvalidPrev= None
    if not foundInvalidPrev:
        print("can NOT find qqPicUrl here")

    foundInvalidPost = re.search(canFindPrevPostP, qqPicUrlInvalidPostStr)
    print("foundInvalidPost=", foundInvalidPost)  # foundInvalidPost= None
    if not foundInvalidPost:
        print("can NOT find qqPicUrl here")

testPrevPostMatch()
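Because the lookbehind and lookahead consume nothing, the delimiters never become part of the match, so the same idea works with findall() without a named group. The sketch below assumes a made-up input string with two hypothetical example.qq.com URLs; it is an illustration only and is not taken from the original data.

```python
import re

# Hypothetical input: two image URLs wrapped in escaped quotes (\").
html = r'<img src=\"http://a.example.qq.com/1.jpg\"> <img src=\"http://b.example.qq.com/2.jpg\">'

# Lookbehind requires src=\" before the URL and lookahead requires \" after it.
# Neither delimiter is included in the matched text.
url_pattern = re.compile(r'(?<=src=\\")http://.+?\.qq\.com.+?(?=\\")')

urls = url_pattern.findall(html)
print(urls)   # ['http://a.example.qq.com/1.jpg', 'http://b.example.qq.com/2.jpg']
```

With no capturing group in the pattern, findall() returns the full text of each match, which here is exactly the URL.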
Example of referencing a named group (backreference) in a Python regular expression
import re

def testBackReference():
    # back reference (?P=name) test
    # In the valid string, ownerCid repeats the userId part of "id"; in the invalid one it does not.
    backrefValidStr = '"group":0,"iconType":"NonEmptyDocumentFolder","id":"9A8B8BF501A38A36!601","itemType":32,"name":"released","ownerCid":"9A8B8BF501A38A36"'
    backrefInvalidStr = '"group":0,"iconType":"NonEmptyDocumentFolder","id":"9A8B8BF501A38A36!601","itemType":32,"name":"released","ownerCid":"987654321ABCDEFG"'
    # (?P<userId>...) names the group; (?P=userId) must match the same text again
    backrefP = r'"group":\d+,"iconType":"\w+","id":"(?P<userId>\w+)!\d+","itemType":\d+,"name":".+?","ownerCid":"(?P=userId)"'
    userId = ""

    foundBackref = re.search(backrefP, backrefValidStr)
    print("foundBackref=", foundBackref)  # a match object
    if foundBackref:
        userId = foundBackref.group("userId")
        print("userId=", userId)  # userId= 9A8B8BF501A38A36
        print("can find userId here")

    foundBackref = re.search(backrefP, backrefInvalidStr)
    print("foundBackref=", foundBackref)  # foundBackref= None
    if not foundBackref:
        print("can NOT find userId here")

testBackReference()
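A smaller sketch of the same technique may be easier to follow. The strings and group names (quote, body, first, second) are invented for illustration. It also shows the \g<name> form, which is how a named group is referenced inside a re.sub() replacement string.

```python
import re

# (?P<quote>...) names the group; (?P=quote) must match the same text again,
# so an opening quote is only closed by the same kind of quote.
quoted = re.compile(r'(?P<quote>[\'"])(?P<body>.*?)(?P=quote)')

good = quoted.search('say "hello" now')
bad = quoted.search("mixed 'quotes\" here")
print(good.group('body'))   # hello
print(bad)                  # None

# Inside a replacement string, a named group is written as \g<name>.
swapped = re.sub(r'(?P<first>\w+)-(?P<second>\w+)',
                 r'\g<second>-\g<first>', 'spam-egg')
print(swapped)   # egg-spam
```

Note that (?P=name) is only valid inside the pattern itself, while \g<name> is only valid in the replacement.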