Regular Expressions - 喵吉欧尼酱

Regular Expressions

Metacharacters: . ^ $ * + ? {} [] | () \

========================================================

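  . matches any single character except a newline, exactly once: >>> re.findall('bot.x','sdqwoiujbotwx') --> ['botwx']

  ^ matches the start of the string, so it never matches in the middle: >>> re.findall('^bot.x','botgxsdqwoiujdwx') --> ['botgx']

  \w matches a letter, digit, or underscore, and for str patterns in Python 3 also other Unicode word characters such as Chinese characters; for ASCII text it is equivalent to the class [A-Za-z0-9_]

  \s matches any whitespace character; equivalent to the class [ \t\n\r\f\v]

  \S matches any non-whitespace character; equivalent to the class [^ \t\n\r\f\v]
  \d matches any decimal digit; equivalent to the class [0-9]

  \D matches any non-digit character; equivalent to [^0-9]
  \b matches a word boundary (the start or end of a word)
  ^ matches the start of the string
  $ matches the end of the string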
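
The classes and anchors that have no example in the list above can be tried the same way. This is only a small sketch; the sample string is made up for illustration.

```python
import re

text = "cat scatter 42 cats"  # made-up sample string

non_space = re.findall(r"\S+", text)       # runs of non-whitespace characters
non_digit = re.findall(r"\D+", text)       # runs of non-digit characters
whole_word = re.findall(r"\bcat\b", text)  # "cat" only where it stands alone as a word
at_start = re.findall(r"^cat", text)       # anchored at the start of the string
at_end = re.findall(r"cats$", text)        # anchored at the end of the string

print(non_space)   # ['cat', 'scatter', '42', 'cats']
print(non_digit)   # ['cat scatter ', ' cats']
print(whole_word)  # ['cat']
print(at_start, at_end)  # ['cat'] ['cats']
```

Note that \bcat\b skips both "scatter" and "cats", because in those words "cat" is not bounded by a word boundary on both sides.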

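\:

  A backslash followed by a metacharacter removes that character's special meaning.

  A backslash followed by an ordinary character gives it a special meaning (for example \d or \w).

  A backslash followed by a number refers back to the string matched by the group with that number.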
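
A short sketch of the first and third rule, assuming made-up input strings: escaping a metacharacter, and referring back to a numbered group.

```python
import re

# Backslash + metacharacter: the dot loses its special meaning.
any_char = re.findall(r"a.c", "abc a.c")      # unescaped dot matches any character
literal_dot = re.findall(r"a\.c", "abc a.c")  # escaped dot matches only a literal "."
print(any_char)     # ['abc', 'a.c']
print(literal_dot)  # ['a.c']

# Backslash + number: \1 must repeat exactly what group 1 matched.
doubled = re.search(r"(\w+) \1", "this is is a test")
print(doubled.group())   # is is
print(doubled.group(1))  # is
```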

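Repetition:

  * repeats zero or more times, so the preceding character may be absent; matching is greedy, so every consecutive repetition is consumed
  + repeats one or more times; the preceding character must appear at least once or there is no match
  ? repeats zero or one time: >>> re.findall('^bot?','botgxsdqwoiujdwx') --> ['bot']; it also matches when the "t" is absent, and extra repeated characters are not consumed
  {n} repeats exactly n times
  {n,} repeats n or more times
  {n,m} repeats n to m times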
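
The quantifiers above can be compared side by side on one string. A minimal sketch, with a made-up sample string:

```python
import re

text = "b bo boo booo"  # made-up sample string

star = re.findall(r"bo*", text)         # zero or more "o", greedy
plus = re.findall(r"bo+", text)         # at least one "o"
optional = re.findall(r"bo?", text)     # at most one "o"
exactly = re.findall(r"bo{2}", text)    # exactly two "o"
at_least = re.findall(r"bo{2,}", text)  # two or more "o"
between = re.findall(r"bo{1,2}", text)  # one or two "o"

print(star)      # ['b', 'bo', 'boo', 'booo']
print(plus)      # ['bo', 'boo', 'booo']
print(optional)  # ['b', 'bo', 'bo', 'bo']
print(exactly)   # ['boo', 'boo']
print(at_least)  # ['boo', 'booo']
print(between)   # ['bo', 'boo', 'boo']
```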

---=======================================

Special characters

[^0-9] >>> re.findall('[^0-9]','botg123ggxsdqw9oi7uj1d0wx') returns every character except the digits 0-9.

Common special characters in use

>>> re.findall(r'\d','jjoih42jd95kl3')
['4', '2', '9', '5', '3']
>>> re.findall(r'\w','jjoih42jj42. d0')
['j', 'j', 'o', 'i', 'h', '4', '2', 'j', 'j', '4', '2', 'd', '0']
>>> re.findall(r'\s','jjoih42jj42. d0')
[' ']
>>> re.findall(r'[\d]','jjoih42jj42. d0')
['4', '2', '4', '2', '0']

match only matches at the start of the string

match(pattern, string, flags=0)
 # pattern: the regular expression
 # string : the string to match against
 # flags  : matching mode flags

re.match() example, with and without groups

import re

s = 'hello   your   myrsa'
r = re.match(r'(?P<n1>h)(?P<n2>\w+)', s)
print(r.group())      # the whole match for h\w+: hello
print(r.groups())     # all captured groups for (h)(\w+): ('h', 'ello')
print(r.groupdict())  # only the named groups (?P<n1>h)(?P<n2>\w+): {'n1': 'h', 'n2': 'ello'}

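re.I makes matching case-insensitive

re.L makes matching locale-aware (in Python 3 only valid with bytes patterns)

re.M multi-line matching; affects ^ and $

re.S makes . match every character, including a newline

re.U interprets characters according to the Unicode character set; affects \w, \W, \b, \B and similar (already the default for str patterns in Python 3)

re.X verbose mode: makes the pattern easier to read by allowing comments; whitespace inside the pattern is ignored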
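
The effect of the most common flags can be seen on a two-line string. This is an illustrative sketch only; the sample strings are made up.

```python
import re

text = "Hello\nhello world"  # made-up two-line sample string

ignore_case = re.findall(r"hello", text, re.I)    # matches both capitalisations
line_starts = re.findall(r"^\w+", text, re.M)     # ^ matches at the start of every line
dot_default = re.findall(r"Hello.hello", text)    # . does not match the newline
dot_all = re.findall(r"Hello.hello", text, re.S)  # . also matches the newline

# re.X: whitespace and comments inside the pattern are ignored.
verbose = re.findall(r"""
    \d+   # integer part
    \.    # literal dot
    \d+   # fractional part
""", "pi is 3.14", re.X)

print(ignore_case)  # ['Hello', 'hello']
print(line_starts)  # ['Hello', 'hello']
print(dot_default)  # []
print(dot_all)      # ['Hello\nhello']
print(verbose)      # ['3.14']
```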

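Note: when re.match and re.search succeed they return a match object (on failure they return None):

  group() returns the string matched by the pattern

  start() returns the position where the match starts

  end() returns the position where the match ends

  span() returns a tuple holding the (start, end) positions of the match

  groups() returns the captured groups as a tuple

  groupdict() returns the named groups as a dict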
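
A small sketch that calls each of these methods on one match object. The pattern, the group names word and num, and the input string are made up for illustration.

```python
import re

text = "##abc123##"  # made-up input
m = re.search(r"(?P<word>[a-z]+)(?P<num>\d+)", text)

print(m.group())      # abc123
print(m.start())      # 2
print(m.end())        # 8
print(m.span())       # (2, 8)
print(m.groups())     # ('abc', '123')
print(m.groupdict())  # {'word': 'abc', 'num': '123'}

# match only looks at the start of the string, so here it finds nothing.
no_match = re.match(r"[a-z]+", text)
print(no_match)       # None
```

Since a failed match returns None, calling group() directly on the result raises AttributeError; checking for None first avoids that.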

re.search(pattern, string, flags=0) scans through the string and returns the first place where the pattern matches

import re

a = "123abc456"
m = re.search(r"([0-9]*)([a-z]*)([0-9]*)", a)
print(m.group())   # 123abc456
print(m.group(0))  # 123abc456 (group 0 is the whole match)
print(m.group(1))  # 123
print(m.group(2))  # abc
print(m.group(3))  # 456

# search scans the whole string for the first match; it returns None if nothing matches
# search(pattern, string, flags=0)

origin = "hello alex bcd abcd lge acd 19"  # same sample string as the findall example
r = re.search(r"a(\w+).*(?P<name>\d)$", origin)
print(r.group())      # the whole match: alex bcd abcd lge acd 19
print(r.groups())     # all captured groups: ('lex', '9')
print(r.groupdict())  # only the named groups: {'name': '9'}

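findall(pattern, string, flags=0)

findall returns the captured groups by position; naming a group with (?P<name>...) has no effect on the result

origin = "hello alex bcd abcd lge acd 19"
# scans left to right; groups are numbered by their opening parenthesis
# (outer before inner) and each match becomes one tuple in the list
r = re.findall(r"(a)(\w*(c))(d)", origin)
print(r)
# [('a', 'bc', 'c', 'd'), ('a', 'c', 'c', 'd')]

origin = "1asd2asd3asd4asd"
r = re.findall(r"(\dasd)+", origin)  # a repeated group keeps only its last repetition
n = re.findall(r"\dasd", origin)     # without a group, every full match is returned
print(r, n)  # ['4asd'] ['1asd', '2asd', '3asd', '4asd']  # the two results differ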

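re.finditer() returns an iterator of match objects

origin = "hello alex bcd abcd lge acd 19"
r = re.finditer(r"(a)(\w*(c))(d)", origin)  # scans left to right, yielding one match object per match
print(r)
for i in r:
    print(i)
# <callable_iterator object at 0x00E162B0>
# <_sre.SRE_Match object; span=(15, 19), match='abcd'>
# <_sre.SRE_Match object; span=(24, 27), match='acd'>
# (Python 3.7 and later print re.Match instead of _sre.SRE_Match)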
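
Because finditer yields match objects, each hit carries its position as well as its groups, which findall does not provide. A small sketch using the same sample string as the findall example:

```python
import re

origin = "hello alex bcd abcd lge acd 19"

# Collect the matched text, its (start, end) span and its groups for every hit.
hits = [(m.group(), m.span(), m.groups())
        for m in re.finditer(r"(a)(\w*(c))(d)", origin)]

for text, span, groups in hits:
    print(text, span, groups)
# abcd (15, 19) ('a', 'bc', 'c', 'd')
# acd (24, 27) ('a', 'c', 'c', 'd')
```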

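sub(pattern, repl, string, count=0, flags=0)

# pattern: the regular expression
# repl   : the replacement string, or a callable
# string : the string to search
# count  : maximum number of replacements (0 means replace all)
# flags  : matching mode flags

Replaces the parts of the string that match

content = "123abc456"
new_content = re.sub(r'\d+', 'sb', content)  # 'sbabcsb'

origin = "hello alex bcd alex lge alex acd 19"
r = re.sub(r"a\w+", "999", origin, count=2)  # replace only the first two matches
print(r)  # hello 999 bcd 999 lge alex acd 19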
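
The repl argument can also be a callable: it receives each match object and returns the replacement text. A minimal sketch; the helper name double is made up for illustration.

```python
import re

def double(match):
    """Hypothetical helper: replace a number with twice its value."""
    return str(int(match.group()) * 2)

content = "123abc456"
doubled = re.sub(r"\d+", double, content)
print(doubled)  # 246abc912

# count still limits how many matches are passed to the callable.
first_only = re.sub(r"\d+", double, content, count=1)
print(first_only)  # 246abc456
```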

split(pattern, string, maxsplit=0, flags=0)

Splits the string at each match of the pattern

content = "'1 - 2 * ((60-30+1*(9-2*5/3+7/3*99/4*2998+10*568/14))-(-4*3)/(16-3*2) )'"
new_content = re.split(r'\*', content)
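
On a shorter, made-up expression the behaviour of split is easier to follow. The sketch also shows maxsplit and the effect of a capturing group in the pattern, which keeps the separator in the result.

```python
import re

expr = "2*3*4"  # made-up short expression

parts = re.split(r"\*", expr)                  # split at every "*"
first_cut = re.split(r"\*", expr, maxsplit=1)  # split at the first "*" only
with_sep = re.split(r"(\*)", expr)             # a capturing group keeps the separators

print(parts)      # ['2', '3', '4']
print(first_cut)  # ['2', '3*4']
print(with_sep)   # ['2', '*', '3', '*', '4']
```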

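Approach for a calculator that removes parentheses from the inside out

import re

origin = '1 - 2 * ((60-30+1*(9-2*5/3+7/3*99/4*2998+10*568/14))-(-4*3)/(16-3*2))'

# f1 is not shown in this post; it is assumed to evaluate an expression
# that contains no parentheses and return the result.
while True:
    print(origin)
    # split around the first innermost parenthesised group
    request = re.split(r'\(([^()]+)\)', origin, maxsplit=1)
    if len(request) == 3:
        before, content, after = request
        r = f1(content)                   # evaluate the innermost group
        origin = before + str(r) + after  # substitute the result back
    else:
        final = f1(origin)                # no parentheses left: evaluate what remains
        print(final)
        break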
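
The loop relies on a helper f1 that the post does not show. The following is one possible runnable sketch, not the author's implementation: evaluate_flat is a hypothetical stand-in for f1 that handles + - * / and unary signs, and calculate wraps the same split-and-substitute loop in a function.

```python
import re

def evaluate_flat(expr):
    """Hypothetical stand-in for f1: evaluate + - * / with no parentheses."""
    tokens = re.findall(r"\d+(?:\.\d*)?(?:[eE][+-]?\d+)?|[-+*/]", expr)
    pos = 0

    def factor():
        # A number with optional unary signs, e.g. the "--12.0" left after substitution.
        nonlocal pos
        sign = 1
        while tokens[pos] in ("+", "-"):
            if tokens[pos] == "-":
                sign = -sign
            pos += 1
        value = float(tokens[pos])
        pos += 1
        return sign * value

    def term():
        # Multiplication and division bind tighter than addition and subtraction.
        nonlocal pos
        value = factor()
        while pos < len(tokens) and tokens[pos] in ("*", "/"):
            op = tokens[pos]
            pos += 1
            rhs = factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    value = term()
    while pos < len(tokens):
        op = tokens[pos]
        pos += 1
        rhs = term()
        value = value + rhs if op == "+" else value - rhs
    return value

def calculate(expression):
    """Replace the innermost parenthesised group with its value until none remain."""
    expression = expression.replace(" ", "")
    while True:
        parts = re.split(r"\(([^()]+)\)", expression, maxsplit=1)
        if len(parts) != 3:
            return evaluate_flat(expression)
        before, inner, after = parts
        expression = before + str(evaluate_flat(inner)) + after

origin = '1 - 2 * ((60-30+1*(9-2*5/3+7/3*99/4*2998+10*568/14))-(-4*3)/(16-3*2))'
result = calculate(origin)
print(result)
```

The pattern \(([^()]+)\) only matches a pair of parentheses with no parentheses inside, so each pass resolves one innermost group. Substituting a negative result can leave sequences such as "--12.0", which is why the sketch treats leading signs as unary.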

posted on 2017-09-09 12:02 喵吉欧尼酱
