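About regular expressions

When you are handed a large block of text and need to pull specific data out of it, regular expressions can do the job. For example, extracting the email addresses and phone numbers from a piece of text: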

import re

text = "楼主太牛逼了,在线想要 442662578@qq.com和xxxxx@live.com谢谢楼主,手机号也可15131255789,搞起来呀"

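# Inside [], "|" is a literal character rather than "or", so list the digits directly.
phone_list = re.findall(r"1[3589]\d{9}", text)
print(phone_list)  # ['15131255789']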
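
The snippet above only pulls out the phone number. A minimal sketch of the email half might look like the following; the pattern and the names `email_pattern` and `email_list` are illustrative assumptions, not a complete email validator, and the text is a shortened copy of the sample above.

```python
import re

# Shortened version of the sample text above.
text = "在线想要 442662578@qq.com和xxxxx@live.com谢谢楼主"

# Hypothetical, deliberately simple pattern: local part, "@", then dot-separated labels.
# re.ASCII keeps \w from matching the Chinese characters next to each address.
# (?:...) groups without capturing, so findall returns the whole match.
email_pattern = r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"

email_list = re.findall(email_pattern, text, flags=re.ASCII)
print(email_list)  # ['442662578@qq.com', 'xxxxx@live.com']
```

Without `re.ASCII`, `\w` would also match 和, and the two addresses would run together into one wrong result.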

Regular expression syntax

Characters
  • wupeiqi matches the literal text wupeiqi
    import re
    
    text = "你好wupeiqi,阿斯顿发wupeiqasd 阿士大夫能接受的wupeiqiff"
    data_list = re.findall("wupeiqi", text)
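    print(data_list) # ['wupeiqi', 'wupeiqi'] can be used to count how many times a substring occurs in a string
  • [abc] matches a single character that is a, b, or c (any other characters can be listed inside the brackets)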
    import re
    
    text = "你2b好wupeiqi,阿斯顿发awupeiqasd 阿士大夫a能接受的wffbbupqaceiqiff"
    data_list = re.findall("[abc]", text)
    print(data_list) # ['b', 'a', 'a', 'a', 'b', 'b', 'a', 'c']
    import re
    
    text = "你2b好wupeiqi,阿斯顿发awupeiqasd 阿士大夫a能接受的wffbbupqcceiqiff"
    data_list = re.findall("q[abc]", text)
    print(data_list) # ['qa', 'qc']
  • [^abc] matches any single character other than a, b, or c.
    import re
    
    text = "你wffbbupceiqiff"
    data_list = re.findall("[^abc]", text)
    print(data_list)  # ['你', 'w', 'f', 'f', 'u', 'p', 'e', 'i', 'q', 'i', 'f', 'f']
  • [a-z] matches any single character from a to z ([0-9] works the same way for digits)
    import re
    
    text = "alexrootrootadmin"
    data_list = re.findall("t[a-z]", text)
    print(data_list)  # ['tr', 'ta']
  • . matches any character except a newline
    import re
    
    text = "alexraotrootadmin"
    data_list = re.findall("r.o", text)
    print(data_list) # ['rao', 'roo']
    import re
    
    text = "alexraotrootadmin"
    data_list = re.findall("r.+o", text) # .+ is greedy: the match runs from the first r to the last o, taking everything in between
    print(data_list) # ['raotroo']
    import re
    
    text = "alexraotrootadmin"
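    data_list = re.findall("r.+?o", text) # non-greedy: each match stops at the nearest o
    print(data_list) # ['rao', 'roo']
  • \w matches a letter, digit, or underscore (with str patterns in Python 3 it also matches Chinese characters)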
    import re
    
    text = "北京武沛alex齐北  京武沛alex齐"
    data_list = re.findall(r"武\w+x", text)
    print(data_list) # ['武沛alex', '武沛alex']
  • \d matches a digit
    import re
    
    text = "root-ad32min-add3-admd1in"
    data_list = re.findall(r"d\d", text)
    print(data_list) # ['d3', 'd3', 'd1']
    import re
    
    text = "root-ad32min-add3-admd1in"
    data_list = re.findall(r"d\d+", text)
    print(data_list) # ['d32', 'd3', 'd1']
  • \s matches any whitespace character, including spaces, tabs, and so on
    import re
    
    text = "root admin add admin"
    data_list = re.findall(r"a\w+\s\w+", text)
    print(data_list) # ['admin add']
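
Two of the notes in this list, that . skips newlines and that \w also matches Chinese characters, can be checked with the flags re.DOTALL and re.ASCII. Neither flag appears in the original examples; this is only a small sketch with made-up sample strings.

```python
import re

# "." skips newlines unless re.DOTALL is passed.
lines = "ab\ncd"
dot_default = re.findall(r".", lines)
dot_all = re.findall(r".", lines, flags=re.DOTALL)
print(dot_default)  # ['a', 'b', 'c', 'd']
print(dot_all)      # ['a', 'b', '\n', 'c', 'd']

# For str patterns \w matches Chinese characters; re.ASCII restricts it to [a-zA-Z0-9_].
name = "武沛alex齐"
word_unicode = re.findall(r"\w+", name)
word_ascii = re.findall(r"\w+", name, flags=re.ASCII)
print(word_unicode)  # ['武沛alex齐']
print(word_ascii)    # ['alex']
```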
Quantifiers
  • * repeats the preceding item 0 or more times
    import re
    
    text = "他是大B个,确实是个大2B。"
    data_list = re.findall("大2*B", text)
    print(data_list) # ['大B', '大2B']
  • + repeats the preceding item 1 or more times
    import re
    
    text = "他是大B个,确实是个大2B,大3B,大66666B。"
    data_list = re.findall(r"大\d+B", text)
    print(data_list) # ['大2B', '大3B', '大66666B']
  • ? repeats the preceding item 0 or 1 times
    import re
    
    text = "他是大B个,确实是个大2B,大3B,大66666B。"
    data_list = re.findall(r"大\d?B", text)
    print(data_list) # ['大B', '大2B', '大3B']
  • {n} repeats the preceding item exactly n times
    import re
    
    text = "楼主太牛逼了,在线想要 442662578@qq.com和xxxxx@live.com谢谢楼主,手机号也可15131255789,搞起来呀"
    data_list = re.findall(r"151312\d{5}", text)
    print(data_list) # ['15131255789']
  • {n,} repeats the preceding item n or more times
    import re
    
    text = "楼主太牛逼了,在线想要 442662578@qq.com和xxxxx@live.com谢谢楼主,手机号也可15131255789,搞起来呀"
    data_list = re.findall(r"\d{9,}", text)
    print(data_list) # ['442662578', '15131255789']
  • {n,m} repeats the preceding item between n and m times
    import re
    
    text = "楼主太牛逼了,在线想要 442662578@qq.com和xxxxx@live.com谢谢楼主,手机号也可15131255789,搞起来呀"
    data_list = re.findall(r"\d{10,15}", text)
    print(data_list) # ['15131255789']
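
As a quick cross-check of the quantifiers above, *, + and ? can each be written in brace form ({0,}, {1,} and {0,1}), and a trailing ? makes any of them non-greedy. The sample strings below are made up for illustration.

```python
import re

text = "ab a1b a22b a333b"  # hypothetical sample data

# Each shorthand quantifier next to its brace-form equivalent.
star = re.findall(r"a\d*b", text)
star_braces = re.findall(r"a\d{0,}b", text)
plus = re.findall(r"a\d+b", text)
plus_braces = re.findall(r"a\d{1,}b", text)
optional = re.findall(r"a\d?b", text)
optional_braces = re.findall(r"a\d{0,1}b", text)
print(star)      # ['ab', 'a1b', 'a22b', 'a333b']
print(plus)      # ['a1b', 'a22b', 'a333b']
print(optional)  # ['ab', 'a1b']

# Quantifiers are greedy by default; a trailing ? makes them take as few as possible.
greedy = re.findall(r"\d{2,3}", "1234567")
lazy = re.findall(r"\d{2,3}?", "1234567")
print(greedy)  # ['123', '456']
print(lazy)    # ['12', '34', '56']
```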
Parentheses (groups)
  • Extracting part of a match
    import re
    
    text = "楼主太牛逼了,在线想要 442662578@qq.com和xxxxx@live.com谢谢楼主,手机号也可15131255789,搞起来呀"
    data_list = re.findall(r"15131(2\d{5})", text)
    print(data_list)  # ['255789']
    import re
    
    text = "楼主太牛逼了,在线想要 442662578@qq.com和xxxxx@live.com谢谢楼主,手机号也可15131255789,搞起来15131266666呀"
    data_list = re.findall(r"15(13)1(2\d{5})", text)
    print(data_list)  # [('13', '255789'), ('13', '266666')]
    import re
    
    text = "楼主太牛逼了,在线想要 442662578@qq.com和xxxxx@live.com谢谢楼主,手机号也可15131255789,搞起来呀"
    data_list = re.findall(r"(15131(2\d{5}))", text)
    print(data_list)  # [('15131255789', '255789')]
  • Extracting part of a match combined with alternation (|)
    import re
    
    text = "楼主15131root太牛15131alex逼了,在线想要 442662578@qq.com和xxxxx@live.com谢谢楼主,手机号也可15131255789,搞起来呀"
    data_list = re.findall(r"15131(2\d{5}|r\w+太)", text)
    print(data_list)  # ['root太', '255789']
    import re
    
    text = "楼主15131root太牛15131alex逼了,在线想要 442662578@qq.com和xxxxx@live.com谢谢楼主,手机号也可15131255789,搞起来呀"
    data_list = re.findall(r"(15131(2\d{5}|r\w+太))", text)
    print(data_list)  # [('15131root太', 'root太'), ('15131255789', '255789')]
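
As the examples above show, re.findall returns only the captured groups once a pattern contains parentheses. Two ways to get the whole match back, sketched here with made-up sample data, are non-capturing groups (?:...) and re.finditer, whose match objects expose both the full match and each group.

```python
import re

text = "15131255789 and 15131266666"  # hypothetical sample data

# Capturing groups: findall returns one tuple of groups per match.
captured = re.findall(r"15(13)1(2\d{5})", text)
print(captured)  # [('13', '255789'), ('13', '266666')]

# Non-capturing groups: the parentheses only group, so findall returns whole matches.
whole = re.findall(r"15(?:13)1(?:2\d{5})", text)
print(whole)  # ['15131255789', '15131266666']

# finditer yields match objects: group(0) is the whole match, group(1), group(2) the groups.
details = [
    (m.group(0), m.group(1), m.group(2))
    for m in re.finditer(r"15(13)1(2\d{5})", text)
]
print(details)
# [('15131255789', '13', '255789'), ('15131266666', '13', '266666')]
```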
posted @ 2021-12-01 15:12  A熙