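Regular expressions
When you are handed a large block of text and need to pull specific pieces of data out of it, regular expressions do the job. For example, extracting the email addresses and phone numbers from a piece of text (the code here extracts the phone number):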
import re
text = "楼主太牛逼了,在线想要 442662578@qq.com和xxxxx@live.com谢谢楼主,手机号也可15131255789,搞起来呀"
# 1 followed by 3, 5, 8 or 9, then 9 more digits (11 digits in total)
phone_list = re.findall(r"1[3589]\d{9}", text)
print(phone_list)  # ['15131255789']
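The paragraph above mentions email addresses too, but the example only extracts the phone number. A minimal sketch that pulls out both, assuming a simplified ASCII-only email shape (local part, @, dot-separated domain labels) and 11-digit numbers starting with 13, 15, 18 or 19. `extract_contacts`, `EMAIL_RE` and `PHONE_RE` are hypothetical names, and the email pattern is deliberately loose, not a full RFC 5322 validator.

```python
import re

# Simplified email shape (an assumption, not RFC 5322): ASCII local part, "@",
# then dot-separated domain labels. Explicit ASCII classes are used instead of
# \w because \w also matches Chinese characters such as "和" right after ".com".
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")

# 11-digit number starting with 13/15/18/19; the lookarounds stop the pattern
# from grabbing 11 digits out of a longer run of digits.
PHONE_RE = re.compile(r"(?<!\d)1[3589]\d{9}(?!\d)")


def extract_contacts(text):
    """Return the emails and phone numbers found in text, in order of appearance."""
    return {
        "emails": EMAIL_RE.findall(text),
        "phones": PHONE_RE.findall(text),
    }


text = "楼主太牛逼了,在线想要 442662578@qq.com和xxxxx@live.com谢谢楼主,手机号也可15131255789,搞起来呀"
contacts = extract_contacts(text)
print(contacts["emails"])  # ['442662578@qq.com', 'xxxxx@live.com']
print(contacts["phones"])  # ['15131255789']
```

With the lookarounds, a longer digit string such as "8151312557899" yields no phone match instead of an 11-digit slice of it.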
Regular expression syntax
Characters
- wupeiqi: matches the literal text wupeiqi
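import re
text = "你好wupeiqi,阿斯顿发wupeiqasd 阿士大夫能接受的wupeiqiff"
data_list = re.findall("wupeiqi", text)
print(data_list)  # ['wupeiqi', 'wupeiqi']
Taking len(data_list) tells you how many times the text occurs in the string (2 here).
- [abc]: matches a single character that is a, b or c (any characters can go inside the brackets)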
import re
text = "你2b好wupeiqi,阿斯顿发awupeiqasd 阿士大夫a能接受的wffbbupqaceiqiff"
data_list = re.findall("[abc]", text)
print(data_list)  # ['b', 'a', 'a', 'a', 'b', 'b', 'a', 'c']

import re
text = "你2b好wupeiqi,阿斯顿发awupeiqasd 阿士大夫a能接受的wffbbupqcceiqiff"
data_list = re.findall("q[abc]", text)
print(data_list)  # ['qa', 'qc']
- [^abc]: matches any single character other than a, b or c
import re
text = "你wffbbupceiqiff"
data_list = re.findall("[^abc]", text)
print(data_list)  # ['你', 'w', 'f', 'f', 'u', 'p', 'e', 'i', 'q', 'i', 'f', 'f']
- [a-z]: matches any single character in the range a to z ([0-9] works the same way for digits)
import re
text = "alexrootrootadmin"
data_list = re.findall("t[a-z]", text)
print(data_list)  # ['tr', 'ta']
- . : matches any single character except a newline
import re
text = "alexraotrootadmin"
data_list = re.findall("r.o", text)
print(data_list)  # ['rao', 'roo']

import re
text = "alexraotrootadmin"
data_list = re.findall("r.+o", text)  # .+ is greedy: from an r, match as much as possible up to the LAST o
print(data_list)  # ['raotroo']

import re
text = "alexraotrootadmin"
data_list = re.findall("r.+?o", text)  # .+? is non-greedy: as few characters as possible (at least one) before an o
print(data_list)  # ['rao', 'roo']
- \w: matches a letter, digit or underscore; for str patterns in Python 3 this also covers Chinese characters
import re
text = "北京武沛alex齐北 京武沛alex齐"
data_list = re.findall(r"武\w+x", text)
print(data_list)  # ['武沛alex', '武沛alex']
- \d: matches a digit
import re
text = "root-ad32min-add3-admd1in"
data_list = re.findall(r"d\d", text)
print(data_list)  # ['d3', 'd3', 'd1']

import re
text = "root-ad32min-add3-admd1in"
data_list = re.findall(r"d\d+", text)
print(data_list)  # ['d32', 'd3', 'd1']
- \s: matches any whitespace character, such as a space or a tab
import re
text = "root admin add admin"
data_list = re.findall(r"a\w+\s\w+", text)
print(data_list)  # ['admin add']
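A short script that checks a few details from this section: counting a literal with findall versus str.count, the fact that | inside [] is just another character (so a class like [3|5|8|9] also matches a literal |), how . treats newlines, and how \w behaves on Chinese text. re.DOTALL and re.ASCII are standard flags of Python's re module; the sample strings are made up for illustration.

```python
import re

# 1. Counting a literal substring: findall returns non-overlapping matches,
#    so its length agrees with str.count for plain text.
text = "你好wupeiqi,阿斯顿发wupeiqasd 阿士大夫能接受的wupeiqiff"
regex_count = len(re.findall(r"wupeiqi", text))
str_count = text.count("wupeiqi")
print(regex_count, str_count)  # 2 2

# 2. Inside [], "|" is an ordinary character, not "or".
class_with_bar = re.findall(r"[3|5]", "3|5")
print(class_with_bar)  # ['3', '|', '5']

# 3. "." does not match a newline unless re.DOTALL is passed.
#    ("r\no" below is a normal string: r, newline, o.)
no_dotall = re.findall(r"r.o", "r\no")
with_dotall = re.findall(r"r.o", "r\no", re.DOTALL)
print(no_dotall, with_dotall)  # [] ['r\no']

# 4. In str patterns \w is Unicode-aware; re.ASCII limits it to [a-zA-Z0-9_].
unicode_words = re.findall(r"\w+", "武沛alex_1")
ascii_words = re.findall(r"\w+", "武沛alex_1", re.ASCII)
print(unicode_words, ascii_words)  # ['武沛alex_1'] ['alex_1']
```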
Quantifiers
- *: repeat the preceding item 0 or more times
import re
text = "他是大B个,确实是个大2B。"
data_list = re.findall("大2*B", text)
print(data_list)  # ['大B', '大2B']
- +: repeat 1 or more times
import re
text = "他是大B个,确实是个大2B,大3B,大66666B。"
data_list = re.findall(r"大\d+B", text)
print(data_list)  # ['大2B', '大3B', '大66666B']
- ?: repeat 0 or 1 times
import re
text = "他是大B个,确实是个大2B,大3B,大66666B。"
data_list = re.findall(r"大\d?B", text)
print(data_list)  # ['大B', '大2B', '大3B']
- {n}: repeat exactly n times
import re
text = "楼主太牛逼了,在线想要 442662578@qq.com和xxxxx@live.com谢谢楼主,手机号也可15131255789,搞起来呀"
data_list = re.findall(r"151312\d{5}", text)
print(data_list)  # ['15131255789']
- {n,}: repeat n or more times
import re
text = "楼主太牛逼了,在线想要 442662578@qq.com和xxxxx@live.com谢谢楼主,手机号也可15131255789,搞起来呀"
data_list = re.findall(r"\d{9,}", text)
print(data_list)  # ['442662578', '15131255789']
- {n,m}: repeat between n and m times (inclusive)
import re
text = "楼主太牛逼了,在线想要 442662578@qq.com和xxxxx@live.com谢谢楼主,手机号也可15131255789,搞起来呀"
data_list = re.findall(r"\d{10,15}", text)
print(data_list)  # ['15131255789']
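Two things the quantifier examples above do not show, sketched with made-up inputs. First, {n,m} is greedy like * and +, and appending ? makes any quantifier non-greedy (the same idea as .+? earlier). Second, findall locates matches anywhere in the text, while re.fullmatch checks whether the whole string fits the pattern, which is usually what you want when validating a single value. is_phone is a hypothetical helper that assumes 11-digit numbers starting with 13, 15, 18 or 19.

```python
import re

digits = "1234567"

# Greedy {2,4}: each match takes as many digits as it can (up to 4).
greedy = re.findall(r"\d{2,4}", digits)
# Non-greedy {2,4}?: each match takes as few as it can (2); the lone "7" is left over.
lazy = re.findall(r"\d{2,4}?", digits)
print(greedy, lazy)  # ['1234', '567'] ['12', '34', '56']


def is_phone(value):
    """True only if the WHOLE string is an 11-digit number starting 13/15/18/19."""
    return re.fullmatch(r"1[3589]\d{9}", value) is not None


print(is_phone("15131255789"))   # True
print(is_phone("151312557890"))  # False: 12 digits
# findall, by contrast, still finds an 11-digit substring inside the 12 digits.
print(re.findall(r"1[3589]\d{9}", "151312557890"))  # ['15131255789']
```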
Parentheses (groups)
- Extract part of a match: findall returns only the text captured by the group
import re
text = "楼主太牛逼了,在线想要 442662578@qq.com和xxxxx@live.com谢谢楼主,手机号也可15131255789,搞起来呀"
data_list = re.findall(r"15131(2\d{5})", text)
print(data_list)  # ['255789']

import re
text = "楼主太牛逼了,在线想要 442662578@qq.com和xxxxx@live.com谢谢楼主,手机号也可15131255789,搞起来15131266666呀"
data_list = re.findall(r"15(13)1(2\d{5})", text)
print(data_list)  # [('13', '255789'), ('13', '266666')]
With two or more groups, findall returns one tuple per match, holding each group's text in order; groups can also be nested:

import re
text = "楼主太牛逼了,在线想要 442662578@qq.com和xxxxx@live.com谢谢楼主,手机号也可15131255789,搞起来呀"
data_list = re.findall(r"(15131(2\d{5}))", text)
print(data_list)  # [('15131255789', '255789')]
- Capture a specific region combined with an OR condition (|)
import re
text = "楼主15131root太牛15131alex逼了,在线想要 442662578@qq.com和xxxxx@live.com谢谢楼主,手机号也可15131255789,搞起来呀"
data_list = re.findall(r"15131(2\d{5}|r\w+太)", text)  # after 15131: either 2 plus five digits, or r...太
print(data_list)  # ['root太', '255789']

import re
text = "楼主15131root太牛15131alex逼了,在线想要 442662578@qq.com和xxxxx@live.com谢谢楼主,手机号也可15131255789,搞起来呀"
data_list = re.findall(r"(15131(2\d{5}|r\w+太))", text)
print(data_list)  # [('15131root太', 'root太'), ('15131255789', '255789')]
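Once a pattern contains groups, findall returns only the group text. A sketch of three related tools from Python's re module, run on the same sample text: a non-capturing group (?:...) keeps the OR scoped without changing what findall returns; re.finditer yields Match objects whose group(0) is the whole match and group(1) the first group; and leaving out the parentheses changes what | applies to.

```python
import re

text = "楼主15131root太牛15131alex逼了,在线想要 442662578@qq.com和xxxxx@live.com谢谢楼主,手机号也可15131255789,搞起来呀"

# (?:...) groups the alternatives without capturing, so findall returns whole matches.
whole = re.findall(r"15131(?:2\d{5}|r\w+太)", text)
print(whole)  # ['15131root太', '15131255789']

# finditer yields Match objects: group(0) is the full match, group(1) the first group.
pairs = [(m.group(0), m.group(1)) for m in re.finditer(r"15131(2\d{5}|r\w+太)", text)]
print(pairs)  # [('15131root太', 'root太'), ('15131255789', '255789')]

# Without parentheses, | splits the WHOLE pattern into "15131r\w+太" or "2\d{5}",
# so the second branch also fires inside the QQ number 442662578.
unscoped = re.findall(r"15131r\w+太|2\d{5}", text)
print(unscoped)  # ['15131root太', '266257', '255789']
```

The last result is the usual reason to wrap alternatives in a group: | has the lowest precedence of any regex operator.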