1. Match the title of an English article, e.g. The Voice Of China
#---->([A-Z][a-z ]+)+
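A small check of this pattern with Python's `re` module. It shows a common trap: because the capture group is repeated, `findall` returns only the last repetition. The `STRICT` pattern is an alternative assumed here for comparison, not part of the original answer.

```python
import re

# Answer pattern: repeated "capital letter + lowercase letters and/or spaces".
LOOSE = r'([A-Z][a-z ]+)+'
# Hypothetical stricter alternative: every space-separated word starts with a capital.
STRICT = r'[A-Z][a-z]*(?: [A-Z][a-z]*)*'

text = 'The Voice Of China'

# findall returns only the LAST repetition of a repeated capture group.
print(re.findall(LOOSE, text))          # ['China']
# search(...).group() returns the whole match instead.
print(re.search(LOOSE, text).group())   # The Voice Of China

# Since the class [a-z ] contains a space, lowercase words are accepted after a capital.
print(bool(re.fullmatch(LOOSE, 'The voice of China')))   # True
print(bool(re.fullmatch(STRICT, 'The voice of China')))  # False
print(bool(re.fullmatch(STRICT, text)))                  # True
```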
2. Match a URL
e.g. https://www.baidu.com http://www.cnblogs.com
#----->http[s]?://www\.\w+\.(com|cn)
--------->^((https|http|ftp|rtsp|mms)?:\/\/)[^\s]+
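A quick sketch that runs both URL answers against the sample text. The first pattern contains a group, so `finditer` with `group()` is used to get full URLs. The last line shows one quirk of the second pattern: the `?` applies only to the scheme, so a bare `://` prefix is also accepted.

```python
import re

# First answer: http or https, "www.", one word, then .com or .cn.
SIMPLE = r'http[s]?://www\.\w+\.(com|cn)'
# Second answer: optional scheme from a fixed list, "://", then a run of non-space chars.
LOOSE = r'^((https|http|ftp|rtsp|mms)?:\/\/)[^\s]+'

text = 'https://www.baidu.com http://www.cnblogs.com'

# finditer + group() yields whole matches; findall would return only group 1.
print([m.group() for m in re.finditer(SIMPLE, text)])
# ['https://www.baidu.com', 'http://www.cnblogs.com']
print(re.findall(SIMPLE, text))  # ['com', 'com']

# LOOSE is anchored with ^ and [^\s]+ stops at the first whitespace.
print(re.match(LOOSE, text).group())  # https://www.baidu.com
# The scheme is optional, so "://example.com" matches too.
print(re.match(LOOSE, '://example.com') is not None)  # True
```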
3. Match a year-month-day date, e.g. 2018-12-06 2018/12/06 2018.12.06
#------>[1-9]\d{3}[\-\/\.](1[0-2]|0?[1-9])[\-\/\.](3[01]|[12]\d|0?[1-9])
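Alternation order matters when the pattern is not anchored. The sketch compares the original single-pattern answer with a version that limits the month to `0?[1-9]` and tries two-digit days before `0?[1-9]`. Neither version requires the two separators to be the same.

```python
import re

# Original ordering: 0?[1-9] is tried before [12]\d for the day; month allows 0/00.
ORIGINAL = r'[1-9]\d{3}[\-\/\.](1[0-2]|0?\d)[\-\/\.](3[01]|0?[1-9]|[12]\d)'
# Reordered: month 1-12 only, and two-digit days are tried first.
REORDERED = r'[1-9]\d{3}[\-\/\.](1[0-2]|0?[1-9])[\-\/\.](3[01]|[12]\d|0?[1-9])'

def first_date(pattern, text):
    """Return the first substring matching pattern, or None."""
    m = re.search(pattern, text)
    return m.group() if m else None

print(first_date(ORIGINAL, '2018-12-20'))   # 2018-12-2  (day truncated)
print(first_date(REORDERED, '2018-12-20'))  # 2018-12-20
print(first_date(ORIGINAL, '2018-00-06'))   # 2018-00-06 (month 00 accepted)
print(first_date(REORDERED, '2018-00-06'))  # None
# Mixed separators still pass.
print(first_date(REORDERED, '2018-12/06'))  # 2018-12/06
```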
---->
# \d{4}(\-|\/|\.)\d{1,2}\1\d{1,2}
# \d{4}(?P<sep>\-|\/|\.)\d{1,2}(?P=sep)\d{1,2}
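The backreference versions force both separators to be the same character. `\1` refers to the first group by number and `(?P=sep)` refers to it by name. The last line shows why the dot inside the group has to be escaped, because an unescaped `.` accepts any separator at all.

```python
import re

# \1 and (?P=sep) must repeat exactly what the separator group matched.
NUMBERED = r'\d{4}(\-|\/|\.)\d{1,2}\1\d{1,2}'
NAMED = r'\d{4}(?P<sep>\-|\/|\.)\d{1,2}(?P=sep)\d{1,2}'

cases = ['2018-12-06', '2018/12/06', '2018.12.06', '2018-12/06']
for s in cases:
    print(s, bool(re.fullmatch(NUMBERED, s)), bool(re.fullmatch(NAMED, s)))
# The mixed-separator case 2018-12/06 is rejected by both.

# With an unescaped . in the group, any character works as a separator.
print(bool(re.fullmatch(r'\d{4}(\-|\/|.)\d{1,2}\1\d{1,2}', '2018x12x06')))  # True
```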
4. Match a 15-digit or 18-digit ID card number
#----->[1-9](\d{16}(\d|[xX])|\d{14})
----
# ^([1-9]\d{16}[0-9x]|[1-9]\d{14})$
# ^[1-9]\d{14}(\d{2}[0-9x])?$
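The `^...$` anchors in the alternatives matter. The sketch below uses made-up digit strings, not real ID numbers. It shows that the unanchored answer also finds a 15-digit match inside a 16-digit string, and that `findall` returns the capture groups rather than the whole number.

```python
import re

# Answer pattern: first digit 1-9, then 16 digits + digit/x/X (18 total) or 14 digits (15 total).
ID_RE = re.compile(r'[1-9](\d{16}(\d|[xX])|\d{14})')

# Made-up digit strings for illustration only.
samples = ['123456789012345678', '12345678901234567X', '123456789012345',
           '023456789012345', '1234567890123456']

for s in samples:
    # fullmatch requires the whole string to match, like wrapping the pattern in ^...$
    print(s, ID_RE.fullmatch(s) is not None)
# -> True, True, True, False, False

# search() is unanchored: a 16-digit string still yields a 15-digit match.
print(ID_RE.search('1234567890123456').group())  # 123456789012345
# findall() returns the groups (outer, inner), not the whole number.
print(ID_RE.findall('123456789012345'))  # [('23456789012345', '')]
```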
5. From lianjia.html, extract the title, layout and floor area of each listing. The result should be:
[('金台路交通部部委楼南北大三居带客厅 单位自持物业', '3室1厅', '91.22平米'), ('西山枫林 高楼层南向两居 户型方正 采光好', '2室1厅', '94.14平米')]
from urllib import request
import re

# Read the local HTML file through a file:// URL
ret = request.urlopen('file:///E:/python/模块/模块re/lianjia.html')
res = ret.read().decode('utf-8')
# print(res)
pattern = '<div class="title">.*?data-sl="">(?P<name>.+?)</a>.*?<span class="divide">/</span>(?P<info>.*?)<span class="divide">/</span>(?P<space>.*?)<span'
# By default . matches any character except a newline; re.S (DOTALL) makes it match newlines too
rs = re.findall(pattern, res, re.S)
print(rs)
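The sketch below runs the same pattern without reading a file. It uses a trimmed, hypothetical copy of the first listing, with the `href` values shortened to `#`. `finditer` with `groupdict()` keeps the group names `name`, `info` and `space`. The last line shows that the pattern finds nothing without `re.S`, because the title and the houseInfo spans are on different lines.

```python
import re

# Trimmed copy of one listing from lianjia.html (only the parts the pattern touches).
html = '''<div class="title">
<a class="" href="#" target="_blank"
 data-el="ershoufang" data-sl="">金台路交通部部委楼南北大三居带客厅 单位自持物业</a>
</div>
<div class="houseInfo">
<a href="#" data-el="region">延静西里 </a>
<span class="divide">/</span>3室1厅<span class="divide">/</span>91.22平米<span class="divide">/</span>南 北
</div>'''

PATTERN = re.compile(
    r'<div class="title">.*?data-sl="">(?P<name>.+?)</a>'
    r'.*?<span class="divide">/</span>(?P<info>.*?)'
    r'<span class="divide">/</span>(?P<space>.*?)<span',
    re.S,  # let . cross the newlines between the title and houseInfo blocks
)

# findall gives tuples; finditer + groupdict keeps the group names.
rows = [m.groupdict() for m in PATTERN.finditer(html)]
print(rows)

# Same pattern string without re.S: . stops at newlines, so nothing matches here.
print(re.findall(PATTERN.pattern, html))  # []
```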
Source material: lianjia.html
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Title</title>
</head>
<body>
<div class="info clear">
    <div class="title">
        <a class="" href="https://bj.lianjia.com/ershoufang/101103186217.html" target="_blank" data-log_index="1"
           data-el="ershoufang" data-housecode="101103186217" data-is_focus="1" data-sl="">金台路交通部部委楼南北大三居带客厅 单位自持物业</a>
        <span class="new tagBlock">新上</span></div>
    <div class="address">
        <div class="houseInfo">
            <a href="https://bj.lianjia.com/xiaoqu/1111027381816/" target="_blank" data-log_index="1" data-el="region">延静西里 </a>
            <span class="divide">/</span>3室1厅<span class="divide">/</span>91.22平米<span class="divide">/</span>南 北<span class="divide">/</span>简装<span class="divide">/</span>有电梯
        </div>
    </div>
    <div class="flood">
        <div class="positionInfo">低楼层(共15层)
            <span class="divide">/</span>1984年建板塔结合
            <span class="divide">/</span>
            <a href="https://bj.lianjia.com/ershoufang/hongmiao/" target="_blank">红庙</a></div>
    </div>
    <div class="followInfo">859人关注<span class="divide">/</span>30次带看
        <div class="timeInfo"><span class="timeIcon"></span>6天以前发布</div>
        <div class="tag"><span class="subway">近地铁</span><span class="taxfree">房本满五年</span><span class="haskey">随时看房</span></div>
        <div class="priceInfo">
            <div class="totalPrice"><span>570</span>万</div>
            <div class="unitPrice" data-hid="101103186217" data-rid="1111027381816" data-price="62487"><span>单价62487元/平米</span></div>
        </div>
    </div>
</div>
<div class="info clear">
    <div class="title">
        <a class="" href="https://bj.lianjia.com/ershoufang/101103188116.html" target="_blank" data-log_index="2"
           data-el="ershoufang" data-housecode="101103188116" data-is_focus="1" data-sl="">西山枫林 高楼层南向两居 户型方正 采光好</a>
        <span class="new tagBlock">新上</span><span class="yezhushuo tagBlock">房主自荐</span></div>
    <div class="address">
        <div class="houseInfo">
            <a href="https://bj.lianjia.com/xiaoqu/1111027381123/" target="_blank" data-log_index="2" data-el="region">西山枫林三期 </a>
            <span class="divide">/</span>2室1厅<span class="divide">/</span>94.14平米<span class="divide">/</span>南<span class="divide">/</span>简装<span class="divide">/</span>有电梯
        </div>
    </div>
    <div class="flood">
        <div class="positionInfo">中楼层(共10层)
            <span class="divide">/</span>2006年建板楼
            <span class="divide">/</span>
            <a href="https://bj.lianjia.com/ershoufang/pingguoyuan1/" target="_blank">苹果园</a></div>
    </div>
    <div class="followInfo">630人关注<span class="divide">/</span>23次带看
        <div class="timeInfo"><span class="timeIcon"></span>6天以前发布</div>
        <div class="tag"><span class="taxfree">房本满五年</span><span class="haskey">随时看房</span></div>
        <div class="priceInfo">
            <div class="totalPrice"><span>495</span>万</div>
            <div class="unitPrice" data-hid="101103188116" data-rid="1111027381123" data-price="52582"><span>单价52582元/平米</span></div>
        </div>
    </div>
</div>
</body>