I started using pyspider today. While debugging, I noticed that data in HTML like the following can be matched this way:
<div class="download-wp"> <a data-app-id="28855" data-app-vid="800689740" data-app-name="爱奇艺" data-app-pname="com.qiyi.video" data-app-vcode="81130" data-app-vname="9.7.6" data-app-icon="http://android-artworks.25pp.com/fs08/2018/08/03/6/110_60e3799782d5b646000125fb2c7b3a3c_con_130x130.png" data-app-rtype="0" data-oe="web" data-type="bind" data-feat="binded" data-app-categoryid="5029" data-app-subcategoryid="" data-install="7.9亿" data-like="64.00%" data-name="爱奇艺" data-pn="com.qiyi.video" class="install-btn i-source" rel="nofollow" href="http://www.wandoujia.com/apps/com.qiyi.video/binding?source=web_inner_referral_binded" data-track="detail-download-bind_direct_com.qiyi.video"> 安全下载 </a> <!-- --></div>
For example, to extract the package name and the version name from it:
packagename=response.doc('div[class="download-wp"]>a').attr('data-app-pname')
version=response.doc('div[class="download-wp"]>a').attr('data-app-vname')
That is, once the matching a tag has been selected, calling .attr('attribute-name') returns the value of that attribute.
To match the package size:
<dl class="infos-list"> <dt>大小</dt><dd> 29.75MB <meta itemprop="fileSize" content="29.75MB"></dd> <dt>分类</dt><dd class="tag-box"> <a href="http://www.wandoujia.com/category/5029?pos=w/tags/detail_com.qiyi.video" itemprop="SoftwareApplicationCategory" data-track="detail-click-appTag">影音播放</a> <a href="http://www.wandoujia.com/category/5029_716?pos=w/tags/detail_com.qiyi.video" itemprop="SoftwareApplicationCategory" data-track="detail-click-appTag">视频</a> </dd> <dt>TAG</dt><dd><div class="side-tags clearfix"> <div class="tag-box"><a href="http://www.wandoujia.com/tag/4"> 益智休闲 </a></div> <div class="tag-box"><a href="http://www.wandoujia.com/tag/6"> 趣味 </a></div> <div class="tag-box"><a href="http://www.wandoujia.com/tag/8"> 消除 </a></div> <div class="tag-box"><a href="http://www.wandoujia.com/tag/12"> 战斗 </a></div> <div class="tag-box"><a href="http://www.wandoujia.com/tag/10"> 经典 </a></div> <div class="tag-box"><a href="http://www.wandoujia.com/tag/18"> 动作 </a></div> <div class="tag-box"><a href="http://www.wandoujia.com/tag/20"> 关卡 </a></div> <div class="tag-box"><a href="http://www.wandoujia.com/tag/22"> 简单 </a></div> </div></dd> <dt>更新</dt><dd><time id="baidu_time" itemprop="datePublished" datetime="2018年08月03日">2018年08月03日</time></dd> <dt>版本</dt><dd> 9.7.6</dd> <dt>要求</dt><dd class="perms" itemprop="operatingSystems" content="Android">Android 4.0.2 以上 <div><a href="javascript:;" rel="nofollow" class="view-perms" id="j-view-perms">查看权限要求<i class="arrow-down"> </i></a><ul id="j-perms-list" class="perms-list" style="display:none"> <li><span class="perms" itemprop="permissions">读取短信或彩信</span></li> <li><span class="perms" itemprop="permissions">发送短信或彩信</span></li> </ul></div> </dd> <dt>开发者</dt><dd><span class="dev-sites" itemprop="name">北京爱奇艺科技有限公司</span></dd> </dl>
This form also works:
size=response.doc('.infos-list > dd>meta[itemprop="fileSize"]').attr.content
But if the package-name match above is written in the same style:
packagename=response.doc('div[class="download-wp"]>a').attr.data-app-pname
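it raises an error. The attribute name contains hyphens, so Python reads the expression as the subtraction attr.data - app - pname instead of a single attribute lookup. For attribute names with hyphens, stick to the .attr('data-app-pname') form.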
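The failure can be reproduced without pyspider. What follows is a minimal sketch under stated assumptions: `AttrAccessor` is a hypothetical stand-in written for this illustration (it is not pyquery's actual implementation), and the attribute values are copied from the HTML above. It shows that the dotted form works for a plain name such as `content`, while a hyphenated name is parsed as a subtraction.

```python
import ast


class AttrAccessor:
    """Hypothetical stand-in for the object behind `.attr` (illustration only)."""

    def __init__(self, attributes):
        self._attributes = attributes

    def __call__(self, name):
        # Call form: attr('data-app-pname')
        return self._attributes.get(name)

    def __getattr__(self, name):
        # Dotted form: attr.content (only reached for names not found normally)
        return self._attributes.get(name)


attr = AttrAccessor({"content": "29.75MB", "data-app-pname": "com.qiyi.video"})

size = attr.content                     # dotted form is fine for a plain name
packagename = attr("data-app-pname")    # call form handles hyphens

# Python parses the hyphenated dotted form as (doc.attr.data - app) - pname.
tree = ast.parse("doc.attr.data-app-pname", mode="eval")
is_subtraction = isinstance(tree.body, ast.BinOp) and isinstance(tree.body.op, ast.Sub)
names = sorted({node.id for node in ast.walk(tree) if isinstance(node, ast.Name)})

# Evaluating it therefore fails while looking up the undefined name `app`.
try:
    attr.data-app-pname
    error_message = ""
except NameError as exc:
    error_message = str(exc)

print(size, packagename, is_subtraction, names, error_message)
```

The design point is simply that the string-argument form never passes the attribute name through Python's expression parser, so it is safe for any attribute name.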