Post archive for March 31, 2020 - GroundControl_852

March 31, 2020

Summary: import requests from lxml import etree url = "https://tieba.baidu.com/p/6585139804" headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) Ap Read more

posted @ 2020-03-31 20:57 GroundControl_852 Views(465) Comments(0) Recommended(0)

Parsing out all city names

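Summary: https://www.aqistudy.cn/historydata/ Approach: - First check whether the data is loaded dynamically - Locate the city tags, getting familiar with the page source first url = "https://www.aqistudy.cn/historydata/" headers = {"User-Agent": "Mozil Read more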
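The approach above (locate the city tags, then extract their text) can be sketched as follows. This is only an illustration: `SAMPLE_HTML`, the `parse_city_names` helper, and the default XPath expression are all hypothetical stand-ins, since the real markup of the page is not shown in the summary and may differ.

```python
from lxml import etree

# Hypothetical markup standing in for the real page source; the actual
# structure of the target page is an assumption and may differ.
SAMPLE_HTML = """
<div class="all">
  <ul>
    <li><a href="#">Alpha</a></li>
    <li><a href="#">Beta</a></li>
  </ul>
</div>
"""


def parse_city_names(page_text, xpath_expr='//div[@class="all"]//li/a/text()'):
    """Return the stripped, non-empty text of every node matched by xpath_expr."""
    tree = etree.HTML(page_text)          # build an element tree from HTML text
    names = [text.strip() for text in tree.xpath(xpath_expr)]
    return [name for name in names if name]
```

Called on the sample fragment, `parse_city_names(SAMPLE_HTML)` yields the two link texts. For a real page, the XPath expression would have to be adapted after inspecting the page source, as the summary suggests.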

posted @ 2020-03-31 20:52 GroundControl_852 Views(141) Comments(0) Recommended(0)

Scraping 58 second-hand housing listings with XPath

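Summary: 1. The first step in scraping a site is to settle on the URL. Start by checking whether the site's data is loaded dynamically via Ajax: refresh the page and look under XHR for the corresponding data. None shows up, which confirms that the page's data can be fetched directly from the original address. Copy the headers as well, send the request with requests.get, and obtain the page source page_text Read more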
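The step described above can be sketched roughly as below, assuming the `requests` library is installed. The helper names `fetch_page_text` and `is_served_statically` are hypothetical and do not come from the original post. The second helper is a programmatic stand-in for the manual XHR check: if a snippet visible in the browser also appears in the raw page source, the data is probably not loaded via Ajax.

```python
def fetch_page_text(url, headers):
    """Send a GET request and return the page source as text."""
    # Third-party dependency, imported here so the pure helper below
    # can be used even where requests is not installed.
    import requests

    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()           # fail loudly on HTTP errors
    return response.text


def is_served_statically(page_text, expected_snippet):
    """Return True if a snippet seen in the browser is in the raw source."""
    return expected_snippet in page_text
```

A typical use would be to call `fetch_page_text` with the listing URL and a browser `User-Agent` header, then pass the result and a listing title copied from the browser to `is_served_statically` before writing any XPath expressions.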

posted @ 2020-03-31 20:50 GroundControl_852 Views(844) Comments(0) Recommended(1)