Post Archive for September 2019 - 天天见和

Scraping Job-Site Data with Beautiful Soup

Summary:
import requests
from bs4 import BeautifulSoup
import pandas as pd
from pandas import DataFrame
url='https://search.51job.com/list/120300,000000,0000,32,9
Read more

posted @ 2019-09-29 23:13 天天见和 Views(338) Comments(0) Recommended(0)

Beautiful Soup: The Four Common Objects

Summary:
from bs4 import BeautifulSoup
text='''<?xml version="1.0" encoding="ISO-8859-1"?><bookstore><book><title lang='eng'>Harry Potter</title><price>29.9</p
Read more

posted @ 2019-09-29 21:07 天天见和 Views(642) Comments(0) Recommended(0)

Setting a Proxy IP and Handling Timeout Exceptions

Summary:
import requests
import re
# get the local machine's IP
url='http://www.baidu.com/s?ie=utf-8&f=8&rsv_bp=1&rsv_idx=1&tn=baidu&wd=ip'
res=requests.get(url)
res.encoding='utf-8'
Read more

posted @ 2019-09-28 06:51 天天见和 Views(897) Comments(0) Recommended(0)
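
The idea in this post's title, sending a request through a proxy and recovering when it times out, can be sketched without any third-party package. The block below is not the post's code: it swaps requests for the standard library's urllib so it runs anywhere, and the function name `fetch`, the 2-second timeout, and the placeholder proxy address 127.0.0.1:9 are all assumptions made for illustration.

```python
import socket
import urllib.error
import urllib.request


def fetch(url, proxy=None, timeout=3.0):
    """Return the response body as bytes, or None if the request fails or times out."""
    handlers = []
    if proxy:
        # Route both http and https traffic through the given proxy.
        handlers.append(urllib.request.ProxyHandler({"http": proxy, "https": proxy}))
    opener = urllib.request.build_opener(*handlers)
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.read()
    except (urllib.error.URLError, socket.timeout, ConnectionError) as exc:
        # Connection failures usually arrive as URLError; a slow read raises socket.timeout.
        print(f"request failed: {exc!r}")
        return None


# 127.0.0.1:9 is a placeholder where no proxy is expected to listen,
# so this call should fail quickly and return None instead of raising.
result = fetch("http://example.com/", proxy="http://127.0.0.1:9", timeout=2)
print(result)
```

With requests, the corresponding pieces would be the `proxies=` and `timeout=` arguments of `requests.get`, with `requests.exceptions.Timeout` and `requests.exceptions.ProxyError` caught in the `except` clause.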

fake-useragent, a Handy Tool, and a Browser User-Agent Pool

Summary:
import requests
from lxml import etree
import random
from fake_useragent import UserAgent
ua=UserAgent()
uas=[]
for i in range(5): uas.append(ua.random) #生
Read more

posted @ 2019-09-26 22:51 天天见和 Views(568) Comments(0) Recommended(0)
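
A user-agent pool boils down to picking one header value at random for each request. The sketch below illustrates that idea only. It is not the post's code: the three strings in `UA_POOL` are hard-coded, hypothetical stand-ins for what `UserAgent().random` would supply, and the standard library's urllib replaces requests so that nothing is sent over the network.

```python
import random
import urllib.request

# Hypothetical pool; in the post, fake-useragent generates these values instead.
UA_POOL = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/12.1.2 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:69.0) Gecko/20100101 Firefox/69.0",
]


def build_request(url, pool=UA_POOL):
    """Return a Request whose User-Agent header is drawn at random from the pool."""
    return urllib.request.Request(url, headers={"User-Agent": random.choice(pool)})


# Building the request does not contact the server; it only prepares the headers.
req = build_request("https://example.com/")
print(req.get_header("User-agent"))
```

With requests, the same dictionary would be passed as `headers=` to `requests.get`, and choosing a fresh value per call is what makes the requests look like they come from different browsers.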

Simulating a Browser by Setting the User-Agent

Summary:
import requests
from lxml import etree
url='https://ie.icoa.cn/'
head={'user-agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like
Read more

posted @ 2019-09-26 22:15 天天见和 Views(941) Comments(0) Recommended(0)

Batch-Downloading Celebrity Photos from Baidu Tieba

Summary:
import requests
import re
url='http://tieba.baidu.com/photo/g/bw/picture/list?kw=%E6%9D%A8%E6%B4%8B&alt=jview&rn=200&tid=4748284434&pn=1&ps=1&pe=40&inf
Read more

posted @ 2019-09-25 23:03 天天见和 Views(152) Comments(0) Recommended(0)

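Regex Matching Examples: Extracting Digits, Matching Phone Numbers and QQ Numbers

Summary:
\d{n}, \d{n,}, \d{n,m}: match a decimal digit exactly n times, at least n times, or at least n and at most m times
\D: matches any character that is not a decimal digit
[...]: a character set; matches any one character inside it
[^...]: matches any one character not inside the set
+: matches the preceding subexpression one or more times
\s: a whitespace character
\S: any character except whitespace
(?:pattern): matches the pattern but does not capture the result
^: marks the start
Read more

posted @ 2019-09-24 22:31 天天见和 Views(562) Comments(0) Recommended(0)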
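
The tokens listed in the summary above can be tried out with Python's `re` module. The sketch below is not the post's code. The sample text, the phone-number rule (11 digits starting with 1), and the QQ rule (5 to 11 digits with no leading zero) are assumptions chosen for illustration.

```python
import re

# Hypothetical sample text; none of these values come from the original post.
text = "Order 42 shipped; call 13812345678 or QQ 10001 before 9 pm."

# \d+ : one or more decimal digits
numbers = re.findall(r"\d+", text)

# Assumed rule: a mobile number is 11 digits starting with 1.
# (?<!\d) and (?!\d) keep the match from sitting inside a longer digit run.
phones = re.findall(r"(?<!\d)1\d{10}(?!\d)", text)


def is_qq(candidate):
    """Assumed rule: a QQ number is 5 to 11 digits and does not start with 0."""
    return re.fullmatch(r"[1-9]\d{4,10}", candidate) is not None


# (?:...) groups without capturing, so findall returns each whole match.
prices = re.findall(r"\d+(?:\.\d+)?", "29.9 and 30")

print(numbers)                          # ['42', '13812345678', '10001', '9']
print(phones)                           # ['13812345678']
print(is_qq("10001"), is_qq("01234"))   # True False
print(prices)                           # ['29.9', '30']
```

The non-capturing group matters in the last pattern: with a plain `(\.\d+)` group, `re.findall` would return only the captured fractional parts rather than the full numbers.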

Batch-Downloading Images from an E-Commerce Site with a Python Crawler

Summary:
import requests
import re
url='https://list.jd.com/list.html?cat=9987,653,655'
res=requests.get(url)
image_pat='<img width="220" height="220" data-img="1
Read more

posted @ 2019-09-24 22:14 天天见和 Views(985) Comments(0) Recommended(0)

Storing Scraped Data in a DataFrame and Exporting It

Summary:
import requests
from lxml import etree
from pandas import DataFrame
url='https://search.51job.com/list/120800,000000,0000,32,9,99,%25E4%25BA%25A7%25E5%2
Read more

posted @ 2019-09-22 10:02 天天见和 Views(578) Comments(0) Recommended(0)

Common XPath Usage

Summary:
import requests
from lxml import etree
url='https://www.baidu.com/'
r=requests.get(url)
r.encoding='utf-8'
r.text
root=etree.HTML(r.text)
root.xpath('/html/
Read more

posted @ 2019-09-19 19:36 awaiting 天天见和 Views(173) Comments(0) Recommended(0)
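
A few common XPath patterns can be tried offline on a small inline document. The sketch below is not the post's code: it uses the standard library's `xml.etree.ElementTree`, which supports only a subset of XPath, in place of lxml, and the markup in `doc` is a hypothetical stand-in for a downloaded page.

```python
import xml.etree.ElementTree as ET

# Hypothetical well-formed markup standing in for a downloaded page.
doc = """<html><body>
<div id="nav"><a href="/news">News</a><a href="/map">Map</a></div>
<div id="main"><p>Hello</p></div>
</body></html>"""

root = ET.fromstring(doc)

# .//a : every <a> element anywhere below the root
links = [a.get("href") for a in root.findall(".//a")]

# [@id='main'] : filter by attribute value, then step down to the <p> child
main_text = root.find(".//div[@id='main']/p").text

# [2] : positional predicate (1-based), the second <a> inside the nav div
second = root.find(".//div[@id='nav']/a[2]").text

print(links)      # ['/news', '/map']
print(main_text)  # Hello
print(second)     # Map
```

lxml's `xpath()` method accepts full XPath 1.0, including absolute paths such as the `/html/...` form in the excerpt and functions like `text()`, which ElementTree does not support.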

Python Charts

Summary:
# charts
import matplotlib.pyplot as plt
fig,ax=plt.subplots(2,2)
ax[0,1].plot(x,y,'r--*',label='sin')
ax[0,1].legend(loc='upper right')
ax[0,1].grid()
fig.sav
Read more

posted @ 2019-09-19 19:35 天天见和 Views(360) Comments(0) Recommended(0)
