August 2018 Archive

Crawler: Douban events page (locating elements with BeautifulSoup)
Summary (excerpt; sketch below):
    from bs4 import BeautifulSoup
    import requests
    url = 'https://beijing.douban.com/events/week-party'
    response = requests.get(url)
    # with open('douban_party.
… Read more

posted @ 2018-08-23 21:27 luwanhe Views(150) Comments(0) Recommended(0)
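
A minimal sketch of the technique in the title: fetch the weekly party page with requests and locate event entries with BeautifulSoup. The CSS selectors and the User-Agent header below are assumptions about Douban's markup, not code from the original post.

    import requests
    from bs4 import BeautifulSoup

    url = 'https://beijing.douban.com/events/week-party'
    headers = {'User-Agent': 'Mozilla/5.0'}  # placeholder UA; Douban may block the default requests UA

    response = requests.get(url, headers=headers, timeout=10)
    soup = BeautifulSoup(response.text, 'html.parser')

    # 'ul.events-list li.list-entry' is a hypothetical selector for the event cards
    for item in soup.select('ul.events-list li.list-entry'):
        link = item.select_one('a')              # assumed: the first link carries the event title
        if link:
            print(link.get_text(strip=True), link.get('href'))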

Crawler: Xueqiu (locating elements with BeautifulSoup)
Summary (excerpt; sketch below):
    from bs4 import BeautifulSoup
    import requests
    headers = { 'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chro
… Read more

posted @ 2018-08-23 21:24 luwanhe Views(215) Comments(0) Recommended(0)
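
The excerpt sends a browser-like User-Agent, which Xueqiu generally requires before it returns a normal page. A hedged sketch of that pattern; the truncated UA string is replaced with a generic placeholder, and the selector is a guess rather than the post's own.

    import requests
    from bs4 import BeautifulSoup

    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64)'}  # placeholder browser UA
    url = 'https://xueqiu.com/'

    response = requests.get(url, headers=headers, timeout=10)
    soup = BeautifulSoup(response.text, 'html.parser')

    # hypothetical selector: site-internal links on the landing page
    for a in soup.select('a[href^="/"]')[:20]:
        text = a.get_text(strip=True)
        if text:
            print(text, a.get('href'))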

Crawler: Lianjia (locating elements with BeautifulSoup)
Summary (excerpt; sketch below):
    from bs4 import BeautifulSoup
    import requests
    url = 'https://bj.lianjia.com/ershoufang/c1111027378138/?sug=%E6%B5%81%E6%98%9F%E8%8A%B1%E5%9B%AD%E4%B8%89
… Read more

posted @ 2018-08-23 21:22 luwanhe Views(559) Comments(0) Recommended(0)
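
A sketch of locating listing cards on a Lianjia result page with BeautifulSoup. The plain /ershoufang/ URL stands in for the filtered URL in the excerpt, and the class names passed to select() are assumptions about the page markup.

    import requests
    from bs4 import BeautifulSoup

    url = 'https://bj.lianjia.com/ershoufang/'    # stand-in for the filtered URL in the post
    headers = {'User-Agent': 'Mozilla/5.0'}

    response = requests.get(url, headers=headers, timeout=10)
    soup = BeautifulSoup(response.text, 'html.parser')

    # 'ul.sellListContent li', 'div.title a' and 'div.totalPrice' are hypothetical selectors
    for li in soup.select('ul.sellListContent li'):
        title = li.select_one('div.title a')
        price = li.select_one('div.totalPrice')
        if title and price:
            print(title.get_text(strip=True), price.get_text(strip=True))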

Logging in to Douban with Selenium
Summary (excerpt; sketch below):
    from selenium import webdriver
    import time
    import requests
    from lxml import etree
    import base64
    # https://market.aliyun.com/products/57124001/cmapi028447.h
… Read more

posted @ 2018-08-22 09:12 luwanhe Views(159) Comments(0) Recommended(0)
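
A hedged sketch of a Selenium login flow for Douban. The login URL, the element locators and the submit button are assumptions; the captcha step, which the excerpt apparently solves through an Aliyun market API with a base64-encoded image, is left as a stub.

    import time
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()                  # assumes chromedriver is on PATH
    driver.get('https://accounts.douban.com/passport/login')   # assumed login page
    time.sleep(2)

    # hypothetical locators for the account/password form
    driver.find_element(By.ID, 'username').send_keys('your_account')
    driver.find_element(By.ID, 'password').send_keys('your_password')

    # if a captcha appears, screenshot it, base64-encode it and send it to a
    # recognition service (the excerpt links an Aliyun market API); stubbed here

    driver.find_element(By.CSS_SELECTOR, 'a.btn-account').click()   # hypothetical submit button
    time.sleep(3)
    print(driver.title)
    driver.quit()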

Scraping Xici proxies with multiple processes
Summary (excerpt; sketch below):
    import requests
    from lxml import etree
    import time
    import multiprocessing
    # elapsed 84.26855897903442  5
    # elapsed 44.181687355041504  10
    # elapsed 29.013262033462524  20
    # elapsed
… Read more

posted @ 2018-08-22 09:11 luwanhe Views(114) Comments(0) Recommended(0)
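
The timing comments in the excerpt (84 s, 44 s, 29 s next to 5, 10, 20) read like a comparison of pool sizes, so a minimal multiprocessing.Pool sketch is shown here: one worker per listing page, timed as a whole. The page URL pattern and the table XPath are assumptions about the Xici site.

    import time
    import requests
    import multiprocessing
    from lxml import etree

    HEADERS = {'User-Agent': 'Mozilla/5.0'}

    def crawl_page(page):
        """Fetch one listing page and return (ip, port) text nodes; the XPath is hypothetical."""
        url = 'https://www.xicidaili.com/nn/{}'.format(page)
        html = requests.get(url, headers=HEADERS, timeout=10).text
        tree = etree.HTML(html)
        rows = tree.xpath('//table[@id="ip_list"]//tr[position()>1]')
        return [(row.xpath('./td[2]/text()'), row.xpath('./td[3]/text()')) for row in rows]

    if __name__ == '__main__':
        start = time.time()
        with multiprocessing.Pool(10) as pool:   # 10 workers, one of the pool sizes being compared
            results = pool.map(crawl_page, range(1, 21))
        print('elapsed', time.time() - start, 'pages fetched:', len(results))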

Scraping Zhihu data (has a bug, advice welcome)
Summary (excerpt; sketch below):
    import requests
    from lxml import etree
    import pymysql
    class MysqlHelper(object):
        def __init__(self):
            self.db = pymysql.connect(host='127.0.0.1', port=330
… Read more

posted @ 2018-08-19 22:04 luwanhe Views(290) Comments(0) Recommended(0)
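
The excerpt opens with a small pymysql helper class. A sketch of that pattern with a single parameterised insert, assuming a local MySQL server, placeholder credentials and a hypothetical zhihu table; the Zhihu requests themselves (which need valid cookies) are omitted.

    import pymysql

    class MysqlHelper(object):
        """Thin pymysql wrapper; host, credentials and database name are placeholders."""
        def __init__(self):
            self.db = pymysql.connect(host='127.0.0.1', port=3306, user='root',
                                      password='your_password', database='spider',
                                      charset='utf8mb4')
            self.cursor = self.db.cursor()

        def execute(self, sql, params=None):
            self.cursor.execute(sql, params or ())
            self.db.commit()

        def close(self):
            self.cursor.close()
            self.db.close()

    # hypothetical table: zhihu(name VARCHAR(100), headline VARCHAR(255))
    helper = MysqlHelper()
    helper.execute('INSERT INTO zhihu(name, headline) VALUES (%s, %s)',
                   ('example user', 'example headline'))
    helper.close()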

Scraping torrent links from Dytt (电影天堂) (data incomplete, has a bug, advice welcome)
Summary (excerpt; sketch below):
    import requests
    from lxml import etree
    import pymysql
    from urllib import parse
    class MysqlHelper(object):
        def __init__(self):
            self.db = pymysql.connect(ho
… Read more

posted @ 2018-08-19 22:03 luwanhe Views(2316) Comments(0) Recommended(0)
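
A sketch of collecting download links from Dytt. The site serves non-UTF-8 (gb2312) pages, so the response encoding is set explicitly; the list URL, the XPaths and the assumption that download links are ftp:// anchors are all guesses about the site's markup rather than code from the post.

    import requests
    from lxml import etree

    headers = {'User-Agent': 'Mozilla/5.0'}
    list_url = 'https://www.dytt8.net/html/gndy/dyzz/index.html'   # assumed entry page

    resp = requests.get(list_url, headers=headers, timeout=10)
    resp.encoding = 'gb2312'                       # the site is not UTF-8
    tree = etree.HTML(resp.text)

    # hypothetical XPath: detail-page links inside the movie table
    detail_links = tree.xpath('//table[@class="tbspan"]//a/@href')

    for href in detail_links[:5]:
        detail = requests.get('https://www.dytt8.net' + href, headers=headers, timeout=10)
        detail.encoding = 'gb2312'
        dtree = etree.HTML(detail.text)
        # download links on detail pages are assumed to be ftp:// anchors
        for link in dtree.xpath('//a[starts-with(@href, "ftp")]/@href'):
            print(link)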

Scraping Tencent recruitment listings
Summary (excerpt; sketch below):
    import requests
    from bs4 import BeautifulSoup
    import datetime
    import re
    import pymysql
    import datetime
    # database wrapper
    class Mydb():
        def __init__(self):
            try:
                self.con
… Read more

posted @ 2018-08-19 21:46 luwanhe Views(258) Comments(0) Recommended(0)
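
A sketch of the scraping half of this post: request one page of the old table-based hr.tencent.com listing and read the rows with BeautifulSoup. The URL, the start-offset paging and the even/odd row classes are assumptions; the Mydb wrapper from the excerpt is omitted here.

    import requests
    from bs4 import BeautifulSoup

    # assumed: the old listing page, paged by a start offset in steps of 10
    url = 'https://hr.tencent.com/position.php?start=0'
    headers = {'User-Agent': 'Mozilla/5.0'}

    html = requests.get(url, headers=headers, timeout=10).text
    soup = BeautifulSoup(html, 'html.parser')

    # assumed: result rows carry the classes "even" / "odd"
    for tr in soup.select('table.tablelist tr.even, table.tablelist tr.odd'):
        cells = [td.get_text(strip=True) for td in tr.select('td')]
        if cells:
            print(cells)    # title, category, headcount, location, publish date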

Scraping Meizitu images
Summary (excerpt; sketch below):
    import requests
    import pymysql
    from lxml import etree
    # database wrapper
    class MysqlHelper(object):
        def __init__(self):
            self.db = pymysql.connect(host='127.0.0.1', po
… Read more

posted @ 2018-08-19 21:41 luwanhe Views(169) Comments(0) Recommended(0)
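
A sketch of the image side of this post: pull cover-image URLs from a gallery index with lxml and save them to disk. The index URL, the XPath, the data-original attribute and the Referer requirement are assumptions about the site; the pymysql helper from the excerpt is not repeated here.

    import requests
    from lxml import etree

    headers = {
        'User-Agent': 'Mozilla/5.0',
        'Referer': 'https://www.mzitu.com/',   # assumed: images are refused without a referer
    }
    list_url = 'https://www.mzitu.com/'        # assumed gallery index

    tree = etree.HTML(requests.get(list_url, headers=headers, timeout=10).text)
    # hypothetical XPath for lazily loaded cover images
    covers = tree.xpath('//ul[@id="pins"]/li/a/img/@data-original')

    for i, img_url in enumerate(covers[:5]):
        data = requests.get(img_url, headers=headers, timeout=10).content
        with open('meizi_{}.jpg'.format(i), 'wb') as f:
            f.write(data)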

Scraping Lianjia listings
Summary (excerpt; sketch below):
    1. Database wrapper
    import pymysql
    class MysqlHelper(object):
        def __init__(self):
            self.db = pymysql.connect(host='127.0.0.1', port=3306, user='root', password='abc
… Read more

posted @ 2018-08-19 10:52 luwanhe Views(622) Comments(0) Recommended(0)
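
This post builds on the same pymysql wrapper as the ones above, so the sketch here focuses on the storage step instead: batching parsed listings into MySQL with executemany. The table schema, credentials and placeholder rows are assumptions.

    import pymysql

    # assumed schema: CREATE TABLE lianjia (title VARCHAR(255), total_price DECIMAL(10, 1))
    rows = [('example listing 1', 520.0), ('example listing 2', 668.5)]   # stand-in for parsed results

    db = pymysql.connect(host='127.0.0.1', port=3306, user='root',
                         password='your_password', database='spider', charset='utf8mb4')
    try:
        with db.cursor() as cursor:
            # executemany runs the parameterised insert once per scraped row
            cursor.executemany('INSERT INTO lianjia (title, total_price) VALUES (%s, %s)', rows)
        db.commit()
    finally:
        db.close()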

Scraping Toutiao (今日头条)
Summary (excerpt; sketch below):
    import re
    import requests
    import json, os
    from urllib import request
    def get_detail(url, title):
        headers = { 'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW
… Read more

posted @ 2018-08-16 23:19 luwanhe Views(1014) Comments(0) Recommended(0)
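
At the time, Toutiao gallery pages embedded their image data as JSON in the page source, which fits the mix of re, json and urllib.request in the excerpt. A hedged sketch of that extract-then-download pattern; the regex, the sub_images key and the commented-out URL are assumptions, not the post's actual code.

    import re
    import json
    import os
    import requests
    from urllib import request

    def get_detail(url, title):
        """Fetch a page, pull image URLs out of an embedded JSON blob, download them."""
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64)'}
        html = requests.get(url, headers=headers, timeout=10).text

        # hypothetical pattern: a gallery object assigned inside a <script> block
        match = re.search(r'gallery:\s*JSON\.parse\("(.*?)"\)', html, re.S)
        if not match:
            return
        data = json.loads(match.group(1).replace('\\', ''))   # crude unescaping of the embedded JSON

        os.makedirs(title, exist_ok=True)
        for i, item in enumerate(data.get('sub_images', [])):  # 'sub_images' is an assumed key
            request.urlretrieve(item['url'], os.path.join(title, '{}.jpg'.format(i)))

    # get_detail('https://www.toutiao.com/a.../', 'example')   # placeholder article URL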

Scraping Xueqiu
Summary (excerpt; sketch below):
    import requests
    import json
    url = 'https://xueqiu.com/v4/statuses/public_timeline_by_category.json?since_id=-1&max_id={}&count={}&category=111'
    def xueqi
… Read more

posted @ 2018-08-15 23:50 luwanhe Views(562) Comments(0) Recommended(0)
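
The excerpt calls Xueqiu's public_timeline_by_category endpoint directly. A sketch of paging it with max_id and count; the endpoint tends to refuse bare requests, so the sketch first visits the home page in a requests.Session to pick up cookies (an assumption about the site's behaviour), and the 'list' key is a guess at the response layout.

    import requests

    url = ('https://xueqiu.com/v4/statuses/public_timeline_by_category.json'
           '?since_id=-1&max_id={}&count={}&category=111')
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64)'}

    session = requests.Session()
    session.get('https://xueqiu.com/', headers=headers, timeout=10)   # warm up cookies first

    def xueqiu(max_id=-1, count=10):
        """Fetch one page of the category-111 timeline and return the parsed JSON."""
        resp = session.get(url.format(max_id, count), headers=headers, timeout=10)
        return resp.json()

    page = xueqiu()
    for item in page.get('list', []):          # 'list' is an assumed key in the response
        print(item.get('id'))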

Extending GET and POST fetching
Summary (excerpt; sketch below):
    from urllib import request, parse
    from urllib.error import HTTPError, URLError
    # save cookies
    from http import cookiejar
    class session(object):
        def __init__(s
… Read more

posted @ 2018-08-14 23:04 luwanhe Views(109) Comments(0) Recommended(0)
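
A sketch of the session-style wrapper the excerpt begins: a class that keeps an http.cookiejar.CookieJar across urllib GET and POST calls so cookies survive between requests. Method names, the timeout and the error handling are assumptions about the original design.

    from urllib import request, parse
    from urllib.error import HTTPError, URLError
    from http import cookiejar

    class Session(object):
        """Route every request through one opener so cookies persist."""
        def __init__(self):
            jar = cookiejar.CookieJar()
            self.opener = request.build_opener(request.HTTPCookieProcessor(jar))
            self.headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64)'}

        def get(self, url):
            return self._open(request.Request(url, headers=self.headers))

        def post(self, url, form):
            data = parse.urlencode(form).encode('utf-8')
            return self._open(request.Request(url, data=data, headers=self.headers))

        def _open(self, req):
            try:
                return self.opener.open(req, timeout=10).read().decode('utf-8')
            except (HTTPError, URLError) as err:
                print('request failed:', err)
                return None

    # s = Session(); print(s.get('https://httpbin.org/cookies/set?k=v'))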
