Web Scraping - Post Category - 月为暮

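07 Web Scraping: urllib Summary

Summary: # # Import the required crawler library. import urllib.request # # Request the Baidu URL. # file = urllib.request.urlopen('http://www.baidu.com') # # Read the returned data. # data = file.read() # # Write the data returned by Baidu to a file Read more

posted @ 2020-11-30 19:54 月为暮 Views(127) Comments(0) Recommended(0)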
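The urllib excerpt above follows a request, read, write-to-file sequence. A minimal sketch of that sequence might look like the following; the function name `fetch_to_file`, the `timeout` default, and the output path are assumptions for illustration and do not come from the post.

```python
import urllib.request


def fetch_to_file(url, path, timeout=10):
    """Download `url` and write the raw response bytes to `path`.

    Hypothetical helper: the name and parameters are not from the post.
    """
    # urlopen returns a response object that works as a context manager.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        data = response.read()  # raw bytes, not yet decoded
    # Binary mode stores the bytes exactly as they were received.
    with open(path, "wb") as out:
        out.write(data)
    return len(data)
```

Under these assumptions, `fetch_to_file('http://www.baidu.com', 'baidu.html')` would save the page to a local file and return the number of bytes written. Writing in binary mode sidesteps guessing the page encoding.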

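05 Scraping All Photos of Pretty Girls from 约会吧 (Yuehuiba)

Summary: # Crawler approach: first find the URL of 约会吧, # then fetch the page and extract from it the detail page of each user who posted, find the detail-page links that hold the images, # and download the images from those addresses import requests,parsel # Function for fetching the 约会吧 home page def get_yuehuiba_url(url,headers): # via reques Read more

posted @ 2020-08-02 22:08 月为暮 Views(348) Comments(0) Recommended(0)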
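The approach described in the entry above (list page, then per-post detail pages, then image links) can be sketched roughly as follows. The post itself imports `requests` and `parsel`; this sketch swaps in the standard library's `html.parser` so it runs without extra packages. All names here (`AttrCollector`, `extract_urls`, `crawl_images`, the `fetch` callable) are hypothetical, and treating every `a` and `img` tag as relevant is a simplifying assumption, since the real selectors are not visible in the excerpt.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AttrCollector(HTMLParser):
    """Collect one attribute's value from every occurrence of one tag."""

    def __init__(self, wanted_tag, wanted_attr):
        super().__init__()
        self.wanted_tag = wanted_tag
        self.wanted_attr = wanted_attr
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag == self.wanted_tag:
            for name, value in attrs:
                if name == self.wanted_attr and value:
                    self.found.append(value)


def extract_urls(html, base_url, tag, attr):
    """Return absolute URLs taken from `attr` of every `tag` in `html`."""
    collector = AttrCollector(tag, attr)
    collector.feed(html)
    return [urljoin(base_url, value) for value in collector.found]


def crawl_images(start_url, fetch):
    """Walk list page -> detail pages -> image URLs.

    `fetch` is any callable mapping a URL to its HTML text, so the
    network layer can be swapped out or stubbed.
    """
    image_urls = []
    for detail_url in extract_urls(fetch(start_url), start_url, "a", "href"):
        detail_html = fetch(detail_url)
        image_urls.extend(extract_urls(detail_html, detail_url, "img", "src"))
    return image_urls
```

Passing the fetcher in as a parameter keeps the link-extraction logic separate from the HTTP client, so the same walk works with `requests`, `urllib`, or a dictionary of canned pages.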

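04 Scraping Python Job Listings from Lagou (拉勾网): Analysis Report

Summary: # Import the required packages import requests import time,random from openpyxl import Workbook import pymysql.cursors #@ Connect to the database; # This is a program running on my local machine, used to obtain proxy servers. def get_proxy(): try: P Read more

posted @ 2020-07-25 15:05 月为暮 Views(324) Comments(0) Recommended(0)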

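03 Scraping Jokes from Qiushibaike (糗事百科)

Summary: # Import requests and BeautifulSoup import requests from bs4 import BeautifulSoup def download_page(url): # Define the request headers so the request looks like it comes from a browser headers ={'User-Agent': 'Mozilla/5.0 (W Read more

posted @ 2020-07-17 22:00 月为暮 Views(508) Comments(0) Recommended(0)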
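The entry above sets a `User-Agent` header so that the site treats the request like one from a browser. As a rough illustration of that idea, the sketch below uses the standard library's `urllib.request.Request` rather than the `requests` call in the post. The header value is a placeholder assumption (the post's own string is cut off), `build_request` is a hypothetical helper, and this `download_page` body is not the author's implementation.

```python
import urllib.request

# Placeholder value for illustration; the post's own User-Agent is truncated.
DEFAULT_HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; example-crawler)"}


def build_request(url, headers=None):
    """Return a Request carrying the default headers plus any overrides."""
    merged = dict(DEFAULT_HEADERS)
    if headers:
        merged.update(headers)
    return urllib.request.Request(url, headers=merged)


def download_page(url, timeout=10):
    """Fetch `url` with the custom headers and return the decoded text."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Note that `urllib` normalizes header names when storing them, so the value is read back with `request.get_header("User-agent")`.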

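02 Scraping Douban's 250 Most Popular Movies

Summary: # Scrape Douban's 250 most popular movies and write them to an Excel spreadsheet import requests,xlwt from bs4 import BeautifulSoup # Request the Douban site and get the page source def request_douban(url): try : # Request the url headers = {"User-A Read more

posted @ 2020-07-02 13:40 月为暮 Views(323) Comments(0) Recommended(0)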

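01 Scraping 500 Five-Star Books from Dangdang (当当网)

Summary: # import requests,re,json # # Define a function to request page data from Dangdang # def request_dangdang(url): # try: # # Use a GET request # response = requests.get(url) # # Check whether the returned status code is 200 # if respo Read more

posted @ 2020-07-01 21:00 月为暮 Views(437) Comments(0) Recommended(1)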
