长林丶 - 博客园

April 2020

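Summary: I. Design plan for a topic-focused web crawler 1. Crawler name: Weibo hot search list scraper 2. Content scraped: data from the Weibo hot search list. 3. Design overview: use the requests library to access the page, fetching its resources with the get method; open the page and analyze its HTML; then use the BeautifulSoup library to locate and extract the required information. The data is then saved to CSV Read more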

posted @ 2020-04-18 15:46 长林丶 Views (5527) Comments (0) Recommendations (1)
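The design overview in the summary above (fetch with requests, parse with BeautifulSoup, save to CSV) could be sketched roughly as follows. This is a minimal illustration and not the post's actual code: the function names `fetch_html`, `parse_rows`, and `save_csv`, the sample table markup, and the output file name are all assumptions, since the summary is cut off before any implementation details.

```python
import csv

import requests
from bs4 import BeautifulSoup


def fetch_html(url, headers=None, timeout=10):
    """Download a page with a GET request and return its text."""
    r = requests.get(url, headers=headers, timeout=timeout)
    r.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    return r.text


def parse_rows(html):
    """Extract (rank, title) pairs from table rows.

    The tr/td/a layout is a hypothetical stand-in for the real page markup.
    """
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        link = tr.find("a")
        if link is None:
            continue  # skip header rows and rows without a topic link
        rows.append((len(rows) + 1, link.get_text(strip=True)))
    return rows


def save_csv(rows, path):
    """Write the rows to a CSV file with a header line."""
    # utf-8-sig adds a BOM so spreadsheet tools detect the encoding.
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.writer(f)
        writer.writerow(["rank", "title"])
        writer.writerows(rows)


# Hypothetical markup used in place of a live request.
SAMPLE_HTML = """
<table>
  <tr><th>Rank</th><th>Topic</th></tr>
  <tr><td>1</td><td><a href="#">topic A</a></td></tr>
  <tr><td>2</td><td><a href="#">topic B</a></td></tr>
</table>
"""

if __name__ == "__main__":
    save_csv(parse_rows(SAMPLE_HTML), "hot_search.csv")
```

Keeping the parsing step separate from the network call means the extraction logic can be checked against saved HTML before it is pointed at a live page.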

March 2020

Getting the Top 10 WeChat Hot Topics

Summary:
    import requests
    from bs4 import BeautifulSoup
    import re

    cookie = {}
    f = open('cookie.txt','r')  # this WeChat page cannot be scraped directly, so a cookie file is added
    for line in f.read().split(':'):
Read more

posted @ 2020-03-21 22:22 长林丶 Views (191) Comments (0) Recommendations (0)
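The excerpt above reads cookies from a file before making the request, but it is cut off before the parsing logic. As a hedged sketch of one common way to do this, the helper below turns a browser-style cookie string (`name=value; name2=value2`) into a dict. The `;` separator, the function names `parse_cookie_string` and `load_cookies`, and the file layout are assumptions; the original splits on `:` and its actual file format is not visible.

```python
def parse_cookie_string(raw, sep=";"):
    """Turn 'a=1; b=2' into {'a': '1', 'b': '2'}.

    The separator and the name=value layout are assumptions about the file.
    """
    cookies = {}
    for part in raw.split(sep):
        part = part.strip()
        if not part or "=" not in part:
            continue  # ignore empty fragments and entries without a value
        # Split on the first '=' only, since cookie values may contain '='.
        name, value = part.split("=", 1)
        cookies[name.strip()] = value.strip()
    return cookies


def load_cookies(path, sep=";"):
    """Read a cookie string from a text file and parse it into a dict."""
    with open(path, "r", encoding="utf-8") as f:
        return parse_cookie_string(f.read(), sep)
```

The resulting dict can be passed to requests through its `cookies` argument, for example `requests.get(url, cookies=load_cookies('cookie.txt'))`.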

Scraping the Top 50 Baidu Hot Topics with Python

Summary:
    import requests
    from bs4 import BeautifulSoup
    import bs4

    def get_html(url,headers):
        r = requests.get(url,headers=headers)
        r.encoding = r.apparent_encodi
Read more

posted @ 2020-03-21 19:42 长林丶 Views (401) Comments (0) Recommendations (0)
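The excerpt above stops partway through `get_html`, so the rest of the post's code is not visible. The sketch below shows one plausible shape for such a helper plus a function that keeps only the first 50 matches. It is an illustration under assumptions, not the author's implementation: the error handling, the `top_titles` name, and the `list-title` class are hypothetical.

```python
import requests
from bs4 import BeautifulSoup


def get_html(url, headers, timeout=10):
    """Return the decoded page text, or an empty string if the request fails."""
    try:
        r = requests.get(url, headers=headers, timeout=timeout)
        r.raise_for_status()
        # Let requests guess the charset from the body; this helps when the
        # server omits or misreports it in the Content-Type header.
        r.encoding = r.apparent_encoding
        return r.text
    except requests.RequestException:
        return ""


def top_titles(html, n=50):
    """Return the text of the first n matching links.

    The 'list-title' class is a hypothetical selector, not taken from the post.
    """
    soup = BeautifulSoup(html, "html.parser")
    # limit stops the search once n matches are found.
    links = soup.find_all("a", class_="list-title", limit=n)
    return [a.get_text(strip=True) for a in links]
```

Using `limit` keeps the "top 50" cut inside the parsing call, so no slicing of a longer result list is needed afterwards.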
