Post category - Web crawlers

Summary: A simple crawler example. from bs4 import BeautifulSoup import os import requests response = requests.get("http://www.90xiaohua.com/") response.encoding = "utf-8" # pr …
posted @ 2020-03-25 10:18 hbfengj Views (117) Comments (0) Recommendations (0)
Summary: A simple crawler example. from scrapy_redis.spiders import RedisSpider import os,urllib.request,time class XiaohuaSpider(scrapy.Spider): name = 'xiaohua' allowed_doma …
posted @ 2020-03-25 10:15 hbfengj Views (160) Comments (0) Recommendations (0)
Summary: 1. The Scrapy framework: a large, full-featured crawling toolkit. 2. Installation. Note: Scrapy depends on Twisted. - Windows: download from http://www.lfd.uci.edu/~gohlke/pythonlibs/#twisted pip3 install wheel pip install Twisted-19.10 …
posted @ 2020-03-25 10:06 hbfengj Views (179) Comments (0) Recommendations (0)
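Summary: requests: sends HTTP requests and receives responses. 1. If a page loads in the browser but cannot be fetched with requests, the last resort is to copy all of the browser's request headers into the headers of the requests call. import requests from bs4 import BeautifulSoup r1 = requests.get( url='ht …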
posted @ 2020-03-23 09:31 hbfengj Views (256) Comments (0) Recommendations (0)
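
The header-copying tip in the requests post can be sketched roughly as follows. This is a minimal illustration, not the author's code: the helper names `parse_raw_headers` and `fetch_like_browser`, the sample header block `RAW`, and the `https://example.com/` referer are all hypothetical. It assumes the headers were copied from the browser's developer tools as plain `Name: value` lines.

```python
def parse_raw_headers(raw: str) -> dict:
    """Turn a 'Name: value' block copied from browser dev tools into a dict."""
    headers = {}
    for line in raw.strip().splitlines():
        line = line.strip()
        # Skip blank lines and HTTP/2 pseudo-headers such as ":authority".
        if not line or line.startswith(":"):
            continue
        # Split on the first colon only, so URLs in values stay intact.
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not a header line
        headers[name.strip()] = value.strip()
    return headers


def fetch_like_browser(url: str, raw_headers: str, timeout: float = 10.0):
    """Send a GET request that carries the browser's headers."""
    # Third-party library; imported here so the parser works without it.
    import requests
    return requests.get(url, headers=parse_raw_headers(raw_headers), timeout=timeout)


# Hypothetical header block; replace it with what your own browser sends.
RAW = """
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64)
Accept: text/html,application/xhtml+xml
Accept-Language: zh-CN,zh;q=0.9
Referer: https://example.com/
"""

if __name__ == "__main__":
    print(parse_raw_headers(RAW))
```

Calling `fetch_like_browser(url, RAW)` would then send the request with those headers. In practice it is often enough to start with `User-Agent`, `Referer`, and `Cookie`, and add the rest only if the site still refuses the request.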