Scraping images from qiumeimei.com with the Scrapy framework
1. Create the project
scrapy startproject qiumeimei
2. Create the spider file qiumei.py
cd qiumeimei
scrapy genspider qiumei www.qiumeimei.com
3. Since only the images need to be downloaded, first define the field in items.py
import scrapy


class QiumeimeiItem(scrapy.Item):
    # URL of the image to download
    img_path = scrapy.Field()
4. Write the spider file qiumei.py
# -*- coding: utf-8 -*-
import scrapy
from qiumeimei.items import QiumeimeiItem


class QiumeiSpider(scrapy.Spider):
    name = 'qiumei'
    # allowed_domains = ['www.qiumeimei.com']
    start_urls = ['http://www.qiumeimei.com/image']

    def parse(self, response):
        # The real image URL is read from the data-lazy-src attribute
        img_urls = response.css('.main>p>img::attr(data-lazy-src)').extract()
        for url in img_urls:
            item = QiumeimeiItem()
            item['img_path'] = url
            yield item

        # Follow the "next page" link, if there is one
        next_url = response.css('.pagination a.next::attr(href)').extract_first()
        if next_url:
            # urljoin resolves a relative href against the current page URL
            yield scrapy.Request(url=response.urljoin(next_url), callback=self.parse)
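One thing to watch in the pagination step: if the href of the next-page link is relative, it has to be resolved against the current page URL before it can be requested. The sketch below shows that resolution with the standard library's urljoin. The helper name resolve_next and the example page paths are made up for illustration. Inside a spider, response.urljoin(next_url) performs the same resolution against response.url.

```python
from urllib.parse import urljoin


def resolve_next(page_url, href):
    """Return an absolute URL for a next-page href, or None if there is none.

    Hypothetical helper for illustration; not part of the original spider.
    """
    if not href:
        return None
    # urljoin leaves absolute hrefs untouched and resolves relative ones
    return urljoin(page_url, href)


if __name__ == '__main__':
    base = 'http://www.qiumeimei.com/image'
    # The page paths below are assumed, not taken from the real site
    print(resolve_next(base, '/image/page/2'))
    print(resolve_next(base, 'http://www.qiumeimei.com/image/page/3'))
    print(resolve_next(base, None))
```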
5. Pipeline file pipelines.py. Here all images are placed in a single folder; its path is defined in settings.py, see step 6 below:
import os
from datetime import datetime

import scrapy
from scrapy.exceptions import DropItem
from scrapy.pipelines.images import ImagesPipeline
from qiumeimei.settings import IMAGES_STORE as images_store


class QiumeimeiPipeline(ImagesPipeline):
    def get_media_requests(self, item, info):
        # Request the image URL collected by the spider
        yield scrapy.Request(url=item['img_path'])

    def item_completed(self, results, item, info):
        # results is a list of (success, info) tuples; keep successful downloads only
        old_name_list = [x['path'] for ok, x in results if ok]
        if not old_name_list:
            raise DropItem('Image download failed: ' + item['img_path'])
        old_name = os.path.join(images_store, old_name_list[0])
        # Image name: a timestamp including microseconds
        img_name = datetime.now().strftime('%Y%m%d%H%M%S%f')
        img_type = item['img_path'].split('.')[-1]
        # Image path: all images go into a single folder
        path = os.path.join(images_store, img_name + '.' + img_type)
        os.rename(old_name, path)
        print(path + ' downloaded...')
        return item
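The renaming logic in item_completed can be tried out on its own, without running a crawl. The sketch below uses made-up sample data and hypothetical helper names. It shows two pieces: picking out the paths of successful downloads from the (success, info) tuples that Scrapy passes as results, and building a timestamp-based file name of the form YYYYmmddHHMMSS followed by microseconds.

```python
from datetime import datetime


def successful_paths(results):
    """Keep the 'path' of every successful download.

    results mimics what item_completed receives: a list of
    (success, info) tuples, where info is a dict only when success is True.
    """
    return [info['path'] for ok, info in results if ok]


def timestamp_name(img_url, now=None):
    """Build '<timestamp with microseconds>.<extension>' from the image URL."""
    now = now or datetime.now()
    ext = img_url.split('.')[-1]
    return now.strftime('%Y%m%d%H%M%S%f') + '.' + ext


if __name__ == '__main__':
    # Made-up sample data for illustration
    results = [
        (True, {'url': 'http://example.com/a.jpg', 'path': 'full/abc.jpg'}),
        (False, Exception('download failed')),
    ]
    print(successful_paths(results))
    print(timestamp_name('http://example.com/a.jpg',
                         datetime(2020, 1, 2, 3, 4, 5, 678901)))
```

Filtering on the success flag matters because a failed download carries an error object rather than a dict, so indexing it with 'path' would raise.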
6. Settings file settings.py
USER_AGENT = 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36'
# Obey robots.txt rules
ROBOTSTXT_OBEY = False
# Image storage path; created automatically
IMAGES_STORE = './images/'
# Enable the pipeline
ITEM_PIPELINES = {
    'qiumeimei.pipelines.QiumeimeiPipeline': 300,
}
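For context on why step 5 renames the files: as far as I know, the stock ImagesPipeline stores each download under IMAGES_STORE as full/<SHA-1 of the request URL>.jpg. The sketch below reproduces that naming with hashlib alone. default_image_path is a hypothetical helper, and the exact scheme is an assumption that may differ between Scrapy versions.

```python
import hashlib


def default_image_path(url):
    """Approximate ImagesPipeline's default relative path for an image URL.

    Assumption: the path is 'full/' + SHA-1 hex digest of the URL + '.jpg'.
    """
    image_guid = hashlib.sha1(url.encode('utf-8')).hexdigest()
    return 'full/' + image_guid + '.jpg'


if __name__ == '__main__':
    # Made-up URL for illustration
    print(default_image_path('http://example.com/a.jpg'))
```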
The crawl ran successfully.

