Post category - scrapy

Summary: settings.py Project URL: https://github.com/CH-chen/jdbook Read more
posted @ 2019-02-12 14:31 CHVV Views (496) Comments (0) Recommended (0)
Summary: # -*- coding: utf-8 -*- import scrapy from scrapy.http.cookies import CookieJar from scrapy.http import Request class ChoutiSpider(scrapy.Spider): name = 'chouti' allowed_domains = ['chouti.c... Read more
posted @ 2019-02-09 11:49 CHVV Views (201) Comments (0) Recommended (0)
Summary: settings Project URL: https://github.com/CH-chen/renrencookie Read more
posted @ 2019-01-29 06:28 CHVV Views (208) Comments (0) Recommended (0)
Summary: pipelines.py settings Project URL: https://github.com/CH-chen/suningbook Read more
posted @ 2019-01-29 06:24 CHVV Views (224) Comments (0) Recommended (0)
Summary: pipelines.py settings Project URL: https://github.com/CH-chen/sun0769 Read more
posted @ 2019-01-29 06:16 CHVV Views (234) Comments (0) Recommended (0)
Summary: pipelines.py items.py settings.py Project URL: https://github.com/CH-chen/tencent Read more
posted @ 2019-01-29 06:06 CHVV Views (230) Comments (0) Recommended (0)
Summary: pipelines.py settings.py Points to note Read more
posted @ 2019-01-29 05:57 CHVV Views (216) Comments (0) Recommended (0)
Summary: pipelines.py settings.py Read more
posted @ 2019-01-29 05:49 CHVV Views (230) Comments (0) Recommended (0)
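Summary: scrapy commands: scrapy startproject xx (project directory) creates the project directory; cd xx enters the directory; scrapy genspider chouti (spider name) chouti.com (start URL), then write the spider. Start the crawler project: scrapy crawl chouti (spider name) --nolog (suppress the default log output) # n... Read more
posted @ 2019-01-29 05:42 CHVV Views (174) Comments (0) Recommended (0)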
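Summary: //a[@class="n"]/@href gets the next-page URL; //a[text()="下一页>"] locates a link by its text; //div[@class="indent"]/div/table gets all tables, selecting one level at a time; //div[@class="indent"]//table gets all tables at any depth; //div[@c Read more
posted @ 2019-01-21 14:25 CHVV Views (357) Comments (0) Recommended (0)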
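
The difference between the `/div/table` and `//table` forms in the XPath summary above can be checked on a small document. The following is only a minimal sketch: the sample markup is hypothetical, and it uses Python's standard-library `xml.etree.ElementTree` (which supports a limited XPath subset) rather than Scrapy's own selectors, so the attribute step `/@href` is replaced by `.get("href")` and `text()="..."` by the `[.='...']` predicate.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample markup (not taken from the original post).
SAMPLE = """
<html><body>
  <div class="indent">
    <div><table id="t1"><tr><td>A</td></tr></table></div>
    <section><div><table id="t2"><tr><td>B</td></tr></table></div></section>
  </div>
  <a class="n" href="/page/2">下一页&gt;</a>
</body></html>
""".strip()

root = ET.fromstring(SAMPLE)

# Counterpart of //a[@class="n"]/@href: find the link, then read its attribute.
next_by_class = root.find(".//a[@class='n']")
next_href = next_by_class.get("href")

# Counterpart of //a[text()="下一页>"]: locate the same link by its text.
next_by_text = root.find(".//a[.='下一页>']")

# /div/table walks one level at a time: only tables inside a direct child div.
step_ids = [t.get("id") for t in root.findall(".//div[@class='indent']/div/table")]

# //table matches tables at any depth below the "indent" div.
deep_ids = [t.get("id") for t in root.findall(".//div[@class='indent']//table")]

print(next_href, step_ids, deep_ids)
```

In a Scrapy spider the same expressions would normally be passed to `response.xpath(...)` in their original form; the ElementTree version is used here only so the sketch runs without extra dependencies.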