Post archive for May 10, 2017 - Erick-LONG

May 10, 2017

Summary: pipeline, item

posted @ 2017-05-10 17:29 Erick-LONG Views (1733) Comments (0) Recommendations (0)

Summary: rules = [ Rule(SgmlLinkExtractor(allow=('/u012150179/article/details'), restrict_xpaths=('//li[@class="next_article"]')), callback='parse_ite...

posted @ 2017-05-10 16:05 Erick-LONG Views (787) Comments (0) Recommendations (0)

scrapy: avoiding bans

Summary: user-agent (UA) pool

posted @ 2017-05-10 15:05 Erick-LONG Views (524) Comments (0) Recommendations (0)
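
The summary above only names the technique, so here is a minimal sketch of what a user-agent pool could look like as a Scrapy downloader middleware. It is not taken from the original post: the class name `RandomUserAgentMiddleware`, the `USER_AGENT_POOL` list, and the user-agent strings in it are all assumptions made for illustration. The sketch relies only on the `process_request(request, spider)` hook that Scrapy downloader middlewares expose, and it needs nothing beyond the standard library.

```python
import random

# Hypothetical pool: these strings are illustrative, not from the original post.
USER_AGENT_POOL = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_4) AppleWebKit/603.1.30 "
    "(KHTML, like Gecko) Version/10.1 Safari/603.1.30",
    "Mozilla/5.0 (X11; Linux x86_64; rv:53.0) Gecko/20100101 Firefox/53.0",
]


class RandomUserAgentMiddleware:
    """Downloader middleware sketch: set a random User-Agent on each request."""

    def __init__(self, user_agents=None, rng=None):
        # Fall back to the module-level pool when no list is supplied.
        self.user_agents = list(user_agents or USER_AGENT_POOL)
        # An injectable random generator keeps the behaviour testable.
        self.rng = rng or random.Random()

    def process_request(self, request, spider):
        # Overwrite the header so consecutive requests vary their UA.
        request.headers["User-Agent"] = self.rng.choice(self.user_agents)
        # Returning None tells Scrapy to continue handling the request normally.
        return None
```

To try it in a project, the class would be registered under `DOWNLOADER_MIDDLEWARES` in `settings.py` using whatever module path it lives at, typically with the built-in `scrapy.downloadermiddlewares.useragent.UserAgentMiddleware` set to `None` so the two do not compete.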

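scrapy crawl: modifying the source to run multiple spiders at once

Summary: place the file in the project directory and configure settings.py

posted @ 2017-05-10 14:19 Erick-LONG Views (661) Comments (0) Recommendations (0)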

scrapy csvfeed spider

Summary: class CsvspiderSpider(CSVFeedSpider): name = 'csvspider' allowed_domains = ['iqianyue.com'] start_urls = ['http://iqianyue.com/feed.csv'] headers = ['id', 'name', 'description', 'imag...

posted @ 2017-05-10 13:51 Erick-LONG Views (320) Comments (0) Recommendations (0)

scrapy crawl xmlfeed spider

Summary: from scrapy.spiders import XMLFeedSpider from myxml.items import MyxmlItem class XmlspiderSpider(XMLFeedSpider): name = 'xmlspider' allowed_domains = ['sina.com.cn'] start_urls = ['http:...

posted @ 2017-05-10 13:35 Erick-LONG Views (217) Comments (0) Recommendations (0)

scrapy: changing the starting URL of a crawl

Summary: import scrapy from Autopjt.items import myItem from scrapy.http import Request class AutospdSpider(scrapy.Spider): name = "fulong_spider" start_urls =

posted @ 2017-05-10 13:15 Erick-LONG Views (1693) Comments (0) Recommendations (0)

scrapy: crawling Dangdang product categories

Summary: pipeline part, item part

posted @ 2017-05-10 13:01 Erick-LONG Views (560) Comments (0) Recommendations (0)

Erick - LONG

Be Patient! Be Positive! Be Persistence!
