Post archive for December 5, 2012 - 林檎

December 5, 2012

Summary: Method 1.

    from scrapy.selector import HtmlXPathSelector

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        items = []

        # Collect the href of every link on the page
        newurls = hxs.select('//a/@href').extract()
        validurls = []
        for url in newurls:
            # Check whether the URL is valid
            if True:
                ...
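The excerpt breaks off before the validity check itself, so the author's actual test is not shown. As a minimal sketch of what such a check could look like (not the author's code), the following uses only the standard library's `urllib.parse`. The names `is_valid_url`, `filter_urls`, `ALLOWED_SCHEMES` and `allowed_domain` are hypothetical, and the rule itself (http/https only, optionally restricted to one domain, duplicates dropped) is an assumption.

```python
from urllib.parse import urljoin, urlparse

# Assumption: only web links are worth following.
ALLOWED_SCHEMES = {"http", "https"}


def is_valid_url(url, allowed_domain=None):
    """Return True if url is an absolute http(s) URL, optionally on allowed_domain."""
    parts = urlparse(url)
    if parts.scheme not in ALLOWED_SCHEMES or not parts.netloc:
        return False
    if allowed_domain is not None:
        host = parts.hostname or ""
        # Accept the domain itself and any of its subdomains.
        return host == allowed_domain or host.endswith("." + allowed_domain)
    return True


def filter_urls(base_url, hrefs, allowed_domain=None):
    """Resolve hrefs against base_url and keep the valid, not yet seen ones."""
    validurls = []
    seen = set()
    for href in hrefs:
        # Relative links such as "/a" or "b.html" become absolute here.
        absolute = urljoin(base_url, href.strip())
        if absolute not in seen and is_valid_url(absolute, allowed_domain):
            seen.add(absolute)
            validurls.append(absolute)
    return validurls
```

Inside a spider's `parse` method, the list returned by `hxs.select('//a/@href').extract()` could be passed as `hrefs`, with `response.url` as `base_url`.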

posted @ 2012-12-05 14:47 林檎 Views(2574) Comments(0) Recommendations(0)
