Summary: Method 1.

    from scrapy.selector import HtmlXPathSelector

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        items = []

        # Collect the href attribute of every link on the page
        newurls = hxs.select('//a/@href').extract()
        validurls = []
        for url in newurls:
            # Check whether the URL is valid
            if True:
                ...
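The excerpt is cut off before the URL check is filled in, so the original test is unknown. As a rough sketch only, here is one way such a check could look using just the standard library. The helper names `is_valid_url` and `filter_valid_urls` and the "absolute http(s) URL with a host" rule are assumptions for illustration, not the post's actual method.

```python
from urllib.parse import urljoin, urlparse


def is_valid_url(url, allowed_schemes=("http", "https")):
    """Hypothetical rule: accept only absolute http(s) URLs that have a host."""
    parts = urlparse(url)
    return parts.scheme in allowed_schemes and bool(parts.netloc)


def filter_valid_urls(base_url, hrefs):
    """Resolve each href against base_url; keep valid URLs, dropping duplicates."""
    seen = set()
    valid = []
    for href in hrefs:
        # Relative links such as "b.html" or "/c" become absolute here
        absolute = urljoin(base_url, href.strip())
        if is_valid_url(absolute) and absolute not in seen:
            seen.add(absolute)
            valid.append(absolute)
    return valid
```

Inside a spider, the list returned by `extract()` could be passed in as `hrefs` with `response.url` as `base_url`. Note that `HtmlXPathSelector` and `select()` belong to old Scrapy releases; current versions expose the same query as `response.xpath('//a/@href').getall()`.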
posted @ 2012-12-05 14:47 林檎