Summary:
Method 1.

    from scrapy.selector import HtmlXPathSelector

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        items = []

        # Collect the href of every link on the page
        newurls = hxs.select('//a/@href').extract()
        validurls = []
        for url in newurls:
            # Check whether the URL is valid
            if True:
    ... (excerpt truncated here; read the full post)
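The excerpt leaves the URL validity check as a placeholder condition. The following is a minimal sketch of one way such a check could look, using only the Python standard library. The helper names `is_valid_url` and `filter_valid_urls`, and the rule "absolute http(s) URL with a host", are assumptions made for illustration; they do not come from the original post.

```python
from urllib.parse import urljoin, urlparse


def is_valid_url(url):
    """Return True for absolute http(s) URLs that include a host (assumed rule)."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def filter_valid_urls(base_url, hrefs):
    """Resolve each href against base_url and keep the valid, unique results in order."""
    valid = []
    seen = set()
    for href in hrefs:
        # Relative links such as "b.html" or "/c" become absolute here
        absolute = urljoin(base_url, href.strip())
        if is_valid_url(absolute) and absolute not in seen:
            seen.add(absolute)
            valid.append(absolute)
    return valid
```

Inside `parse`, the extracted `newurls` could be passed through a helper like this with `response.url` as the base, so that `javascript:` and `mailto:` links are dropped before any follow-up requests are built.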
posted @ 2012-12-05 14:47
林檎