Rewriting the Python crawler.

I've decided to rewrite the crawler. Here is a first sketch of its structure.

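try:
    from Queue import Queue  # Python 2
except ImportError:
    from queue import Queue  # Python 3
from locale import getdefaultlocale

# SaveDataBase and ThreadPool are this project's own classes (not shown in this post).


class Crawler(object):
    def __init__(self, url, depth, threadNum, dbfile, key):
        # Queue of URLs waiting to be fetched
        self.urlQueue = Queue()
        # Queue of HTML pages that have been read
        self.htmlQueue = Queue()
        # URLs that have already been visited
        self.readUrls = []
        # Links that have not been visited yet
        self.links = []
        # Number of threads
        self.threadNum = threadNum
        # Database file name
        self.dbfile = dbfile
        # Object that stores results in the database
        self.dataBase = SaveDataBase(self.dbfile)
        # Thread pool with the specified number of threads
        self.threadPool = ThreadPool(self.threadNum)
        # Seed the URL queue with the start URL
        self.urlQueue.put(url)
        # Keyword; a byte string is decoded with the console's default encoding
        # (the 'utf-8' fallback is an assumption for when no encoding is reported)
        if isinstance(key, bytes):
            key = key.decode(getdefaultlocale()[1] or 'utf-8')
        self.key = key
        # Maximum crawl depth
        self.depth = depth
        # Current crawl depth
        self.currentDepth = 1
        # Current running state of the program
        self.state = False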
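
The post only shows the constructor, so how these fields work together is not stated. One plausible reading is a breadth-first crawl that works through the URL queue one level at a time until the depth limit is reached. The sketch below illustrates that idea only; it is not the author's implementation. `FAKE_PAGES`, `fetch`, `LINK_RE` and `crawl` are hypothetical names, and the in-memory "web" stands in for real HTTP requests so the example runs without network access, threads or a database.

```python
import re
from collections import deque

# Hypothetical in-memory "web" so the sketch runs without network access.
FAKE_PAGES = {
    "http://a/": '<a href="http://a/1">one</a> <a href="http://a/2">two</a> python',
    "http://a/1": '<a href="http://a/3">three</a>',
    "http://a/2": '<a href="http://a/">home</a> python crawler',
    "http://a/3": 'python deep page',
}

# Naive link extraction, good enough for the fake pages above.
LINK_RE = re.compile(r'href="([^"]+)"')


def fetch(url):
    """Stand-in for a real HTTP request (an assumption, not the author's code)."""
    return FAKE_PAGES.get(url, "")


def crawl(start_url, depth, key):
    """Breadth-first crawl limited to `depth` levels.

    Returns the URLs whose page text contains `key`, in visit order.
    """
    url_queue = deque([start_url])  # plays the role of urlQueue
    read_urls = set()               # plays the role of readUrls
    matches = []
    current_depth = 1               # plays the role of currentDepth

    while url_queue and current_depth <= depth:
        links = []                  # plays the role of links: found at this level
        while url_queue:
            url = url_queue.popleft()
            if url in read_urls:
                continue            # skip pages that were already visited
            read_urls.add(url)
            html = fetch(url)
            if key in html:
                matches.append(url)
            links.extend(LINK_RE.findall(html))
        # Queue only unvisited links for the next level.
        for link in links:
            if link not in read_urls:
                url_queue.append(link)
        current_depth += 1
    return matches


if __name__ == "__main__":
    print(crawl("http://a/", 2, "python"))
```

The sketch keeps visited URLs in a set rather than a list so that the membership check stays cheap as the crawl grows. In a threaded version, each level's fetches could be handed to the thread pool instead of running in a loop.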

posted @ 2014-03-11 17:31 墨迹哥's
