Python Threading Usage Patterns

Further reading: http://www.ibm.com/developerworks/aix/library/au-threadingpython/

A small example:
```python
import threading
import datetime

class ThreadClass(threading.Thread):
    def run(self):
        now = datetime.datetime.now()
        print "%s says Hello World at time: %s" % (self.getName(), now)

for i in range(2):
    t = ThreadClass()
    t.start()
```

A custom thread class must inherit from threading.Thread and implement the run method.
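The listing above is Python 2 (print statement, getName()). As a point of comparison, a minimal Python 3 sketch of the same pattern, where the thread's name is read from the `name` attribute instead of `getName()`:

```python
import threading
import datetime

class ThreadClass(threading.Thread):
    def run(self):
        # self.name replaces the Python 2 getName() accessor
        now = datetime.datetime.now()
        print("%s says Hello World at time: %s" % (self.name, now))

threads = [ThreadClass() for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait for both threads to finish
```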
Noah Gift recommends using the queue pattern when working with Python threads:
```python
#!/usr/bin/env python
import Queue
import threading
import urllib2
import time

hosts = ["http://yahoo.com", "http://google.com", "http://amazon.com",
         "http://ibm.com", "http://apple.com"]

queue = Queue.Queue()

class ThreadUrl(threading.Thread):
    """Threaded Url Grab"""
    def __init__(self, queue):
        threading.Thread.__init__(self)
        self.queue = queue

    def run(self):
        while True:
            # grabs host from queue
            host = self.queue.get()

            # grabs urls of hosts and prints first 1024 bytes of page
            url = urllib2.urlopen(host)
            print url.read(1024)

            # signals to queue job is done
            self.queue.task_done()

start = time.time()

def main():
    # spawn a pool of threads, and pass them queue instance
    for i in range(5):
        t = ThreadUrl(queue)
        t.setDaemon(True)
        t.start()

    # populate queue with data
    for host in hosts:
        queue.put(host)

    # wait on the queue until everything has been processed
    queue.join()

main()
print "Elapsed Time: %s" % (time.time() - start)
```
This example shows the queue pattern:

1. Create a queue instance with Queue.Queue() and use it to move the data.
2. Pass the queue instance into the thread class.
3. Spawn a pool of daemon threads.
4. In run, each thread takes one item from the queue at a time and uses that item to do its work.
5. When the work is finished, call queue.task_done() to signal the queue that the task is complete.
6. Call join() on the queue, which blocks until the queue is empty before the main program exits.

The threads are made daemons so that the main program can exit even while only daemon threads are still running, which simplifies the program's control flow.
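The six steps above can be sketched in Python 3, where the Queue module was renamed queue and the daemon flag is set via the `daemon` attribute. The network fetch is replaced here by a hypothetical `process()` function so the skeleton stays self-contained and deterministic:

```python
import queue
import threading

work_queue = queue.Queue()   # step 1: create the queue instance
results = []
results_lock = threading.Lock()

def process(item):
    # hypothetical stand-in for the real work (e.g. fetching a URL)
    return item * 2

class Worker(threading.Thread):
    def __init__(self, q):               # step 2: queue is passed in
        threading.Thread.__init__(self)
        self.q = q

    def run(self):
        while True:
            item = self.q.get()          # step 4: take one item from the queue
            with results_lock:
                results.append(process(item))
            self.q.task_done()           # step 5: signal that this item is done

def main():
    for _ in range(5):                   # step 3: spawn a pool of daemon threads
        w = Worker(work_queue)
        w.daemon = True                  # daemon=True: don't block interpreter exit
        w.start()

    for item in range(10):               # populate the queue with data
        work_queue.put(item)

    work_queue.join()                    # step 6: block until the queue is empty

main()
print(sorted(results))
```

Because each worker appends its result before calling task_done(), every result is guaranteed to be in `results` once work_queue.join() returns.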
Chained processing:
```python
import Queue
import threading
import urllib2
import time
from BeautifulSoup import BeautifulSoup

hosts = ["http://yahoo.com", "http://google.com", "http://amazon.com",
         "http://ibm.com", "http://apple.com"]

queue = Queue.Queue()
out_queue = Queue.Queue()

class ThreadUrl(threading.Thread):
    """Threaded Url Grab"""
    def __init__(self, queue, out_queue):
        threading.Thread.__init__(self)
        self.queue = queue
        self.out_queue = out_queue

    def run(self):
        while True:
            # grabs host from queue
            host = self.queue.get()

            # grabs urls of hosts and then grabs chunk of webpage
            url = urllib2.urlopen(host)
            chunk = url.read()

            # place chunk into out queue
            self.out_queue.put(chunk)

            # signals to queue job is done
            self.queue.task_done()

class DatamineThread(threading.Thread):
    """Threaded page parser"""
    def __init__(self, out_queue):
        threading.Thread.__init__(self)
        self.out_queue = out_queue

    def run(self):
        while True:
            # grabs chunk from out_queue
            chunk = self.out_queue.get()

            # parse the chunk
            soup = BeautifulSoup(chunk)
            print soup.findAll(['title'])

            # signals to queue job is done
            self.out_queue.task_done()

start = time.time()

def main():
    # spawn a pool of threads, and pass them queue instance
    for i in range(5):
        t = ThreadUrl(queue, out_queue)
        t.setDaemon(True)
        t.start()

    # populate queue with data
    for host in hosts:
        queue.put(host)

    for i in range(5):
        dt = DatamineThread(out_queue)
        dt.setDaemon(True)
        dt.start()

    # wait on the queue until everything has been processed
    queue.join()
    out_queue.join()

main()
print "Elapsed Time: %s" % (time.time() - start)
```
As you can see, queues make working with threads simple and convenient, and the pattern scales by chaining queues together. The small program above can be seen as a basic building block of a search engine or a data-mining tool.
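The chained-queue idea can also be sketched in Python 3, with the download and parse stages replaced by hypothetical `fetch()` and `mine()` functions so the pipeline runs without network access:

```python
import queue
import threading

in_queue = queue.Queue()
out_queue = queue.Queue()
results = []

def fetch(host):
    # hypothetical stage 1: stands in for downloading a page
    return "page:" + host

def mine(chunk):
    # hypothetical stage 2: stands in for parsing the page
    return chunk.upper()

def fetch_worker():
    while True:
        host = in_queue.get()
        out_queue.put(fetch(host))   # hand the result to the next stage
        in_queue.task_done()

def mine_worker():
    while True:
        chunk = out_queue.get()
        results.append(mine(chunk))
        out_queue.task_done()

hosts = ["yahoo.com", "google.com", "amazon.com"]

for _ in range(3):
    threading.Thread(target=fetch_worker, daemon=True).start()
    threading.Thread(target=mine_worker, daemon=True).start()

for host in hosts:
    in_queue.put(host)

in_queue.join()   # wait until every host has passed through stage 1
out_queue.join()  # then wait until every chunk has passed through stage 2

print(sorted(results))
```

Joining in_queue first guarantees every chunk has been placed on out_queue (each put happens before the matching task_done); joining out_queue then guarantees the second stage has drained it.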
