Threads
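1. Threads
(1) Threads: the threads within one process share that process's resources, and creating a new thread costs far less than starting a new process.
(2) Multithreading (multiple threads of control): a single process contains several threads of control, all sharing the process's address space.
- Multiple threads share the address space of one process.
- Threads are lighter weight than processes and are easier to create and destroy. On many operating systems, creating a thread is 10 to 100 times faster than creating a process, which is useful when the number of threads changes dynamically and rapidly.
- If all the threads are CPU-bound, multithreading brings no performance gain on a single CPU. When a program mixes substantial computation with substantial I/O, however, multiple threads let these activities overlap, which speeds up the program.
- On multi-CPU systems, multiple threads can be started to make the most of the cores, at a much lower cost than starting processes. (This point does not apply to Python, more precisely CPython, because of the GIL covered in section 5.)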
(3) Starting a thread
Method 1: pass a target function to Thread.
from threading import Thread

def work(n):
    print('%s is running' % n)

if __name__ == '__main__':
    t = Thread(target=work, args=(1,))
    t.start()
    # "main process" is the resource view; "main thread" is the execution view
    print('main thread')
Method 2: subclass Thread and override run().
from threading import Thread

class My(Thread):
    def __init__(self, n):
        super().__init__()
        self.n = n

    def run(self):
        print('%s is running' % self.n)

if __name__ == '__main__':
    t = My(2)
    t.start()
    print('main thread')
2. Processes versus threads
(1) A process only gathers resources together (it is a unit, or collection, of resources); the thread is the unit of execution on the CPU.
- Threads share the address space of the process that created them; processes have their own address space.
- Threads have direct access to the data segment of their process; processes have their own copy of the data segment of the parent process.
- Threads can communicate directly with other threads of their process; processes must use interprocess communication to communicate with sibling processes.
- New threads are easily created; new processes require duplication of the parent process.
- Threads can exercise considerable control over threads of the same process; processes can only exercise control over child processes.
- Changes to the main thread (cancellation, priority change, etc.) may affect the behavior of the other threads of the process; changes to the parent process do not affect child processes.
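(2) The main thread and child threads share data
from threading import Thread
from multiprocessing import Process

n = 100

def work():
    global n
    n = 1

if __name__ == '__main__':
    t = Thread(target=work)     # same address space: prints 1
    # t = Process(target=work)  # separate address space: prints 100
    t.start()
    t.join()                    # wait for work() before reading n
    print('main thread', n)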
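(3) A child thread has the same pid as the main thread
from threading import Thread
from multiprocessing import Process
import os

def work():
    print('%s is running' % os.getpid())

if __name__ == '__main__':
    t = Thread(target=work)     # prints the same pid as the main thread
    # t = Process(target=work)  # a child process prints a different pid
    t.start()
    t.join()
    print('main thread', os.getpid())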
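3. Other thread methods
import threading
from threading import Thread, current_thread
import time

n = 100

def work():
    # .name replaces getName(), which is deprecated since Python 3.10
    print('%s is running' % current_thread().name)
    time.sleep(4)

if __name__ == '__main__':
    t = Thread(target=work)
    t.start()
    t.join()
    print(t.is_alive())              # False after join(); isAlive() was removed in Python 3.9
    print(threading.enumerate())     # list of live Thread objects (not the built-in enumerate)
    print(threading.active_count())  # number of live threads; replaces activeCount()
    print('main thread', n)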
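4. Daemon threads
From the execution point of view, the main thread represents the process. The process ends only after all non-daemon threads have finished; any daemon threads still running at that point are terminated along with it.
(1) A daemon thread
from threading import Thread, current_thread
import time

def work():
    print('%s is running' % current_thread().name)
    time.sleep(2)
    # Normally never printed: the main thread ends first and takes the daemon with it
    print('%s is done' % current_thread().name)

if __name__ == '__main__':
    t = Thread(target=work)
    t.daemon = True  # must be set before start()
    t.start()
    print('main')
(2) A misleading example
from threading import Thread
import time

def foo():
    print(123)
    time.sleep(2)
    print('end123')

def bar():
    print(446)
    time.sleep(4)
    print('end446')

if __name__ == '__main__':
    p1 = Thread(target=foo)
    p2 = Thread(target=bar)
    p1.daemon = True
    p1.start()
    p2.start()
    print('main')
    # 'end123' is still printed: the non-daemon p2 keeps the process alive
    # for 4 s, which is long enough for the daemon p1 to finish its 2 s sleep.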
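The rule above can be sketched in a small self-contained form. This is only an illustration: `heartbeat`, `short_job`, and `run_demo` are hypothetical names, and the sleep intervals are arbitrary. The non-daemon worker is waited for, while the daemon thread is still alive when the function returns and will not hold up interpreter exit.

```python
import threading
import time

def heartbeat():
    """Hypothetical background task: loops until the process exits."""
    while True:
        time.sleep(0.05)

def short_job(results):
    """Hypothetical foreground task that finishes quickly."""
    time.sleep(0.1)
    results.append('job done')

def run_demo():
    results = []
    d = threading.Thread(target=heartbeat, daemon=True)      # terminated at interpreter exit
    w = threading.Thread(target=short_job, args=(results,))  # the process waits for this one
    d.start()
    w.start()
    w.join()
    # The daemon is still running here, yet it will not block shutdown.
    return results, d.daemon, d.is_alive()

if __name__ == '__main__':
    print(run_demo())
```

Passing `daemon=True` to the constructor is equivalent to setting `t.daemon = True` before `start()`.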
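5. GIL
1) In the CPython interpreter, of the threads started within one process only one can execute Python bytecode at any given moment, so multithreading cannot exploit multiple cores.
2) The GIL is essentially a mutex: it turns concurrent execution into serial execution so that shared data can be modified by only one task at a time, which keeps that data safe.
3) Different data should be protected by different locks.
4) Every run of a Python program creates a separate process.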
Verifying that "python test.py" creates only one process:
# Contents of test.py
import os, time
print(os.getpid())
time.sleep(1000)

# Run the script
python3 test.py
# On Windows
tasklist | findstr python
# On Linux
ps aux | grep python
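5) Inside one Python process there is not only the main thread of test.py and any threads it starts, but also interpreter-level work such as garbage collection (in CPython this runs inside whichever thread triggers it rather than in a dedicated thread). In short, everything runs inside this one process.
6) All data is shared, and code, being a kind of data, is shared by all threads too (all the code in test.py and all the code of the CPython interpreter).
7) Every thread's task must pass its code as an argument to the interpreter's code to be executed. In other words, before a thread can run its task, it first has to gain access to the interpreter's code.
8) If several threads have target=work, the flow is: the threads first reach the interpreter's code, that is, obtain permission to execute, and then hand the target's code to the interpreter's code to run.
9) The interpreter's code is shared by all threads, so its memory management (reference counting and garbage collection) can be invoked from any of them. This leads to a problem: for the same object 100, one thread may be executing x=100 while another is running the code that reclaims 100. There is no clever fix for this; the answer is simply a lock that guarantees the Python interpreter executes the code of only one task at a time.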

10) GIL versus Lock
The GIL protects interpreter-level data; to protect your own data you have to do your own locking, as shown in the figure below.
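To make the distinction concrete, here is a minimal sketch of user-level locking. The names `counter`, `counter_lock`, `add_many`, and `run` are hypothetical. The statement `counter += 1` is a read-modify-write sequence that the GIL does not promise to keep atomic, so the sketch guards it with its own `threading.Lock`.

```python
import threading

counter = 0
counter_lock = threading.Lock()  # user-level lock protecting `counter`

def add_many(times):
    """Increment the shared counter `times` times under the lock."""
    global counter
    for _ in range(times):
        with counter_lock:       # read-modify-write happens as one unit
            counter += 1

def run(n_threads=4, times=10000):
    """Start n_threads workers and return the final counter value."""
    global counter
    counter = 0
    threads = [threading.Thread(target=add_many, args=(times,))
               for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter

if __name__ == '__main__':
    print(run())
```

Without the lock, lost updates may or may not show up, depending on the interpreter version and timing; with the lock the total is always `n_threads * times`.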

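11) Most computers today are multi-core. For compute-bound tasks, multithreading in Python brings little performance gain and can even be slower than running serially (which avoids the frequent switching). For I/O-bound tasks, however, it improves efficiency noticeably.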
12) Multithreading performance test
Compute-bound: multiprocessing is more efficient.
from multiprocessing import Process
from threading import Thread
import os, time

def work():
    res = 0
    for i in range(100000000):
        res *= i

if __name__ == '__main__':
    l = []
    print(os.cpu_count())  # 4 cores on the author's machine
    start = time.time()
    for i in range(4):
        p = Process(target=work)   # took a little over 5 s
        # p = Thread(target=work)  # took a little over 18 s
        l.append(p)
        p.start()
    for p in l:
        p.join()
    stop = time.time()
    print('run time is %s' % (stop - start))
I/O-bound: multithreading is more efficient.
from multiprocessing import Process
from threading import Thread
import os, time

def work():
    time.sleep(2)
    print('===>')

if __name__ == '__main__':
    l = []
    print(os.cpu_count())  # 4 cores on the author's machine
    start = time.time()
    for i in range(400):
        # p = Process(target=work)  # took a little over 12 s, mostly spent creating processes
        p = Thread(target=work)     # took a little over 2 s
        l.append(p)
        p.start()
    for p in l:
        p.join()
    stop = time.time()
    print('run time is %s' % (stop - start))
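13) Applications
Multiprocessing: compute-bound work, such as financial analysis.
Multithreading: I/O-bound work, such as sockets, web crawlers, and web applications.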
6. Python module: using concurrent.futures
https://docs.python.org/dev/library/concurrent.futures.html
1) submit
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
import os, time, random

def work(n):
    print('%s is running' % os.getpid())
    time.sleep(random.randint(1, 3))
    return n ** 2

if __name__ == '__main__':
    # executor = ProcessPoolExecutor()  # pool of processes
    executor = ThreadPoolExecutor()     # pool of threads
    futures = []
    for i in range(10):
        # Calling result() right away would block and run the tasks one by one:
        # future = executor.submit(work, i).result()
        # print(future)
        future = executor.submit(work, i)
        futures.append(future)
    executor.shutdown()  # wait for all submitted tasks to finish
    for i in futures:
        print(i.result())
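As a complement to the submit pattern above, the following sketch collects results in completion order with `as_completed` and uses the executor as a context manager, which calls `shutdown(wait=True)` on exit. The function `collect` and the shortened sleep times are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import time

def work(n):
    time.sleep(0.01 * (5 - n))  # later submissions tend to finish sooner
    return n ** 2

def collect(count=5):
    """Run work(0..count-1) in a thread pool and map each input to its result."""
    results = {}
    with ThreadPoolExecutor(max_workers=count) as executor:
        # Remember which argument each Future belongs to
        future_to_arg = {executor.submit(work, i): i for i in range(count)}
        for fut in as_completed(future_to_arg):  # yields futures as they finish
            results[future_to_arg[fut]] = fut.result()
    return results

if __name__ == '__main__':
    print(collect())
```

Keeping a dictionary from Future to argument is one common way to recover which input produced which result when the completion order is not the submission order.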
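2) map(function, iterable): runs the function once per item. It returns an iterator that yields the results in input order; use submit when you need the individual Future objects, for example to attach callbacks.
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
import os, time, random

def work(n):
    print('%s is running' % os.getpid())
    time.sleep(random.randint(1, 3))
    return n ** 2

if __name__ == '__main__':
    # executor = ProcessPoolExecutor()  # pool of processes
    executor = ThreadPoolExecutor()     # pool of threads
    results = executor.map(work, range(10))
    executor.shutdown()
    print(list(results))  # results come back in the order of the inputs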
3) Callback functions
import requests
import os, time
from concurrent.futures import ThreadPoolExecutor

def get(url):
    print('%s GET %s' % (os.getpid(), url))
    response = requests.get(url)
    if response.status_code == 200:
        print('%s done %s' % (os.getpid(), url))
        return {'url': url, 'text': response.text}
    # any other status code falls through and returns None

def parse(future):
    dic = future.result()  # the callback receives the finished Future
    if dic is None:        # get() returned nothing for a non-200 response
        return
    print('%s PARSE %s' % (os.getpid(), dic['url']))
    time.sleep(1)
    res = '%s:%s\n' % (dic['url'], len(dic['text']))
    with open('db.txt', 'a') as f:
        f.write(res)

if __name__ == '__main__':
    urls = [
        'https://www.baidu.com',
        'https://www.python.org',
        'https://www.openstack.org',
        'https://help.github.com/',
        'http://www.sina.com.cn/'
    ]
    p = ThreadPoolExecutor()
    for url in urls:
        p.submit(get, url).add_done_callback(parse)
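The same callback pattern can be tried without network access. In the sketch below, `fetch` is a hypothetical stand-in for an HTTP request, and `parsed`, `parsed_lock`, and `crawl` are likewise illustrative names. With a thread pool, the callback runs in the worker thread that completed the future (or immediately in the submitting thread if the future is already done), so the shared list is guarded by a lock.

```python
from concurrent.futures import ThreadPoolExecutor
import threading

parsed = []
parsed_lock = threading.Lock()

def fetch(url):
    """Hypothetical stand-in for a real HTTP request."""
    return {'url': url, 'text': 'x' * len(url)}

def parse(future):
    dic = future.result()  # the callback receives the finished Future
    with parsed_lock:
        parsed.append('%s:%s' % (dic['url'], len(dic['text'])))

def crawl(urls):
    """Fetch every URL in a pool and return the parsed lines, sorted."""
    parsed.clear()
    with ThreadPoolExecutor(max_workers=4) as pool:
        for url in urls:
            pool.submit(fetch, url).add_done_callback(parse)
    # Leaving the with-block waits for the workers, so all callbacks have run.
    return sorted(parsed)

if __name__ == '__main__':
    print(crawl(['a.example', 'bb.example']))
```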
4) Simplifying the code with a list comprehension
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
import os, time, random

def work(n):
    print('%s is running' % os.getpid())
    time.sleep(random.randint(1, 3))
    return n ** 2

if __name__ == '__main__':
    # executor = ProcessPoolExecutor()  # pool of processes
    executor = ThreadPoolExecutor()     # pool of threads
    futures = [executor.submit(work, i) for i in range(10)]
    executor.shutdown()
    for i in futures:
        print(i.result())
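5) concurrent.futures does not report exceptions from tasks on its own
An exception raised inside a task is stored on its Future and only surfaces when result() or exception() is called; if neither is called, it passes silently. Catch and log it inside the task, or inspect the Future afterwards.
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
import logging
import time, random, os

logger = logging.getLogger(__name__)

def work(n):
    try:
        time.sleep(random.randint(1, 3))
        raise TypeError
        return n ** 2  # never reached
    except TypeError as e:
        logger.error('work(%s) failed: %r', n, e)
    print('%s is running' % os.getpid())
    time.sleep(3)
    # raise NotImplementedError

if __name__ == '__main__':
    executor = ProcessPoolExecutor()
    for i in range(10):
        future = executor.submit(work, i)
        # future.cancel()
        # future.exception()  # returns the exception raised by the task, or None
    executor.shutdown(wait=True)
    print('main')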
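A small sketch of how an exception travels through a Future may help here. The names `risky` and `run_all` are hypothetical. `Future.exception()` returns the exception object (or `None`), and `Future.result()` re-raises it in the caller.

```python
from concurrent.futures import ThreadPoolExecutor

def risky(n):
    """Hypothetical task that fails for odd inputs."""
    if n % 2:
        raise ValueError('odd input: %s' % n)
    return n ** 2

def run_all(values):
    """Return each task's result, or the name of the exception it raised."""
    outcomes = []
    with ThreadPoolExecutor(max_workers=2) as executor:
        futures = [executor.submit(risky, v) for v in values]
    for fut in futures:
        err = fut.exception()  # the stored exception, or None on success
        if err is None:
            outcomes.append(fut.result())
        else:
            outcomes.append(type(err).__name__)
    return outcomes

if __name__ == '__main__':
    print(run_all([2, 3, 4]))
```

Nothing is printed or raised for the failing task until the Future is inspected, which is why an unchecked Future can hide errors.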
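7. Deadlock and recursive locks
(1) Recursive lock (RLock): the thread holding it can acquire it multiple times. Each acquire adds 1 to a counter and each release subtracts 1; as long as the counter is not 0, no other thread can take the lock.
(2) Mutex (Lock): can be acquired only once until it is released.
(3) Deadlock
from threading import Thread, Lock
import time

mutexA = Lock()
mutexB = Lock()

class MyThread(Thread):
    def run(self):
        self.f1()
        self.f2()

    def f1(self):
        mutexA.acquire()
        print('%s got lock A' % self.name)
        mutexB.acquire()
        print('%s got lock B' % self.name)
        mutexB.release()
        mutexA.release()

    def f2(self):
        mutexB.acquire()
        print('%s got lock B' % self.name)
        time.sleep(1)
        # While this thread sleeps holding B, another thread enters f1,
        # takes A and waits for B. This thread then waits for A: deadlock.
        mutexA.acquire()
        print('%s got lock A' % self.name)
        mutexA.release()
        mutexB.release()

if __name__ == '__main__':
    for i in range(10):
        p = MyThread()
        p.start()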
(4) Resolving the deadlock with a recursive lock
from threading import Thread, RLock
import time

mutexA = mutexB = RLock()  # both names refer to the same recursive lock

class MyThread(Thread):
    def run(self):
        self.f1()
        self.f2()

    def f1(self):
        mutexA.acquire()
        print('%s got lock A' % self.name)
        mutexB.acquire()  # same thread, same RLock: the counter goes to 2
        print('%s got lock B' % self.name)
        mutexB.release()
        mutexA.release()

    def f2(self):
        mutexB.acquire()
        print('%s got lock B' % self.name)
        time.sleep(1)
        mutexA.acquire()
        print('%s got lock A' % self.name)
        mutexA.release()
        mutexB.release()

if __name__ == '__main__':
    for i in range(10):
        p = MyThread()
        p.start()
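The difference between the two lock types described in (1) and (2) can be checked directly. The helper `try_reacquire` below is a hypothetical name; it takes a lock once and then attempts a second, non-blocking acquire from the same thread.

```python
import threading

def try_reacquire(lock):
    """Acquire once, then try again from the same thread without blocking."""
    lock.acquire()
    try:
        second = lock.acquire(blocking=False)  # False for Lock, True for RLock
        if second:
            lock.release()                     # undo the nested acquire
        return second
    finally:
        lock.release()                         # undo the first acquire

if __name__ == '__main__':
    print('Lock :', try_reacquire(threading.Lock()))
    print('RLock:', try_reacquire(threading.RLock()))
```

Using `blocking=False` keeps the check safe: a plain blocking `acquire()` on an already held `Lock` from the same thread would hang forever.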
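8. Semaphore: a lock with a counter, so several threads can hold it at the same time
from threading import Semaphore, Thread, current_thread
import time, random

def task():
    with sm:
        print('%s is working' % current_thread().name)
        time.sleep(random.randint(1, 3))

if __name__ == '__main__':
    # The default value is 1, which behaves like a mutex;
    # Semaphore(5), for example, would let 5 threads in at once.
    sm = Semaphore()
    for i in range(10):
        p = Thread(target=task)
        p.start()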
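To see the counting behavior, the sketch below uses a semaphore with a value greater than 1 and records the highest number of threads that were inside the guarded section at once. The function `run_with_limit`, the `state` dictionary, and the sleep time are illustrative assumptions.

```python
import threading
import time

def run_with_limit(limit=3, n_threads=10):
    """Run n_threads tasks, at most `limit` at a time; return the peak concurrency."""
    sm = threading.Semaphore(limit)
    state = {'active': 0, 'peak': 0}
    state_lock = threading.Lock()  # protects the bookkeeping in `state`

    def task():
        with sm:                   # at most `limit` threads get past this line
            with state_lock:
                state['active'] += 1
                state['peak'] = max(state['peak'], state['active'])
            time.sleep(0.02)       # simulated work
            with state_lock:
                state['active'] -= 1

    threads = [threading.Thread(target=task) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state['peak']

if __name__ == '__main__':
    print(run_with_limit())
```

The returned peak never exceeds `limit`, whatever the number of threads.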
9. Event
from threading import Event, current_thread, Thread
import time

e = Event()

def check():
    print('%s is checking' % current_thread().name)
    time.sleep(5)
    e.set()

def conn():
    count = 1
    while not e.is_set():
        if count > 3:
            # check() needs 5 s, so three 1 s waits are not enough
            raise TimeoutError('connection timed out')
        print('%s is waiting to connect, attempt %s' % (current_thread().name, count))
        e.wait(timeout=1)
        count += 1
    print('%s starts connecting' % current_thread().name)

if __name__ == '__main__':
    t1 = Thread(target=check)
    t2 = Thread(target=conn)
    t3 = Thread(target=conn)
    t4 = Thread(target=conn)
    t1.start()
    t2.start()
    t3.start()
    t4.start()
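The retry loop above can also be written around the return value of `Event.wait`, which is `True` once the event is set and `False` on timeout. The helper names `wait_for_service` and `demo`, the attempt count, and the timeouts are assumptions chosen for illustration.

```python
import threading

def wait_for_service(ready, attempts=3, timeout=0.05):
    """Return the attempt number on which `ready` was seen, or None on timeout."""
    for attempt in range(1, attempts + 1):
        if ready.wait(timeout=timeout):  # True as soon as the event is set
            return attempt
    return None

def demo(delay):
    """Set the event from another thread after `delay` seconds, then wait for it."""
    ready = threading.Event()
    threading.Timer(delay, ready.set).start()  # hypothetical "service is up" signal
    return wait_for_service(ready)

if __name__ == '__main__':
    print(demo(0.01))                         # attempt number on success
    print(wait_for_service(threading.Event()))  # None: the event is never set
```

Returning `None` instead of raising leaves the decision about what a timeout means to the caller.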
10. Timer
from threading import Timer

def hello(n):
    print("hello, world", n)

t = Timer(2, hello, args=(1,))
t.start()  # after 2 seconds, "hello, world 1" will be printed
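A Timer can also be cancelled while it is still waiting. The sketch below, with the hypothetical helper `schedule`, shows both outcomes; since `Timer` is a `Thread` subclass, `join()` can be used to wait until it has either fired or been cancelled.

```python
import threading

def schedule(delay, fired, cancel=False):
    """Start a timer that appends to `fired`; optionally cancel it first."""
    t = threading.Timer(delay, fired.append, args=('fired',))
    t.start()
    if cancel:
        t.cancel()  # only effective while the timer is still waiting
    t.join()        # returns once the timer has fired or was cancelled
    return list(fired)

if __name__ == '__main__':
    print(schedule(0.05, []))               # the callback runs
    print(schedule(0.05, [], cancel=True))  # the callback never runs
```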
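11. Thread queues
(1) Queue (first in, first out)
import queue

q = queue.Queue(3)  # maxsize=3; the default maxsize=0 means unbounded
q.put(1)
q.put(2)
q.put(3)
# q.put(4)  # the queue is full, so this call would block forever here
print(q.get())  # 1
print(q.get())  # 2
print(q.get())  # 3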
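When blocking on a full queue is not wanted, `put` can be called with `block=False` (equivalent to `put_nowait`), which raises `queue.Full` instead. The helper `fill` below is a hypothetical name used to show which items a bounded queue accepts.

```python
import queue

def fill(maxsize, items):
    """Try to enqueue every item without blocking; return (accepted, rejected)."""
    q = queue.Queue(maxsize)
    rejected = []
    for item in items:
        try:
            q.put(item, block=False)  # same as q.put_nowait(item)
        except queue.Full:
            rejected.append(item)
    accepted = []
    while not q.empty():              # single-threaded here, so empty() is reliable
        accepted.append(q.get())
    return accepted, rejected

if __name__ == '__main__':
    print(fill(3, [1, 2, 3, 4]))
```

`put(item, timeout=...)` is another option: it waits up to the given number of seconds before raising `queue.Full`.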
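(2) Priority queue
import queue

q = queue.PriorityQueue(4)  # maxsize=4; the default maxsize=0 means unbounded
q.put((10, 'aaa'))          # the smaller the number, the higher the priority
q.put((7, 'bbb'))
q.put((30, 'ddd'))
q.put((30, 'ccccccccc'))
print(q.get())  # (7, 'bbb')
print(q.get())  # (10, 'aaa')
print(q.get())  # (30, 'ccccccccc'): equal priorities are ordered by the second element
print(q.get())  # (30, 'ddd')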
(3) Stack
import queue

q = queue.LifoQueue(3)  # last in, first out: a stack
q.put(1)
q.put(2)
q.put(3)
print(q.get())  # 3
print(q.get())  # 2
print(q.get())  # 1
