Concurrent Programming
1. Choosing a Concurrency Model
Multiprocessing, multithreading and coroutines all appear to run work concurrently, so how do we choose which technique to use?
- First, take a simple look at what the target workload actually is. If the project is mostly CPU-bound, for example checking many numbers for primality in parallel, there is really no choice: only multiprocessing can make full use of the CPU, while the other two can saturate at most one core (because of CPython's GIL).
- Then there are tasks that are mostly I/O. For I/O-bound work coroutines are the first choice, and they are also the only approach that realistically handles the C10K problem. However, most of Python's default I/O is synchronous, so when the asynchronous libraries you need are not available, multithreading is the fallback.
Coroutines and threads both top out at one core, but their mechanics are completely different. With coroutines the user organizes the code: it is written as sequential code that runs asynchronously, yet it still executes sequentially in effect; the single thread simply jumps between coroutines to decide which piece of code runs next. A rough analogy is a zipper: if one tooth jams, the whole thing stalls. Threads are entirely different: one thread getting stuck does not block the others.
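To make the zipper analogy concrete, here is a minimal sketch (the ticker/blocking_io helpers are made up, not from the original examples): a synchronous time.sleep inside one coroutine stalls every other coroutine on the loop, while offloading the blocking call to a thread pool via run_in_executor keeps the loop responsive, which is also the usual fallback when an asynchronous library is not available.
import asyncio
import time

def blocking_io():
    time.sleep(1)  # a synchronous call: the event loop cannot switch tasks while it runs

async def ticker():
    for _ in range(4):
        print('tick')
        await asyncio.sleep(0.3)

async def blocked():
    await asyncio.sleep(0.1)
    blocking_io()  # freezes the whole loop: no ticks appear for about a second

async def offloaded(loop):
    await asyncio.sleep(0.1)
    await loop.run_in_executor(None, blocking_io)  # runs in a worker thread instead

loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
loop.run_until_complete(asyncio.gather(ticker(), blocked()))      # ticks stall
loop.run_until_complete(asyncio.gather(ticker(), offloaded(loop)))  # ticks keep flowing
loop.close()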
2. Common Synchronization Primitives for Concurrent Programming
Python provides several synchronization mechanisms. They all follow the same basic ideas, so coroutines, threads and processes can use them alike; only the module they come from differs slightly, and so, in places, does the intended use.
2.1. Semaphore
In concurrent programming, to keep different units of execution (threads/processes/coroutines) from modifying a shared resource at the same time, we need to limit how many of them may access it simultaneously (usually to 1). A semaphore is based on an internal counter: every call to acquire() decrements it and every call to release() increments it. The counter can never drop below 0; when it reaches 0, acquire() blocks until some other thread calls release(). Semaphore supports the context-manager protocol. (A minimal sketch of the counter behaviour follows the interface list below.)
The Semaphore interface consists of:
- acquire(): acquire the semaphore; in the coroutine version this method is a coroutine
- release(): release the semaphore
- locked(): coroutine version only; reports whether the semaphore cannot currently be acquired
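A minimal sketch of the counter behaviour (threading version, for illustration): with an initial value of 2, a third non-blocking acquire fails because the counter has reached 0.
from threading import Semaphore

sem = Semaphore(2)                   # internal counter starts at 2
print(sem.acquire())                 # True, counter -> 1
print(sem.acquire())                 # True, counter -> 0
print(sem.acquire(blocking=False))   # False: counter is 0, a blocking acquire would wait
sem.release()                        # counter -> 1
print(sem.acquire(blocking=False))   # True again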
There are two kinds of semaphore:
- Semaphore: the standard semaphore
- BoundedSemaphore: a bounded semaphore; it checks the internal counter and guarantees that it never exceeds the initial value, raising a ValueError if it would
In most cases a semaphore is used to guard a resource with a limited (though not necessarily 1) number of slots; if a semaphore is release()d too many times, that usually indicates a bug.
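A minimal sketch of the difference (threading versions, purely for illustration): releasing a BoundedSemaphore beyond its initial value raises ValueError, while a plain Semaphore silently lets the counter grow, which can hide exactly this kind of bug.
from threading import Semaphore, BoundedSemaphore

plain = Semaphore(1)
plain.release()              # counter becomes 2 -- no complaint, likely a bug

bounded = BoundedSemaphore(1)
try:
    bounded.release()        # counter would exceed the initial value
except ValueError as e:
    print('BoundedSemaphore caught the extra release:', e)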
The semaphore classes for coroutines, threads and processes come from different modules:
- coroutine: asyncio.Semaphore(value=1, *, loop=None)
- thread: threading.Semaphore(value=1)
- process: multiprocessing.Semaphore([value])
Coroutine version
import aiohttp
import asyncio

NUMBERS = range(12)
URL = 'http://httpbin.org/get?a={}'
sema = asyncio.Semaphore(3)

async def fetch_async(a):
    async with aiohttp.request('GET', URL.format(a)) as r:
        data = await r.json()
        return data['args']['a']

async def print_result(a):
    async with sema:
        r = await fetch_async(a)
        print('fetch({}) = {}'.format(a, r))

#loop = asyncio.new_event_loop()
#asyncio.set_event_loop(loop)
loop = asyncio.get_event_loop()
f = asyncio.wait([print_result(num) for num in NUMBERS])
loop.run_until_complete(f)
fetch(11) = 11
fetch(7) = 7
fetch(8) = 8
fetch(1) = 1
fetch(9) = 9
fetch(3) = 3
fetch(2) = 2
fetch(5) = 5
fetch(4) = 4
fetch(0) = 0
fetch(6) = 6
fetch(10) = 10
({<Task finished coro=<print_result() done, defined at <ipython-input-1-27696bcf1a1e>:11> result=None>,
<Task finished coro=<print_result() done, defined at <ipython-input-1-27696bcf1a1e>:11> result=None>,
<Task finished coro=<print_result() done, defined at <ipython-input-1-27696bcf1a1e>:11> result=None>,
<Task finished coro=<print_result() done, defined at <ipython-input-1-27696bcf1a1e>:11> result=None>,
<Task finished coro=<print_result() done, defined at <ipython-input-1-27696bcf1a1e>:11> result=None>,
<Task finished coro=<print_result() done, defined at <ipython-input-1-27696bcf1a1e>:11> result=None>,
<Task finished coro=<print_result() done, defined at <ipython-input-1-27696bcf1a1e>:11> result=None>,
<Task finished coro=<print_result() done, defined at <ipython-input-1-27696bcf1a1e>:11> result=None>,
<Task finished coro=<print_result() done, defined at <ipython-input-1-27696bcf1a1e>:11> result=None>,
<Task finished coro=<print_result() done, defined at <ipython-input-1-27696bcf1a1e>:11> result=None>,
<Task finished coro=<print_result() done, defined at <ipython-input-1-27696bcf1a1e>:11> result=None>,
<Task finished coro=<print_result() done, defined at <ipython-input-1-27696bcf1a1e>:11> result=None>},
set())
Thread version
import time
from random import random
from threading import Thread, Semaphore

sema = Semaphore(3)

def foo(tid):
    with sema:
        print('{} acquire sema'.format(tid))
        wt = random() * 2
        time.sleep(wt)
    print('{} release sema'.format(tid))

threads = []

for i in range(5):
    t = Thread(target=foo, args=(i,))
    threads.append(t)
    t.start()

for t in threads:
    t.join()
0 acquire sema
1 acquire sema
2 acquire sema
2 release sema3 acquire sema
1 release sema4 acquire sema
3 release sema
0 release sema
4 release sema
Process version
%%writefile src/semaphore.py
from multiprocessing import Process, Semaphore

def foo(tid, sema):
    import time
    from random import random
    with sema:
        print('{} acquire sema'.format(tid))
        wt = random() * 2
        time.sleep(wt)
    print('{} release sema'.format(tid))

if __name__ == "__main__":
    sema = Semaphore(3)
    processes = []
    for i in range(5):
        t = Process(target=foo, args=(i, sema))
        processes.append(t)
    for t in processes:
        t.start()
    for t in processes:
        t.join()
Writing src/semaphore.py
!python src/semaphore.py
0 acquire sema
1 acquire sema
2 acquire sema
2 release sema
3 acquire sema
1 release sema
4 acquire sema
4 release sema
3 release sema
0 release sema
2.2. Lock
A Lock is also called a mutex; it is essentially a semaphore with a value of 1.
In multithreaded code a lock is used to guard reads and writes, making sure that a given resource is accessed by only one operation at a time.
Python has two kinds of lock:
- Lock: the standard lock
- RLock: a reentrant lock, which can be acquired multiple times by the same unit of execution. Internally, on top of the locked/unlocked state used by the primitive lock, it keeps track of an owner and a recursion level: in the locked state some unit owns the lock, in the unlocked state no thread owns it. (A minimal RLock sketch follows this list.)
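The examples in this section only use Lock, so here is a minimal RLock sketch (threading version, purely illustrative; the outer/inner helpers are made up): the same thread may re-acquire an RLock it already holds, whereas a plain Lock would deadlock.
from threading import RLock

rlock = RLock()

def outer():
    with rlock:          # first acquisition: recursion level 1
        inner()

def inner():
    with rlock:          # same thread re-acquires: recursion level 2, no deadlock
        print('reentered the lock')

outer()                  # with a plain Lock, inner() would block forever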
Let's first look at an example without a lock:
import time
from threading import Thread

value = 0

def getlock():
    global value
    new = value + 1
    time.sleep(0.001)  # sleep so the threads get a chance to switch
    value = new

threads = []
for i in range(100):
    t = Thread(target=getlock)
    t.start()
    threads.append(t)
for t in threads:
    t.join()
print(value)
22
Without the lock the result is far smaller than 100. Now let's add a mutex and see:
import time
from threading import Thread, Lock

value = 0
lock = Lock()

def getlock():
    global value
    with lock:
        new = value + 1
        time.sleep(0.001)
        value = new

threads = []
for i in range(100):
    t = Thread(target=getlock)
    t.start()
    threads.append(t)
for t in threads:
    t.join()
print(value)
100
As a special kind of semaphore, a lock has the same interface as Semaphore. The lock classes for coroutines, threads and processes are:
- coroutine: asyncio.Lock(*, loop=None)
- thread: threading.Lock()
- process: multiprocessing.Lock()
With coroutines there is no real preemptive contention for resources, so here the lock serves more as a global flag used to gate parts of the control flow:
import asyncio
import functools

def unlock(lock):
    print('callback releasing lock')
    lock.release()

async def test(locker, lock):
    print('{} waiting for the lock'.format(locker))
    with await lock:
        print('{} acquired lock'.format(locker))
    print('{} released lock'.format(locker))

async def main(loop):
    lock = asyncio.Lock()
    await lock.acquire()
    loop.call_later(0.1, functools.partial(unlock, lock))
    await asyncio.wait([test('l1', lock), test('l2', lock)])

loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
loop.run_until_complete(main(loop))
loop.close()
l1 waiting for the lock
l2 waiting for the lock
callback releasing lock
l1 acquired lock
l1 released lock
l2 acquired lock
l2 released lock
For multiprocessing the lock likewise acts as a global signal; for example, when several processes write to the same file, a lock is needed to serialize access:
%%writefile src/lock.py
import multiprocessing
import sys

def worker_with(lock, f):
    with lock:
        with open(f, "a+") as fs:
            fs.write('Lock acquired via with\n')

if __name__ == '__main__':
    f = "source/file.txt"
    lock = multiprocessing.Lock()
    w = multiprocessing.Process(target=worker_with, args=(lock, f))
    w.start()
    w.join()
Writing src/lock.py
!python src/lock.py
2.3. Event
One unit of execution sends/signals an event. An event is simply an object that stores a flag: if the internal flag is True the event has happened, otherwise it has not.
The event interface consists of:
- clear(): set the internal flag to False
- is_set(): return the internal flag
- set(): set the internal flag to True
- wait(): block until the flag is set to True; in the coroutine version this is a coroutine
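Before the producer/consumer examples, a minimal sketch of the flag semantics themselves (threading version, for illustration; the waiter helper is made up):
import threading
import time

evt = threading.Event()
print(evt.is_set())       # False: the internal flag starts cleared

def waiter():
    print('waiting for the flag...')
    evt.wait()            # blocks until set() flips the flag to True
    print('flag seen, is_set() ->', evt.is_set())

t = threading.Thread(target=waiter)
t.start()
time.sleep(0.5)
evt.set()                 # wake the waiter
t.join()
evt.clear()               # reset the flag to False for reuse
print(evt.is_set())       # False again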
Other units of execution wait for the event to be triggered. We'll use the producer/consumer model as the example.
Coroutine version
import asyncio
import random

async def produce(event, n):
    for x in range(1, n + 1):
        # produce an item
        print('producing {}/{}'.format(x, n))
        # simulate i/o operation using sleep
        await asyncio.sleep(random.random())
        l.append(x)
        event.set()

async def consume(event):
    while True:
        item = await event.wait()
        if item == False:
            break
        integer = l.pop()
        print('{} popped from list '.format(integer))
        event.clear()
        await asyncio.sleep(random.random())

loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
l = []
event = asyncio.Event(loop=loop)
producer_coro = produce(event, 10)
consumer_coro = consume(event)
loop.run_until_complete(asyncio.gather(producer_coro, consumer_coro))
loop.close()
producing 1/10
producing 2/10
1 popped from list
producing 3/10
2 popped from list
producing 4/10
3 popped from list
producing 5/10
4 popped from list
producing 6/10
5 popped from list
producing 7/10
6 popped from list
producing 8/10
7 popped from list
producing 9/10
8 popped from list
producing 10/10
10 popped from list
---------------------------------------------------------------------------
KeyboardInterrupt Traceback (most recent call last)
<ipython-input-10-739175a8f664> in <module>()
33 producer_coro = produce(event, 10)
34 consumer_coro = consume(event)
---> 35 loop.run_until_complete(asyncio.gather(producer_coro, consumer_coro))
36 loop.close()
~/LIB/CONDA/anaconda/lib/python3.6/asyncio/base_events.py in run_until_complete(self, future)
452 future.add_done_callback(_run_until_complete_cb)
453 try:
--> 454 self.run_forever()
455 except:
456 if new_task and future.done() and not future.cancelled():
~/LIB/CONDA/anaconda/lib/python3.6/asyncio/base_events.py in run_forever(self)
419 events._set_running_loop(self)
420 while True:
--> 421 self._run_once()
422 if self._stopping:
423 break
~/LIB/CONDA/anaconda/lib/python3.6/asyncio/base_events.py in _run_once(self)
1387 timeout * 1e3, dt * 1e3)
1388 else:
-> 1389 event_list = self._selector.select(timeout)
1390 self._process_events(event_list)
1391
~/LIB/CONDA/anaconda/lib/python3.6/selectors.py in select(self, timeout)
575 ready = []
576 try:
--> 577 kev_list = self._kqueue.control(None, max_ev, timeout)
578 except InterruptedError:
579 return ready
KeyboardInterrupt:
The version above never terminates on its own: event.wait() returns True rather than the produced item, so the "if item == False" break is never taken, and once the producer is done the consumer waits on the event forever, hence the manual interrupt. A variant that drives consumption from the producer side:
import time
import asyncio
from random import randint, choice

TIMEOUT = 2

async def consumer(name, event, l):
    if await event.wait():
        try:
            integer = l.pop()
            print('{} popped from list by {}'.format(integer, name))
            event.clear()  # reset the event flag
        except IndexError:  # tolerate an empty list right after startup
            pass

async def producer():
    for i in range(1, 10):
        integer = randint(10, 100)
        yield integer

async def main():
    event = asyncio.Event()
    l = []
    async for i in producer():
        l.append(i)
        print('{} appended to list '.format(i))
        event.set()  # set the event
        # two consumer coroutines are built but only one is awaited; the unused
        # one is what triggers the "never awaited" RuntimeWarning in the output
        consumers = [consumer(name, event, l)
                     for _, name in enumerate(('c1', 'c2'))]
        await choice(consumers)
        await asyncio.sleep(1)

loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
loop.run_until_complete(main())
loop.close()
17 appended to list
17 popped from list by c1
11 appended to list
11 popped from list by c2
/Users/huangsizhe/LIB/CONDA/anaconda/lib/python3.6/site-packages/ipykernel/__main__.py:29: RuntimeWarning: coroutine 'consumer' was never awaited
40 appended to list
40 popped from list by c2
79 appended to list
79 popped from list by c1
99 appended to list
99 popped from list by c1
98 appended to list
98 popped from list by c2
86 appended to list
86 popped from list by c1
16 appended to list
16 popped from list by c2
77 appended to list
77 popped from list by c1
/Users/huangsizhe/LIB/CONDA/anaconda/lib/python3.6/asyncio/events.py:126: RuntimeWarning: coroutine 'consumer' was never awaited
self._callback(*self._args)
Thread version
import time
import threading
from random import randint

TIMEOUT = 2

def consumer(event, l):
    t = threading.currentThread()
    while 1:
        try:
            event_is_set = event.wait(TIMEOUT)
        except Exception as e:
            print(e)
            break
        if event_is_set:
            try:
                integer = l.pop()
                print('{} popped from list by {}'.format(integer, t.name))
                event.clear()  # reset the event flag
            except IndexError:  # tolerate an empty list right after startup
                pass
        else:
            break

def producer(event, l):
    t = threading.currentThread()
    for i in range(10):
        integer = randint(10, 100)
        l.append(integer)
        print('{} appended to list by {}'.format(integer, t.name))
        event.set()  # set the event
        time.sleep(1)

event = threading.Event()
l = []
threads = []

for name in ('consumer1', 'consumer2'):
    t = threading.Thread(name=name, target=consumer, args=(event, l))
    t.start()
    threads.append(t)

p = threading.Thread(name='producer1', target=producer, args=(event, l))
p.start()
threads.append(p)

for t in threads:
    t.join()
64 appended to list by producer1
64 popped from list by consumer1
86 appended to list by producer1
86 popped from list by consumer2
82 appended to list by producer1
82 popped from list by consumer1
63 appended to list by producer1
63 popped from list by consumer2
15 appended to list by producer1
15 popped from list by consumer1
77 appended to list by producer1
77 popped from list by consumer2
96 appended to list by producer1
96 popped from list by consumer1
60 appended to list by producer1
60 popped from list by consumer2
20 appended to list by producer1
20 popped from list by consumer1
15 appended to list by producer1
15 popped from list by consumer2
2.4. Condition
A Condition is used for signalling between units of execution. Besides the full Lock interface, it also provides:
- notify(n=1): notify up to n of the units of execution waiting on the same Condition object that the condition has been triggered
- notify_all(): notify all units waiting on the same Condition object that the condition has been triggered
- wait(): wait for a notification from another unit using the same Condition object
- wait_for(predicate): equivalent to while not predicate(): cv.wait() (see the sketch after this list)
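None of the examples below use wait_for, so here is a minimal threading sketch of it (the helper names and the predicate are made up): the consumer sleeps on the condition until len(items) >= 3 becomes true.
import threading

items = []
cond = threading.Condition()

def consumer():
    with cond:
        # equivalent to: while not (len(items) >= 3): cond.wait()
        cond.wait_for(lambda: len(items) >= 3)
        print('consumer woke up with', items)

def producer():
    for i in range(3):
        with cond:
            items.append(i)
            cond.notify_all()   # wakes the waiter, which re-checks the predicate

c = threading.Thread(target=consumer)
p = threading.Thread(target=producer)
c.start()
p.start()
c.join()
p.join()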
One unit of execution waits for a particular condition while another signals that the condition has been met. The producer/consumer model illustrates this best:
Coroutine version
import asyncio
import functools

async def consumer(cond, name, second):
    await asyncio.sleep(second)
    async with cond:
        await cond.wait()
        print('{}: Resource is available to consumer'.format(name))

async def producer(cond):
    await asyncio.sleep(2)
    async with cond:
        print('Making resource available')
        cond.notify_all()

async def main(loop):
    condition = asyncio.Condition()
    task = loop.create_task(producer(condition))
    consumers = [consumer(condition, name, index)
                 for index, name in enumerate(('c1', 'c2'))]
    await asyncio.wait(consumers)
    task.cancel()

loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
loop.run_until_complete(main(loop))
loop.close()
Making resource available
c1: Resource is available to consumer
c2: Resource is available to consumer
Thread version
import time
import threading

def consumer(cond):
    t = threading.currentThread()
    with cond:
        cond.wait()  # wait() creates a lock called "waiter" in the locked
                     # state; this waiter lock is used to signal between threads
        print('{}: Resource is available to consumer'.format(t.name))

def producer(cond):
    t = threading.currentThread()
    with cond:
        print('{}: Making resource available'.format(t.name))
        cond.notify_all()  # release the waiter locks and wake the consumers

condition = threading.Condition()
c1 = threading.Thread(name='c1', target=consumer, args=(condition,))
c2 = threading.Thread(name='c2', target=consumer, args=(condition,))
p = threading.Thread(name='p', target=producer, args=(condition,))

c1.start()
time.sleep(1)
c2.start()
time.sleep(1)
p.start()
p: Making resource available
c1: Resource is available to consumer
c2: Resource is available to consumer
Process version
%%writefile src/cond.py
import time
import multiprocessing

def consumer(cond):
    t = multiprocessing.current_process()
    with cond:
        cond.wait()  # wait() creates a "waiter" lock in the locked state;
                     # it is used to signal between the processes
        print('{}: Resource is available to consumer'.format(t.name))

def producer(cond):
    t = multiprocessing.current_process()
    with cond:
        print('{}: Making resource available'.format(t.name))
        cond.notify_all()  # release the waiter locks and wake the consumers

if __name__ == '__main__':
    condition = multiprocessing.Condition()
    c1 = multiprocessing.Process(name='c1', target=consumer, args=(condition,))
    c2 = multiprocessing.Process(name='c2', target=consumer, args=(condition,))
    p = multiprocessing.Process(name='p', target=producer, args=(condition,))
    c1.start()
    time.sleep(1)
    c2.start()
    time.sleep(1)
    p.start()
Writing src/cond.py
!python src/cond.py
p: Making resource available
c1: Resource is available to consumer
c2: Resource is available to consumer
2.5. Queue
Queues are the most common way to synchronize work, and the tool most often used to implement the producer/consumer pattern.
The queue interface consists of:
- qsize(): return the approximate size of the queue
- empty(): return True if the queue is empty
- full(): return True if the queue is full
- put(item, block=True, timeout=None): put an item into the queue; a coroutine in the coroutine version
- put_nowait(item): put an item into the queue without blocking
- get(block=True, timeout=None): remove and return an item from the queue; a coroutine in the coroutine version
- get_nowait(): remove and return an item without blocking
- task_done(): indicate that a previously dequeued item has been processed
- join(): block until every item in the queue has been retrieved and processed; a coroutine in the coroutine version
Three kinds of queue are commonly used (see the ordering sketch below):
- Queue: first in, first out
- LifoQueue: last in, first out
- PriorityQueue: a priority queue; items must be tuples of (int, Any), where the first element is the priority
The queue classes for each model are:
- coroutine: asyncio.Queue(maxsize)
- thread: queue.Queue(maxsize)
- process: multiprocessing.Queue(maxsize)
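As a quick illustration of the three orderings (a minimal sketch using the standard library queue module; the sample items are made up):
from queue import Queue, LifoQueue, PriorityQueue

fifo, lifo, prio = Queue(), LifoQueue(), PriorityQueue()
for item in [(3, 'c'), (1, 'a'), (2, 'b')]:
    fifo.put(item)
    lifo.put(item)
    prio.put(item)  # (priority, payload) tuples

print([fifo.get() for _ in range(3)])  # insertion order: (3,'c'), (1,'a'), (2,'b')
print([lifo.get() for _ in range(3)])  # reversed:        (2,'b'), (1,'a'), (3,'c')
print([prio.get() for _ in range(3)])  # by priority:     (1,'a'), (2,'b'), (3,'c')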
Once again we use the producer/consumer pattern as the example.
Coroutine version
import asyncio
import random
from random import randint  # randint is used below but was not imported originally

def double(n):
    return n * 2

async def producer(queue, n):
    count = 0
    while True:
        if count > 5:
            break
        pri = randint(0, 100)
        print('put :{}'.format(pri))
        await queue.put((pri, double, pri))  # (priority, func, args)
        count += 1

async def consumer(queue):
    while True:
        pri, task, arg = await queue.get()
        print('[PRI:{}] {} * 2 = {}'.format(pri, arg, task(arg)))
        await asyncio.sleep(random.random())
        queue.task_done()

async def run(n):
    queue = asyncio.PriorityQueue(10)
    # schedule the consumer
    consume = asyncio.ensure_future(consumer(queue))
    # run the producer and wait for completion
    await producer(queue, n)
    # wait until the consumer has processed all items
    await queue.join()
    # the consumer is still awaiting an item, cancel it
    consume.cancel()

loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
loop.run_until_complete(run(10))
loop.close()
put :44
put :9
put :74
put :95
put :48
put :76
[PRI:9] 9 * 2 = 18
[PRI:44] 44 * 2 = 88
[PRI:48] 48 * 2 = 96
[PRI:74] 74 * 2 = 148
[PRI:76] 76 * 2 = 152
[PRI:95] 95 * 2 = 190
Thread version
import time
import threading
from random import randint
from queue import PriorityQueue

q = PriorityQueue(10)

def double(n):
    return n * 2

def producer():
    count = 0
    while True:
        if count > 5:
            break
        pri = randint(0, 100)
        print('put :{}'.format(pri))
        q.put((pri, double, pri))  # (priority, func, args)
        count += 1

def consumer():
    while True:
        if q.empty():
            break
        pri, task, arg = q.get()
        print('[PRI:{}] {} * 2 = {}'.format(pri, arg, task(arg)))
        q.task_done()
        time.sleep(0.1)

t = threading.Thread(target=producer)
t.start()
time.sleep(1)
t = threading.Thread(target=consumer)
t.start()
put :88
put :93
put :87
put :51
put :92
put :44
[PRI:44] 44 * 2 = 88
[PRI:51] 51 * 2 = 102
[PRI:87] 87 * 2 = 174
[PRI:88] 88 * 2 = 176
[PRI:92] 92 * 2 = 184
[PRI:93] 93 * 2 = 186
Process version
import time
from multiprocessing import Process
from random import randint
from multiprocessing import JoinableQueue

q = JoinableQueue(10)

def double(n):
    return n * 2

def producer():
    count = 0
    while True:
        if count > 5:
            break
        pri = randint(0, 100)
        print('put :{}'.format(pri))
        # JoinableQueue is FIFO, so the "priority" here is just part of the payload
        q.put((pri, double, pri))  # (priority, func, args)
        count += 1

def consumer():
    while True:
        if q.empty():
            break
        pri, task, arg = q.get()
        print('[PRI:{}] {} * 2 = {}'.format(pri, arg, task(arg)))
        q.task_done()
        time.sleep(0.1)

t = Process(target=producer)
t.start()
time.sleep(1)
t = Process(target=consumer)
t.start()
put :86
put :5
put :99
put :96
put :86
put :16
[PRI:86] 86 * 2 = 172
[PRI:5] 5 * 2 = 10
[PRI:99] 99 * 2 = 198
[PRI:96] 96 * 2 = 192
[PRI:86] 86 * 2 = 172
[PRI:16] 16 * 2 = 32