Basic usage of multiprocessing in Python

  Review and feedback are welcome.

 

multiprocessing.Process

concurrent.futures.ProcessPoolExecutor

 

1. Multiprocessing concepts

    Concept: a process is the smallest unit of resource allocation; it is the basic unit the operating system uses to allocate resources and schedule execution. In other words, a running program is a process.

    Concurrency: multiple tasks executed alternately over a period of time (the single-core CPU case). (One cook: chop vegetables for a while, stir-fry for a while, tend the soup for a while, then go back to chopping.)

    Parallelism: multiple tasks executed at the same time (the multi-core CPU case). (Several cooks: one chops, one stir-fries, one tends the soup.)

2. The multiprocessing module

    In Python, the multiprocessing package provides the Process class.

    Example: create 3 child processes; each has its own name and PID.

 

import os
from multiprocessing import Process

def func(num):
    print("I'm process %d, my id: [%s]" % (num, os.getpid()))

if __name__ == '__main__':
    for i in range(3):
        p = Process(target=func, args=(i,))
        p.start()

 

    Output:

I'm process 0, my id: [32920]
I'm process 1, my id: [25364]
I'm process 2, my id: [37428]


The main process created 3 identical child processes. The process-creating code must live inside the if __name__ == '__main__': guard, otherwise you get an error: under the spawn start method, code outside the guard is re-executed in every child process as the module is re-imported, so unguarded Process creation would nest infinitely.

 

 

 For example, here is code placed outside the if-main guard:

import os
import time
from multiprocessing import Process

def func(num):
    time.sleep(1)
    print("I'm process %d, my id: [%s]" % (num, os.getpid()))

print("I'm outside the if-main guard; I print once per process; my current pid is [{}]".format(os.getpid()))


if __name__ == '__main__':
    processes = []
    for i in range(3):
        p = Process(target=func, args=(i,), name="sub process name: [{}]".format(i))
        p.start()
        processes.append(p)
    for i in processes:
        i.join()  # wait for each child to finish
    print("print after all of sub process")

 

 

    Output:

I'm outside the if-main guard; I print once per process; my current pid is [33044]
I'm outside the if-main guard; I print once per process; my current pid is [29356]
I'm outside the if-main guard; I print once per process; my current pid is [28292]
I'm outside the if-main guard; I print once per process; my current pid is [25768]
I'm process 1, my id: [28292]I'm process 0, my id: [29356]

I'm process 2, my id: [25768]
print after all of sub process

Process finished with exit code 0

 

The line outside the if-main guard was printed four times: once by the main process, and once by each of the three child processes.

 

Another way to create processes: subclass Process and override its run() method.

import os
import time
from multiprocessing import Process

class MyProcess(Process):
    def __init__(self, name):
        super(MyProcess, self).__init__()
        self.name = name

    def run(self) -> None:
        time.sleep(1)
        print("I'm process %s, my id: [%s]" % (self.name, os.getpid()))

if __name__ == '__main__':
    for i in range(3):
        p = MyProcess(str(i))
        p.start()  # start() invokes the overridden run() method

  The output is the same: 3 processes are created.

  

Constructor:

Process([group [, target [, name [, args [, kwargs]]]]])
  group: should always be None; it exists only for compatibility with threading.Thread
  target: the callable to execute
  name: the process name
  args/kwargs: arguments passed to target

Instance methods:
  is_alive(): returns whether the process is running (bool).
  join([timeout]): blocks the calling process until the process whose join() was called terminates, or until the optional timeout expires.
  start(): starts the process; it becomes ready and waits to be scheduled by the CPU.
  run(): invoked by start(); if no target was passed to the constructor, start() executes the default run().
  terminate(): stops the worker process immediately, whether or not its task is finished.

Attributes:
  daemon: same idea as threading's setDaemon
  name: the process name
  pid: the process ID
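As a quick check of these methods and attributes, here is a minimal sketch; it also uses exitcode and multiprocessing.current_process(), which are not listed above but belong to the same API (the process name "demo-worker" is made up for illustration):

```python
import os
from multiprocessing import Process, current_process

def show():
    me = current_process()  # the Process object for the process running this code
    print(me.name, me.pid == os.getpid())  # the pid attribute matches os.getpid() in the child

if __name__ == '__main__':
    p = Process(target=show, name="demo-worker")
    p.start()
    p.join()
    print(p.name, p.is_alive(), p.exitcode)  # after join(): not alive, exit code 0
```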

  

 

join() makes the main process wait until the child processes it is called on have finished before executing the next line of code. (Blocking, non-blocking, synchronous, and asynchronous are four distinct concepts; see https://zhuanlan.zhihu.com/p/25638474.)

import os
import time
from multiprocessing import Process

def func(num):
    time.sleep(1)
    print("I'm process %d, my id: [%s]" % (num, os.getpid()))

if __name__ == '__main__':
    processes = []
    for i in range(3):
        p = Process(target=func, args=(i,))
        p.start()
        # print(p.pid)  # the pid can also be read from this attribute
        processes.append(p)

    print("printed before the children finish")  # the children sleep 1 s, but this print does not wait, so it appears first

    Output:

printed before the children finish
I'm process 0, my id: [35468]
I'm process 1, my id: [26536]
I'm process 2, my id: [38136]

Process finished with exit code 0

    

 

With join() added, the main process waits for the children to finish before executing the next line of code.

import os
import time
from multiprocessing import Process

def func(num):
    time.sleep(1)
    print("I'm process %d, my id: [%s]" % (num, os.getpid()))

if __name__ == '__main__':
    processes = []
    for i in range(3):
        p = Process(target=func, args=(i,), name="sub process name: [{}]".format(i))
        p.start()
        print(p.name)  # print the process name
        print(p.pid)   # print the process pid
        processes.append(p)
    for i in processes:
        i.join()
    print("this line prints only after all children have finished")  # because of the join() calls above

 

    Output:

sub process name: [0]
32172
sub process name: [1]
37436
sub process name: [2]
27880
I'm process 0, my id: [32172]
I'm process 1, my id: [37436]
I'm process 2, my id: [27880]
this line prints only after all children have finished

    

 

is_alive() reports whether a child process is currently running.

import os
import time
from multiprocessing import Process

def func(num):
    time.sleep(1)
    print("I'm process %d, my id: [%s]" % (num, os.getpid()))

if __name__ == '__main__':
    processes = []
    for i in range(3):
        p = Process(target=func, args=(i,), name="sub process name: [{}]".format(i))
        print("sub process is alive? (before start) : ", p.is_alive())
        p.start()
        processes.append(p)
        print("sub process is alive? (after start) : ", p.is_alive())
    for i in processes:
        i.join()
        print("sub process is alive? (after join) : ", i.is_alive())
    print("print after all of sub process")

 

     Output:

sub process is alive? (before start) :  False
sub process is alive? (after start) :  True
sub process is alive? (before start) :  False
sub process is alive? (after start) :  True
sub process is alive? (before start) :  False
sub process is alive? (after start) :  True
I'm process 0, my id: [34796]
I'm process 1, my id: [35760]
I'm process 2, my id: [37540]
sub process is alive? (after join) :  False
sub process is alive? (after join) :  False
sub process is alive? (after join) :  False
print after all of sub process

Process finished with exit code 0

 

 

terminate() stops a process immediately.
import os
import time
from multiprocessing import Process

def func(num):
    time.sleep(1)
    print("I'm process %d, my id: [%s]" % (num, os.getpid()))

if __name__ == '__main__':
    processes = []
    for i in range(3):
        p = Process(target=func, args=(i,), name="sub process name: [{}]".format(i))
        p.start()
        processes.append(p)
    for i in processes:
        i.terminate()
    print("print after all of sub process")

 

    Output:

print after all of sub process

Process finished with exit code 0

 

All of the children were started, but they were terminated before their sleep could finish and print anything, so only the main process's line appears.

 

Daemon processes: set daemon = True

The daemon flag must be set before start().

A daemon process is tied to the main process: when the main process finishes, the program ends immediately and all daemon children are terminated with it.

import os
import time
from multiprocessing import Process

def func(num):
    time.sleep(1)
    print("I'm process %d, my id: [%s]" % (num, os.getpid()))

if __name__ == '__main__':
    processes = []
    for i in range(3):
        p = Process(target=func, args=(i,), name="sub process name: [{}]".format(i))
        p.daemon = True
        p.start()
        processes.append(p)
    for i in processes:
        i.terminate()  # with daemon=True the children would be killed when the main process exits anyway ("I killed you before you could kill yourself")
    print("print after all of sub process")

 

     Output:

print after all of sub process

Process finished with exit code 0
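The example above mixes terminate() with the daemon flag; here is a minimal sketch that isolates the daemon behavior on its own (the sleep durations are arbitrary):

```python
import time
from multiprocessing import Process

def worker():
    time.sleep(5)
    print("this never prints: the daemon dies with the main process")

if __name__ == '__main__':
    p = Process(target=worker)
    p.daemon = True  # must be set before start()
    p.start()
    time.sleep(0.2)
    print("main process exits now, taking the daemon child with it")
```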

 

 

 

    3. Inter-process communication

    A process is the basic unit of independent scheduling and resource allocation (CPU, memory). Processes are independent of one another: starting a new process effectively clones the data, so modifications made in a child do not affect the main process, and different children cannot share data with each other. This is the most visible difference between multiprocessing and multithreading in practice. Does that mean Python processes are completely isolated? No: Python provides several mechanisms for communication and data sharing between processes (so that one piece of data can be edited jointly).

    Resources inside a process are private, like two restaurants that each have their own cooks and counter staff; inter-process communication is like both restaurants restocking from the same supplier: two processes editing the same piece of data.

Inter-process communication can use a Queue, which is process-safe; it is essentially a data pipe between processes.
 
import os
from multiprocessing import Process, Queue


def fun1(que, num):
    print("I'm process %d, my id: [%s], I put my name to que" % (num, os.getpid()))
    que.put("put from :[{}], msg: [{}]".format(os.getpid(), num))

if __name__ == '__main__':
    q = Queue(maxsize=10)  # used by the processes to edit the same piece of data

    process_list = []
    for i in range(3):
        p = Process(target=fun1, args=(q, i,))  # note: pass q in args, so the children can talk to the main process through the Queue
        p.start()
        process_list.append(p)

    for i in process_list:
        i.join()

    print('main process reads from the Queue')
    print(q.get())  # the main process get()s the data the children put()
    print(q.get())
    print(q.get())
    print('test finished')

 

    Output:

I'm process 0, my id: [32900], I put my name to que
I'm process 1, my id: [31432], I put my name to que
I'm process 2, my id: [33324], I put my name to que
main process reads from the Queue
put from :[32900], msg: [0]
put from :[31432], msg: [1]
put from :[33324], msg: [2]
test finished

Process finished with exit code 0

 

This uses a process-safe queue to let 4 processes, the main one included, work on the same piece of data.

 

Besides Queue, you can also use a Pipe for inter-process communication.

Pipe() returns (conn1, conn2), the two ends of a pipe. Pipe takes a duplex parameter: if duplex is True (the default), the pipe is full-duplex and both conn1 and conn2 can send and receive. If duplex is False, conn1 can only receive and conn2 can only send. The send and recv methods send and receive messages; in full-duplex mode, for example, you can call conn1.send to send a message and conn1.recv to receive one. If there is no message to receive, recv blocks. If the pipe has been closed, recv raises EOFError.

 

Pipe([duplex])
    Creates a pipe between processes and returns the tuple (conn1, conn2), where conn1 and conn2 are the connection objects for the pipe's two ends. One point to stress: create the pipe before creating the Process objects.
    duplex: the pipe is bidirectional by default; if duplex is set to False, conn1 can only receive and conn2 can only send.

conn1.recv()
    Receives an object sent with conn2.send(obj). Blocks if there is no message to receive.
    Raises EOFError if the other end of the connection has been closed.

conn1.send(obj)
    Sends an object through the connection. obj can be any picklable object.

conn1.close()
    Closes the connection. Called automatically when conn1 is garbage-collected.

conn1.fileno()
    Returns the integer file descriptor used by the connection.

conn1.poll([timeout])
    Returns True if data is available on the connection.
    timeout is the maximum time to wait. If omitted, the method returns immediately; with timeout=None, it waits indefinitely for data to arrive.

conn1.recv_bytes([maxlength])
    Receives a complete byte message sent with c.send_bytes().
    maxlength is the maximum number of bytes to receive; if an incoming message exceeds it, IOError is raised and no further reads are possible on the connection.
    Raises EOFError if the other end has been closed and no data remains.

conn.send_bytes(buffer [, offset [, size]])
    Sends byte data from a buffer through the connection; buffer is any object supporting the buffer interface, offset is the byte offset into the buffer, and size is the number of bytes to send.
    The data is sent as a single message, to be received with c.recv_bytes().

conn1.recv_bytes_into(buffer [, offset])
    Receives a complete byte message and stores it in buffer, which must support a writable buffer interface (e.g. a bytearray).
    offset is the byte position in the buffer at which to place the message. Returns the number of bytes received; raises BufferTooShort if the message is longer than the available buffer space.
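A minimal sketch of send()/recv()/poll() on the two ends of a pipe, run inside a single process just to show the calls:

```python
from multiprocessing import Pipe

conn1, conn2 = Pipe()          # full-duplex by default
conn2.send({"msg": "hello"})   # any picklable object can be sent
has_data = conn1.poll()        # True: a message is waiting on this end
received = conn1.recv()
drained = conn1.poll(0.1)      # False: waits up to 0.1 s, but nothing else arrives
print(has_data, received, drained)
```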

 

from multiprocessing import Process, Pipe

def func(connToSub):
    connToSub.send("hello Main Process")  # send first, then receive; if this line is commented out, the program blocks
    print("message received by the child:", connToSub.recv())

if __name__ == '__main__':
    connToMain, connToSub = Pipe()

    p = Process(target=func, args=(connToSub,))
    p.start()

    print("before main send")
    print("message received by main: ", connToMain.recv())  # receive first, then send
    print("after main send")
    connToMain.send("hi SubProcess")

    p.join()

 

    Output:

before main send
message received by main:  hello Main Process
after main send
message received by the child: hi SubProcess

Process finished with exit code 0

 

If the child's send is commented out, the main process blocks: its recv() never receives anything, so it stays stuck in place.

from multiprocessing import Process, Pipe

def func(connToSub):
    # connToSub.send("hello Main Process")  # send first, then receive
    print("message received by the child:", connToSub.recv())

if __name__ == '__main__':
    connToMain, connToSub = Pipe()

    p = Process(target=func, args=(connToSub,))
    p.start()

    print("before main send")
    print("message received by main: ", connToMain.recv())  # receive first, then send
    print("after main send")
    connToMain.send("hi SubProcess")

    p.join()

    Output. Note the abnormal exit: the program was blocked and had to be interrupted.

before main send
Traceback (most recent call last):
    .................
KeyboardInterrupt

Process finished with exit code -1073741510 (0xC000013A: interrupted by Ctrl+C)

 

 

When several children communicate with one main process over the same pipe, the send calls must be balanced; otherwise some child blocks on recv, and because of the join() calls this turns into a "deadlock".

import os
from multiprocessing import Process, Pipe

def func(connToSub):
    connToSub.send("hello Main Process, I'm subprocess {}".format(os.getpid()))  # send first, then receive
    print("message received by the child:", connToSub.recv())

if __name__ == '__main__':
    connToMain, connToSub = Pipe()

    processes = []
    for i in range(3):
        p = Process(target=func, args=(connToSub,))
        p.start()
        processes.append(p)
        connToMain.send("hi SubProcess")  # note this line: one send per child, so no child blocks on a recv that never comes
        print("message received by main: ", connToMain.recv())  # main could skip the recv, but it must send to every child

    for i in processes:
        i.join()

 

    Output:

message received by the child: hi SubProcess
message received by main:  hello Main Process, I'm subprocess 32876
message received by the child: hi SubProcess
message received by main:  hello Main Process, I'm subprocess 33268
message received by the child: hi SubProcess
message received by main:  hello Main Process, I'm subprocess 37052

Process finished with exit code 0

 

 

 

Queue and Pipe, however, only provide data exchange: one process handing data to another. For genuine data sharing you need a Manager.

Manager

from multiprocessing import Process, Manager

def func(dic, lis, index):
    dic["key" + str(index)] = "value" + str(index)  # every process edits the same objects owned by the same manager
    lis.append(index)

if __name__ == '__main__':
    with Manager() as manager:  # equivalent to: a = Manager(); lis = a.list()
        dic = manager.dict()
        lis = manager.list(range(5))

        processes = []
        for i in range(10):
            p = Process(target=func, args=(dic, lis, i))
            p.start()
            processes.append(p)

        for i in processes:
            i.join()

        print(dic)
        print(lis)

    Output:

{'key0': 'value0', 'key1': 'value1', 'key2': 'value2', 'key3': 'value3', 'key4': 'value4', 'key5': 'value5', 'key6': 'value6', 'key7': 'value7', 'key8': 'value8', 'key9': 'value9'}
[0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Process finished with exit code 0

 

 

A Manager offers several shareable data types: Value, Array, dict, list, Lock, Semaphore, and so on; a Manager can also share class instances.

A Manager provides global variables shared across processes. Manager() returns a manager object that controls a server process; the objects held in that process can be manipulated by other processes through proxies.

Types supported by Manager: list, dict, Namespace, Lock, RLock, Semaphore, BoundedSemaphore, Condition, Event, Queue, Value, and Array.

from multiprocessing import Process, Manager

def func(num, dic, lis, arr, lock, index):
    with lock:  # the updates below are read-modify-write, so the lock keeps them from racing
        dic["key" + str(index)] = "value" + str(index)
        lis.append(index)
        num.set(num.value + 10)  # equivalent to num.value += 10; every process adds 10
        for i in range(len(arr)):
            arr[i] += 10  # add 10 to every element of arr

if __name__ == '__main__':
    with Manager() as manager:  # equivalent to: a = Manager(); lis = a.list()
        dic = manager.dict()
        lis = manager.list(range(5))
        val = manager.Value("i", 1.0)
        array = manager.Array('i', range(10))
        lock = manager.Lock()

        processes = []
        for i in range(10):
            p = Process(target=func, args=(val, dic, lis, array, lock, i))
            p.start()
            processes.append(p)

        for i in processes:
            i.join()

        print(dic)
        print(lis)
        print(val.value, val)
        print(array)
        print(lock)

    Output:

{'key0': 'value0', 'key1': 'value1', 'key2': 'value2', 'key3': 'value3', 'key4': 'value4', 'key5': 'value5', 'key6': 'value6', 'key7': 'value7', 'key8': 'value8', 'key9': 'value9'}
[0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
101.0 Value('i', 101.0)
array('i', [100, 101, 102, 103, 104, 105, 106, 107, 108, 109])
<unlocked _thread.lock object at 0x000001C994957F80>

Process finished with exit code 0

 As a manager, a Manager can also share instances of user-defined classes with other processes (instance sharing between processes).

from multiprocessing import Process, Value, Lock
from multiprocessing.managers import BaseManager

class Employee(object):
    def __init__(self, name, salary):
        self.name = name
        self.salary = Value('i', salary)

    def increase(self):
        self.salary.value += 100

    def getPay(self):
        return self.name + ':' + str(self.salary.value)

class MyManager(BaseManager):
    pass

def Manager2():
    m = MyManager()
    m.start()
    return m

MyManager.register('Employee', Employee)

def func1(em, lock):
    with lock:
        em.increase()

if __name__ == '__main__':
    manager = Manager2()
    em = manager.Employee('zhangsan', 1000)
    lock = Lock()
    proces = [Process(target=func1, args=(em, lock)) for i in range(10)]
    for p in proces:
        p.start()
    for p in proces:
        p.join()
    print(em.getPay())

    Output:

zhangsan:2000

Process finished with exit code 0

 

 

For distributed multiprocessing with a manager, see https://www.cnblogs.com/guguobao/p/9400299.html and https://www.cnblogs.com/chentianwei/p/11914268.html

 

 

 

 

   4. Process pools

    Process pool concept: a pool maintains a fixed set of worker processes. When you need one, you take it from the pool; if no worker is free, the task waits until one becomes available. In short: a fixed number of reusable processes.

     With a bare Process you cannot use return to get the target function's result; a pool can collect results, for example through callback functions.
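Before moving on to pools: a common workaround for getting a "return value" out of a bare Process is to hand it a Queue and put the result there (a sketch; the square function and its names are made up for illustration):

```python
from multiprocessing import Process, Queue

def square(n, out):
    out.put((n, n * n))  # ship the "return value" back through the queue

if __name__ == '__main__':
    out = Queue()
    ps = [Process(target=square, args=(i, out)) for i in range(3)]
    for p in ps:
        p.start()
    for p in ps:
        p.join()
    results = dict(out.get() for _ in range(3))
    print(results)  # {0: 0, 1: 1, 2: 4} (insertion order may vary)
```

Note: for large results, drain the queue before joining; a child cannot exit while its queued data does not fit in the underlying pipe buffer.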

Since Python 3.2, thread pools and process pools have been integrated into the standard library.

 

 

from concurrent.futures import ProcessPoolExecutor  # python version >= 3.2

 

import os
import time
from concurrent.futures import ProcessPoolExecutor

def func(num):
    time.sleep(2)
    print("I'm subprocess [{}], start at time: [{}]".format(num, time.ctime()))

if __name__ == '__main__':
    pool = ProcessPoolExecutor(max_workers=3)

    for i in range(10):
        pool.submit(func, i)
    print("main process execute over")

    Output:

main process execute over
I'm subprocess [1], start at time: [Sat Dec 24 20:00:47 2022]
I'm subprocess [2], start at time: [Sat Dec 24 20:00:47 2022]
I'm subprocess [0], start at time: [Sat Dec 24 20:00:47 2022]
I'm subprocess [4], start at time: [Sat Dec 24 20:00:49 2022]
I'm subprocess [3], start at time: [Sat Dec 24 20:00:49 2022]
I'm subprocess [5], start at time: [Sat Dec 24 20:00:49 2022]
I'm subprocess [8], start at time: [Sat Dec 24 20:00:51 2022]
I'm subprocess [6], start at time: [Sat Dec 24 20:00:51 2022]
I'm subprocess [7], start at time: [Sat Dec 24 20:00:51 2022]
I'm subprocess [9], start at time: [Sat Dec 24 20:00:53 2022]

Process finished with exit code 0

func sleeps for 2 seconds and the pool has only 3 workers, so although 10 tasks were submitted, only 3 run at any one time; the next task starts only when a worker finishes its task and returns to the pool.

submit() hands a task to the pool; an idle worker in the pool picks it up automatically.

In the example above, "over" is printed before the subprocesses run: the main process did not wait for the children to finish.

pool.shutdown(wait=True) is roughly the pool equivalent of process.join(): it waits for all workers to finish and reclaims their resources before the main process continues.

 

import os
import time
from concurrent.futures import ProcessPoolExecutor

def func(num):
    time.sleep(2)
    print("I'm subprocess [{}], start at time: [{}]".format(num, time.ctime()))

if __name__ == '__main__':
    pool = ProcessPoolExecutor(max_workers=3)

    for i in range(10):
        pool.submit(func, i)

    pool.shutdown(wait=True)
    print("main process execute over")

    Output:

I'm subprocess [0], start at time: [Sat Dec 24 20:09:40 2022]
I'm subprocess [2], start at time: [Sat Dec 24 20:09:40 2022]
I'm subprocess [1], start at time: [Sat Dec 24 20:09:40 2022]
I'm subprocess [3], start at time: [Sat Dec 24 20:09:42 2022]
I'm subprocess [4], start at time: [Sat Dec 24 20:09:42 2022]
I'm subprocess [5], start at time: [Sat Dec 24 20:09:42 2022]
I'm subprocess [6], start at time: [Sat Dec 24 20:09:44 2022]
I'm subprocess [8], start at time: [Sat Dec 24 20:09:44 2022]
I'm subprocess [7], start at time: [Sat Dec 24 20:09:44 2022]
I'm subprocess [9], start at time: [Sat Dec 24 20:09:46 2022]
main process execute over

Process finished with exit code 0

 

 

 

The pool also has a map() method, used like Python's built-in map: it saves you the for-loop of submit() calls and returns an iterable of results that you can loop over.

import os
import time
from concurrent.futures import ProcessPoolExecutor

def func(num):
    time.sleep(2)
    print("I'm subprocess [{}], start at time: [{}]".format(num, time.ctime()))
    return "return from [{}]".format(num)

if __name__ == '__main__':
    pool = ProcessPoolExecutor(max_workers=3)

    nums = [i for i in range(5)]
    resss = pool.map(func, nums)

    pool.shutdown()
    print([i for i in resss])

    print("main process execute over")

    Output:

I'm subprocess [1], start at time: [Sat Dec 24 21:21:05 2022]
I'm subprocess [2], start at time: [Sat Dec 24 21:21:05 2022]
I'm subprocess [0], start at time: [Sat Dec 24 21:21:05 2022]
I'm subprocess [4], start at time: [Sat Dec 24 21:21:07 2022]
I'm subprocess [3], start at time: [Sat Dec 24 21:21:07 2022]
['return from [0]', 'return from [1]', 'return from [2]', 'return from [3]', 'return from [4]']
main process execute over

Process finished with exit code 0

 

add_done_callback() registers a callback. Tasks take different amounts of time, and when a task returns a value, a callback is a way to collect the result. The callback takes one parameter; it is called with a Future instance.

import os
import time
from concurrent.futures import ProcessPoolExecutor

def func(num):
    time.sleep(2)
    print("I'm subprocess [{}], start at time: [{}]".format(num, time.ctime()))
    return "I'm [{}] call done".format(num)


def my_callback(future):  # note: the parameter is a Future instance
    print(future.result())


if __name__ == '__main__':
    pool = ProcessPoolExecutor(max_workers=3)

    for i in range(6):
        future = pool.submit(func, i)
        future.add_done_callback(my_callback)

    pool.shutdown(wait=True)
    print("main process execute over")

    Output:

I'm subprocess [2], start at time: [Sat Dec 24 20:14:21 2022]
I'm subprocess [0], start at time: [Sat Dec 24 20:14:21 2022]
I'm subprocess [1], start at time: [Sat Dec 24 20:14:21 2022]
I'm [1] call done
I'm [2] call done
I'm [0] call done
I'm subprocess [3], start at time: [Sat Dec 24 20:14:23 2022]
I'm subprocess [5], start at time: [Sat Dec 24 20:14:23 2022]
I'm subprocess [4], start at time: [Sat Dec 24 20:14:23 2022]
I'm [5] call done
I'm [4] call done
I'm [3] call done
main process execute over

Process finished with exit code 0

When a pool worker finishes its task, my_callback is invoked with a Future instance whose methods can be used to print the result. Equally, you can skip the callback and print the results yourself, as below:

import os
import time
from concurrent.futures import ProcessPoolExecutor

def func(num):
    time.sleep(2)
    print("I'm subprocess [{}], start at time: [{}]".format(num, time.ctime()))
    return "I'm [{}] call done".format(num)


def my_callback(future):  # note: the parameter is a Future instance
    # print(future.result())  # use the loop below instead
    pass

if __name__ == '__main__':
    pool = ProcessPoolExecutor(max_workers=3)

    futures = []

    for i in range(6):
        future = pool.submit(func, i)  # submit() returns a Future instance
        future.add_done_callback(my_callback)
        futures.append(future)

    pool.shutdown(wait=True)
    for i in futures:
        print(i.result())  # print each Future's result

    print("main process execute over")

 

The pool API also offers as_completed().

as_completed() is a generator: while no task has finished, it blocks; when some task completes, it yields that task so the body of the for loop can run, then blocks again, looping until every task is done. As the output shows, tasks that finish first are reported to the main process first.

import os
import random
import time
from concurrent.futures import ProcessPoolExecutor, as_completed


def func(num):
    print("I'm [{}], my pid: [{}], start at time: [{}]".format(num, os.getpid(), time.ctime()))
    time.sleep(random.randrange(5))
    return "I'm [{}], stop at time [{}]".format(num, time.ctime())


if __name__ == '__main__':
    pool = ProcessPoolExecutor(5)

    processes = []
    for i in range(10):
        processes.append(pool.submit(func, i))

    for i in as_completed(processes):
        print(i.result())

    Output:

I'm [0], my pid: [51764], start at time: [Sun Dec 25 15:35:26 2022]
I'm [1], my pid: [45236], start at time: [Sun Dec 25 15:35:26 2022]
I'm [2], my pid: [48612], start at time: [Sun Dec 25 15:35:26 2022]
I'm [3], my pid: [48612], start at time: [Sun Dec 25 15:35:26 2022]
I'm [2], stop at time [Sun Dec 25 15:35:26 2022]
I'm [4], my pid: [51536], start at time: [Sun Dec 25 15:35:26 2022]
I'm [5], my pid: [47808], start at time: [Sun Dec 25 15:35:26 2022]
I'm [6], my pid: [51764], start at time: [Sun Dec 25 15:35:29 2022]
I'm [0], stop at time [Sun Dec 25 15:35:29 2022]
I'm [7], my pid: [45236], start at time: [Sun Dec 25 15:35:29 2022]
I'm [1], stop at time [Sun Dec 25 15:35:29 2022]
I'm [8], my pid: [48612], start at time: [Sun Dec 25 15:35:29 2022]
I'm [3], stop at time [Sun Dec 25 15:35:29 2022]
I'm [9], my pid: [47808], start at time: [Sun Dec 25 15:35:29 2022]
I'm [5], stop at time [Sun Dec 25 15:35:29 2022]
I'm [7], stop at time [Sun Dec 25 15:35:30 2022]
I'm [8], stop at time [Sun Dec 25 15:35:30 2022]
I'm [4], stop at time [Sun Dec 25 15:35:30 2022]
I'm [6], stop at time [Sun Dec 25 15:35:31 2022]
I'm [9], stop at time [Sun Dec 25 15:35:32 2022]

Process finished with exit code 0

In short: whichever task finishes first enters the loop first; if none has finished, as_completed blocks and the loop body does not run until some process completes.

 

    5. Controlling when processes are triggered

    Different processes may need to run in a particular order, or only under certain conditions; their execution has to be controlled.

     Controlling multiple processes is, at bottom, message passing at work: these mechanisms send signals that tell a process when it may run, when it may not, and what to do.

Event: a general-purpose condition variable. Processes can wait for an event to happen; when it does, all waiting processes are released.

event = Event()

# reset the event, putting everything that waits on it back on standby
event.clear()

# wait for the event's signal; decides whether execution blocks
event.wait()

# set the event, letting everything that waits on it proceed
event.set()

Example:

import os
import time
from multiprocessing import Process, Event


def fun(num, event):
    print("I'm [{}], my pid: [{}], start at time [{}]".format(num, os.getpid(), time.ctime()))
    event.wait()  # wait first
    print("I'm [{}], I'm finished at time [{}]".format(num, time.ctime()))


if __name__ == '__main__':
    event = Event()
    processes = []

    for i in range(10):
        p = Process(target=fun, args=(i, event))
        processes.append(p)  # don't start the processes yet

    event.clear()  # reset the event so that event.wait() blocks

    for i in processes:
        i.start()  # start all processes

    # for i in processes:
    #     i.join()  # don't join here: join waits for the processes to end, but the event keeps them blocked, so both sides would wait on each other forever

    print("let all of processes sleep 5s")
    time.sleep(5)

    event.set()  # wake every process up and let them work

    Output:

let all of processes sleep 5s
I'm [0], my pid: [31560], start at time [Sat Dec 24 23:58:18 2022]
I'm [1], my pid: [42944], start at time [Sat Dec 24 23:58:18 2022]
I'm [2], my pid: [44104], start at time [Sat Dec 24 23:58:18 2022]
I'm [3], my pid: [44552], start at time [Sat Dec 24 23:58:18 2022]
I'm [4], my pid: [42460], start at time [Sat Dec 24 23:58:18 2022]
I'm [5], my pid: [44220], start at time [Sat Dec 24 23:58:18 2022]
I'm [6], my pid: [41892], start at time [Sat Dec 24 23:58:18 2022]
I'm [7], my pid: [41844], start at time [Sat Dec 24 23:58:18 2022]
I'm [8], my pid: [38416], start at time [Sat Dec 24 23:58:18 2022]
I'm [9], my pid: [1100], start at time [Sat Dec 24 23:58:18 2022]
I'm [0], I'm finished at time [Sat Dec 24 23:58:23 2022]
I'm [1], I'm finished at time [Sat Dec 24 23:58:23 2022]
I'm [4], I'm finished at time [Sat Dec 24 23:58:23 2022]
I'm [6], I'm finished at time [Sat Dec 24 23:58:23 2022]
I'm [7], I'm finished at time [Sat Dec 24 23:58:23 2022]
I'm [8], I'm finished at time [Sat Dec 24 23:58:23 2022]
I'm [2], I'm finished at time [Sat Dec 24 23:58:23 2022]
I'm [3], I'm finished at time [Sat Dec 24 23:58:23 2022]
I'm [9], I'm finished at time [Sat Dec 24 23:58:23 2022]
I'm [5], I'm finished at time [Sat Dec 24 23:58:23 2022]

Process finished with exit code 0

  Notice the 5-second gap between every process's start line and finish line: that is the event doing the blocking.

  After start(), the processes do not run to completion; they all stop at event.wait(), and only when event.set() sends them the go signal do they continue.

 

Condition, similar to Event

cond = Condition()

# like lock.acquire()
cond.acquire()

# like lock.release()
cond.release()

# wait to be notified; releases the lock while waiting, and reacquires it once notified
cond.wait()

# send the signal that lets waiters run
cond.notify()

A Condition is usually associated with a lock. If several Conditions need to share one lock, pass a Lock/RLock instance to the constructor; otherwise the Condition creates its own RLock.

Besides the lock's locked pool, a Condition keeps a waiting pool: threads in it sit in the blocked-waiting state until another thread calls notify()/notifyAll(); once notified, a thread moves into the locked pool and waits to acquire the lock.

Condition():

acquire(): acquire the lock
release(): release the lock
wait(timeout): suspend until notified, or until the optional timeout (a float, in seconds) expires. wait() must be called with the lock held, otherwise RuntimeError is raised.
notify(n=1): wake suspended waiters; by default one thread waiting on the condition is notified, and at most n waiters are woken. notify() must be called with the lock held, otherwise RuntimeError is raised. notify() does not release the lock.
notifyAll(): notify every waiting thread.

 

import os
import time
from multiprocessing import Process, Condition, Manager


def funcadd(cond, num):
    cond.acquire()  # continue only once the lock is acquired
    print("I'm Process1, my pid: [{}]".format(os.getpid()))
    print("Start to add")
    while True:
        num.append(1)
        print("num =", num)
        time.sleep(0.2)
        if len(num) >= 5:
            print("add done ----------")
            cond.notify()  # wake one other process
            cond.wait()    # sleep on the condition until funcminus notifies us back; wait() releases the lock while sleeping
            cond.release()
            break


def funcminus(cond, num):
    time.sleep(1)
    cond.acquire()
    print("I'm Process2, my pid: [{}]".format(os.getpid()))
    print("Start to minus")
    while True:
        num.pop()
        print("num =", num)
        time.sleep(0.2)
        if len(num) <= 0:
            print("minus done ----------")
            cond.notify()   # wake the other process, which is sleeping in wait()
            cond.release()  # release the lock and exit; waiting here instead would deadlock, since nobody is left to notify us
            break


if __name__ == '__main__':
    manager = Manager()
    nums = manager.list()

    cond = Condition()

    p = Process(target=funcadd, args=(cond, nums))
    p1 = Process(target=funcminus, args=(cond, nums))
    p.start()
    p1.start()
    p.join()
    p1.join()

    Output:

I'm Process1, my pid: [50044]
Start to add
num = [1]
num = [1, 1]
num = [1, 1, 1]
num = [1, 1, 1, 1]
num = [1, 1, 1, 1, 1]
add done ----------
I'm Process2, my pid: [46916]
Start to minus
num = [1, 1, 1, 1]
num = [1, 1, 1]
num = [1, 1]
num = [1]
num = []
minus done ----------

 

 

Queue: a rundown of Queue's methods

from queue import Queue
# maxsize defaults to 0: unbounded
# once maxsize > 0 and the queue is full, q.put() will block as well
q = Queue(maxsize=0)

# blocks by default, waiting for a message; a timeout can be set
q.get(block=True, timeout=None)

# send a message; by default blocks until the queue has a free slot
q.put(item, block=True, timeout=None)

# wait until every message has been consumed
q.join()


# signal that one task is finished; once all tasks are done, the join() above unblocks
q.task_done()
# (note: join()/task_done() exist on queue.Queue and multiprocessing.JoinableQueue, not on a plain multiprocessing.Queue)

# number of messages currently in the queue
q.qsize()

# whether every message has been consumed, True/False
q.empty()

# whether the queue is full
q.full()
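Since the join()/task_done() pair belongs to queue.Queue (for threads) and is missing from a plain multiprocessing.Queue, the process-side equivalent is multiprocessing.JoinableQueue. A minimal sketch:

```python
from multiprocessing import JoinableQueue, Process

def consumer(q):
    while True:
        item = q.get()
        # ... handle item here ...
        q.task_done()  # one task_done() per get(); join() unblocks when the counts match

if __name__ == '__main__':
    q = JoinableQueue()
    p = Process(target=consumer, args=(q,), daemon=True)  # daemon: dies with main
    p.start()
    for i in range(5):
        q.put(i)
    q.join()  # blocks until every put() item has been marked task_done()
    print("all items consumed")
```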

    Example:

import os
import random
import time
from multiprocessing import Process, Queue, Semaphore


def func(que, num, se):
    se.acquire()
    print("I'm [{}], my pid: [{}]".format(num, os.getpid()))
    time.sleep(random.randrange(5))
    que.put(num)  # blocks by default until the Queue has a free slot for the put
    print("I'm [{}] que count:".format(num), que.qsize())
    print("I'm [{}] que is empty? :".format(num), que.empty())
    print("I'm [{}] que is full? :".format(num), que.full())
    if que.full():
        get_from_que = que.get()  # blocks by default until the queue is non-empty and an item is fetched
        print("I'm [{}], i got sth here: [{}]".format(num, get_from_que))
    se.release()


if __name__ == '__main__':
    que = Queue(maxsize=3)  # a queue with only 3 slots
    se = Semaphore(2)  # allow at most 2 processes to run this section at once

    processes = []
    for i in range(10):
        p = Process(target=func, args=(que, i, se))
        p.start()
        processes.append(p)

    for i in processes:
        i.join()
    print("ooooooooooooooooover")

    Output:

I'm [1], my pid: [48112]
I'm [0], my pid: [41936]
I'm [1] que count: 1
I'm [1] que is empty? : False
I'm [1] que is full? : False
I'm [2], my pid: [45900]
I'm [0] que count: 2
I'm [0] que is empty? : False
I'm [0] que is full? : False
I'm [3], my pid: [45240]
I'm [3] que count: 3
I'm [3] que is empty? : False
I'm [3] que is full? : True
I'm [3], i got sth here: [1]
I'm [4], my pid: [46008]
I'm [2] que count: 3
I'm [2] que is empty? : False
I'm [2] que is full? : True
I'm [2], i got sth here: [0]
I'm [6], my pid: [51112]
I'm [6] que count: 3
I'm [6] que is empty? : False
I'm [6] que is full? : True
I'm [6], i got sth here: [3]
I'm [5], my pid: [50392]
I'm [4] que count: 3
I'm [4] que is empty? : False
I'm [4] que is full? : True
I'm [4], i got sth here: [2]
I'm [7], my pid: [50408]
I'm [5] que count: 3
I'm [5] que is empty? : False
I'm [5] que is full? : True
I'm [5], i got sth here: [6]
I'm [8], my pid: [38168]
I'm [8] que count: 3
I'm [8] que is empty? : False
I'm [8] que is full? : True
I'm [8], i got sth here: [4]
I'm [9], my pid: [34760]
I'm [9] que count: 3
I'm [9] que is empty? : False
I'm [9] que is full? : True
I'm [9], i got sth here: [5]
I'm [7] que count: 3
I'm [7] que is empty? : False
I'm [7] que is full? : True
I'm [7], i got sth here: [8]
ooooooooooooooooover

Process finished with exit code 0

 

 Barrier

wait(timeout=None): blocks, trying to pass the barrier. Once the number of waiters reaches the barrier's parties count, the barrier opens: the action function runs and each waiter continues with its own code; otherwise they keep waiting.

If wait(timeout=None) times out, the barrier enters the broken state. If the barrier is broken or reset while a process is waiting, wait() raises BrokenBarrierError, so remember to add exception handling.

reset(): returns the barrier to its default, empty state; anything currently blocked at it starts over. If the barrier is broken or reset while a process is waiting, BrokenBarrierError is raised, so remember to add exception handling.

Barrier: wait and action

import os
import time
from multiprocessing import Process, Barrier
from threading import BrokenBarrierError


def pass_func():
    print("3 Process passed")


def func(num, barrier):
    print("I'm [{}], my pid: [{}]".format(num, os.getpid()))
    try:
        barrier.wait()
    except BrokenBarrierError as e:
        pass
    else:
        print("[{}] go ahead".format(num))


if __name__ == '__main__':
    barrier = Barrier(parties=3, action=pass_func, timeout=0.5)
    processes = []
    for i in range(10):
        p = Process(target=func, args=(i, barrier))
        p.start()
        processes.append(p)

    for i in processes:
        i.join()

Each time 3 processes have gathered, they pass the barrier together.

 

import os
import random
import time
from multiprocessing import Process, Barrier
from threading import BrokenBarrierError


def pass_func():
    print("3 Process passed")


def func(num, barrier):
    try:
        bid = barrier.wait(10)
        time.sleep(random.randrange(5))
        print("I'm [{}], my pid: [{}]".format(num, os.getpid()))
        print("[{}] my arrival index within the batch is [{}]".format(num, bid))  # wait() returns an index from 0 to parties-1
        if bid == 2:
            barrier.abort()
    except BrokenBarrierError as e:
        pass
    else:
        print("[{}] go ahead".format(num))


if __name__ == '__main__':
    barrier = Barrier(parties=3, action=pass_func, timeout=0.5)
    processes = []
    for i in range(10):
        p = Process(target=func, args=(i, barrier))
        p.start()
        processes.append(p)

    for i in processes:
        i.join()

Once the waiter whose arrival index is 2 calls abort(), the barrier is broken and no further group of 3 can ever form, because every later wait() raises BrokenBarrierError.

reset() restores order after an abort(): the barrier starts counting waiters from 0 again.

In this multiprocessing example the barrier seemed to have no effect (whether it is really ineffective remains to be confirmed); Barrier is not fully understood here yet.
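Whether the multiprocessing version behaves here is left open above, but the abort/reset semantics themselves can be checked deterministically with the threading twin of the same API:

```python
import time
from threading import Barrier, BrokenBarrierError, Thread

barrier = Barrier(2)
outcome = []

def waiter():
    try:
        barrier.wait(timeout=5)
        outcome.append("passed")
    except BrokenBarrierError:
        outcome.append("broken")  # raised whether abort() lands before or during the wait

t = Thread(target=waiter)
t.start()
time.sleep(0.2)        # give the waiter time to block on the barrier
barrier.abort()        # break the barrier: the blocked waiter gets BrokenBarrierError
t.join()
print(outcome)         # ['broken']
barrier.reset()        # back to an empty, unbroken barrier, counting from 0 again
print(barrier.broken)  # False
```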

 

 

    六、Process lock (process synchronization)

 

A process lock (Lock) is needed when multiple processes edit the same data: without coordination, their updates interleave and overwrite each other, producing wrong results. The lock guarantees that one process finishes its critical section before the next one enters.

 

import time
from multiprocessing import Process, Value, Lock


def sub(num):
    time.sleep(1)
    num.value += 1  # read-modify-write on shared memory: not atomic


if __name__ == '__main__':
    lock = Lock()  # a lock object is created here but deliberately NOT used yet
    val = Value('i', 0)  # Value shares data via shared memory; initial value 0

    processes = []
    for i in range(100):
        p = Process(target=sub, args=(val,))
        processes.append(p)
    for p in processes:
        p.start()
    for p in processes:
        p.join()

    print('After 100 increments, val is: %d' % val.value)

    Execution result:

After 100 increments, val is: 69

Process finished with exit code 0

 

The result differs on every run: the processes race for the shared value and overwrite each other's updates.

A lock, also called a mutex, is needed to keep the reads and writes ordered.

import time
from multiprocessing import Process, Value, Lock


def sub(num, lock):
    time.sleep(1)
    lock.acquire()  # take the lock; other processes block until it is released
    print("sleep 1s")
    num.value += 1
    lock.release()  # release the lock so another process can take it


if __name__ == '__main__':
    lock = Lock()  # create the lock object
    val = Value('i', 0)  # Value shares data via shared memory; initial value 0

    processes = []
    for i in range(100):
        p = Process(target=sub, args=(val, lock))
        processes.append(p)
    for p in processes:
        p.start()
    for p in processes:
        p.join()

    print('After 100 increments, val is: %d' % val.value)

    Execution result:

......
sleep 1s
sleep 1s
sleep 1s
sleep 1s
sleep 1s
After 100 increments, val is: 100

Note that if the forced wait sat inside the locked region, each iteration's 1 s sleep would serialize: 100 processes would take about 100 s in total. Moving the sleep outside the lock, as above, cuts the wall-clock time, and the same lines still get printed.
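The cost difference can be measured. A small sketch using threads and a shorter sleep so it runs quickly (the timings in the comments are approximate, not exact):

```python
import threading
import time

def timed_run(n, t, sleep_inside_lock):
    lock = threading.Lock()
    counter = {"v": 0}

    def work():
        if sleep_inside_lock:
            with lock:
                time.sleep(t)   # serialized: total wall time ~ n * t
                counter["v"] += 1
        else:
            time.sleep(t)       # concurrent: total wall time ~ t
            with lock:
                counter["v"] += 1

    threads = [threading.Thread(target=work) for _ in range(n)]
    start = time.perf_counter()
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return counter["v"], time.perf_counter() - start

if __name__ == "__main__":
    print(timed_run(4, 0.1, True))   # sleeps add up: ~0.4 s
    print(timed_run(4, 0.1, False))  # sleeps overlap: ~0.1 s
```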

Besides lock.acquire() to take the lock and lock.release() to release it, a lock can also be used as a context manager, for example:

 

def add_num():
    global count
    with lock:  # acquired on entry, released automatically on exit
        tmp = count
        time.sleep(0.001)
        count = tmp + 1
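Dropping the with-statement into the earlier Value counter gives a complete runnable version (function names here are illustrative):

```python
from multiprocessing import Process, Value, Lock

def add_one(val, lock):
    with lock:  # acquired on entry, released on exit, even if an exception occurs
        val.value += 1

def run(n):
    lock = Lock()
    val = Value('i', 0)
    procs = [Process(target=add_one, args=(val, lock)) for _ in range(n)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return val.value

if __name__ == '__main__':
    print(run(100))  # 100 on every run, unlike the unlocked version
```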

 

After a Lock is acquired, it must be released before it can be acquired again, even by the same process. Acquiring twice without releasing causes a deadlock: the second acquire can never succeed, so the process blocks forever.

import os
import time
from multiprocessing import Process, Lock

def add_num(num, lock):
    lock.acquire()  # acquire the lock and never release it before...
    print("I'm [{}], my pid: [{}]".format(num, os.getpid()))
    print("first time get lock")
    time.sleep(3)
    print("second time get lock")
    lock.acquire()  # ...acquiring again: a plain Lock is not reentrant, so this blocks forever
    # release
    lock.release()
    lock.release()



if __name__ == '__main__':
    processed = []
    lock = Lock()

    for i in range(5):
        p = Process(target=add_num, args=(i, lock))
        p.start()
        processed.append(p)

    for i in processed:
        i.join()

    print("over")

    Execution result (note: abnormal exit; the program hung blocked and had to be interrupted with Ctrl+C):

I'm [0], my pid: [41684]
first time get lock
second time get lock
Process Process-3:
Process Process-1:
Process Process-5:
Process Process-4:
Process Process-2:
Traceback (most recent call last):
Traceback (most recent call last):
Traceback (most recent call last):
  .........

Process finished with exit code -1073741510 (0xC000013A: interrupted by Ctrl+C)

A reentrant lock fixes this. It is used exactly like a mutex: just change Lock to RLock. Note that the lock must be released as many times as it was acquired.

To let the same process request the same resource multiple times, Python provides the "reentrant lock": RLock (multiprocessing.RLock here, the counterpart of threading.RLock). An RLock internally maintains a Lock plus a counter recording the number of acquire calls, so the owner can acquire the resource repeatedly; only after every acquire has been matched by a release can other processes get the resource.

import os
import time
from multiprocessing import Process, RLock

def add_num(num, lock):
    print("I'm [{}], my pid: [{}]".format(num, os.getpid()))
    lock.acquire()
    print("first time get lock")
    time.sleep(3)
    print("second time get lock")
    lock.acquire()  # re-entering: RLock just bumps its counter instead of blocking
    # release as many times as acquired
    lock.release()
    lock.release()



if __name__ == '__main__':
    processed = []
    lock = RLock()

    for i in range(5):
        p = Process(target=add_num, args=(i, lock))
        p.start()
        processed.append(p)

    for i in processed:
        i.join()

    print("over")

    Execution result:

I'm [0], my pid: [17376]
first time get lock
I'm [1], my pid: [42208]
I'm [2], my pid: [39372]
I'm [3], my pid: [36724]
I'm [4], my pid: [41160]
second time get lock
first time get lock
second time get lock
first time get lock
second time get lock
first time get lock
second time get lock
first time get lock
second time get lock
over

Process finished with exit code 0

Simply changing the mutex to a reentrant lock solved the problem.
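The counter behaviour is easy to see in a single process; with-blocks nest the same way acquire/release pairs do:

```python
from multiprocessing import RLock

rlock = RLock()
with rlock:        # first acquire: internal counter becomes 1
    with rlock:    # re-entry by the same owner: counter becomes 2, no blocking
        print("nested critical section")
    # counter back to 1 here
# counter 0 here: other processes could now acquire the lock
```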

 

Mutexes can also deadlock when two holders each wait for the other's lock before releasing their own:

import os
import time
from multiprocessing import Process, Lock

def apple(name, lock_1, lock_2):
    print("I'm [{}], my pid: [{}]".format(name, os.getpid()))
    lock_1.acquire()  # take the apple lock
    time.sleep(2)
    print("I wanna get banana")
    lock_2.acquire()  # the banana lock is still held by the other process: blocks here

    lock_1.release()
    lock_2.release()
    print("apple got banana")


def banana(name, lock_1, lock_2):
    print("I'm [{}], my pid: [{}]".format(name, os.getpid()))
    lock_2.acquire()  # take the banana lock
    time.sleep(1)
    print("I wanna get apple")
    lock_1.acquire()  # the apple lock is still held by the other process: blocks here

    lock_1.release()
    lock_2.release()
    print("banana got apple")


if __name__ == '__main__':
    processed = []
    lock_apple = Lock()
    lock_banana = Lock()

    p = Process(target=apple, args=("apple", lock_apple, lock_banana))
    p.start()
    processed.append(p)
    p2 = Process(target=banana, args=("banana", lock_apple, lock_banana))
    p2.start()
    processed.append(p2)

    for i in processed:
        i.join()

    print("over")

    Execution result (note: abnormal exit; the program hung blocked):

I'm [apple], my pid: [41640]
I'm [banana], my pid: [43820]
I wanna get apple
I wanna get banana
Process Process-2:
Process Process-1:
Traceback (most recent call last):
        ......
KeyboardInterrupt

Process finished with exit code -1073741510 (0xC000013A: interrupted by Ctrl+C)

Neither lock gives way to the other: a classic deadlock.
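The standard cure is to impose a single global acquisition order on all locks, so no process ever holds a later lock while waiting for an earlier one. A sketch (the helper names are made up):

```python
from multiprocessing import Lock

lock_apple = Lock()
lock_banana = Lock()
LOCK_ORDER = [lock_apple, lock_banana]  # the one global order everyone obeys

def acquire_in_order(*locks):
    # sort the requested locks by their global rank before acquiring any of them
    for lock in sorted(locks, key=LOCK_ORDER.index):
        lock.acquire()

def release_all(*locks):
    for lock in locks:
        lock.release()

# both "apple" and "banana" would call acquire_in_order(lock_apple, lock_banana),
# so neither can end up holding one lock while waiting for the other
acquire_in_order(lock_banana, lock_apple)  # still taken in apple-then-banana order
release_all(lock_apple, lock_banana)
```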

 

Semaphore maintains a built-in counter

A semaphore (Semaphore) limits how many processes may run a section concurrently at the same time, as follows.

A Semaphore looks similar to a process pool, but they are completely different concepts.

import multiprocessing
import time


def worker(s, i):
    s.acquire()  # decrement the counter; blocks when it is already 0
    print(time.strftime('%H:%M:%S'), multiprocessing.current_process().name + " acquired, running")
    time.sleep(i)
    print(time.strftime('%H:%M:%S'), multiprocessing.current_process().name + " releasing, done")
    s.release()  # increment the counter, waking one blocked waiter


if __name__ == "__main__":
    s = multiprocessing.Semaphore(2)  # at most 2 workers inside at once
    for i in range(6):
        p = multiprocessing.Process(target=worker, args=(s, 2))
        p.start()

    Execution result:

22:04:39 Process-1 acquired, running
22:04:39 Process-2 acquired, running
22:04:41 Process-1 releasing, done
22:04:41 Process-3 acquired, running
22:04:41 Process-2 releasing, done
22:04:41 Process-4 acquired, running
22:04:43 Process-3 releasing, done
22:04:43 Process-5 acquired, running
22:04:43 Process-4 releasing, done
22:04:43 Process-6 acquired, running
22:04:45 Process-5 releasing, done
22:04:45 Process-6 releasing, done

Process finished with exit code 0

Similar to Semaphore there is also BoundedSemaphore.

BoundedSemaphore is a factory function that returns a new BoundedSemaphore (bounded semaphore) object.

A bounded semaphore checks that its current value never exceeds its initial value; if it would, a ValueError is raised. Semaphores are mostly used to guard resources with limited capacity.

Releasing a semaphore too many times is usually a sign of a bug. If no initial value is given, it defaults to 1.

Like a plain semaphore, a bounded semaphore maintains an internal counter: counter = initial value + releases - acquires. When the counter is 0, acquire() blocks.
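The over-release check is easy to demonstrate in a single process:

```python
from multiprocessing import BoundedSemaphore

sem = BoundedSemaphore(2)  # counter starts at 2 and may never exceed 2
sem.acquire()              # counter: 1
sem.release()              # counter: 2 again, fine
try:
    sem.release()          # would push the counter to 3 > 2
except ValueError as e:
    print("over-released:", e)
```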

 

 

 

The wait function in concurrent.futures is similar in effect to join: it waits for all the submitted tasks to finish, but it offers more control.

wait takes a return_when parameter with three options:

            FIRST_COMPLETED - Return when any future finishes or is
                              cancelled.
            FIRST_EXCEPTION - Return when any future finishes by raising an
                              exception. If no future raises an exception
                              then it is equivalent to ALL_COMPLETED.
            ALL_COMPLETED -   Return when all futures finish or are cancelled.

The default is ALL_COMPLETED: wait only returns once every future has finished. It returns a 2-tuple of sets, done and pending: done holds the futures that have completed, pending those that have not.

The example below uses return_when=FIRST_COMPLETED, so wait returns as soon as the first future completes:

 

import os
import time
from concurrent.futures import ProcessPoolExecutor, wait, FIRST_COMPLETED


def func(num):
    print("I'm [{}], my pid: [{}], start time at: [{}]".format(num, os.getpid(), time.ctime()))
    time.sleep(1)
    return "I'm [{}], my pid: [{}], stop time at: [{}]".format(num, os.getpid(), time.ctime())


def my_callback(future):
    print(future.result())


if __name__ == '__main__':
    pool = ProcessPoolExecutor(max_workers=3)

    processes = []
    for i in range(10):
        # note: (i,) passes the whole tuple as the single argument, which is why
        # the output below shows [(0,)]; use pool.submit(func, i) to pass i itself
        fur = pool.submit(func, (i,))
        fur.add_done_callback(my_callback)
        processes.append(fur)
    done, pending = wait(processes, return_when=FIRST_COMPLETED)
    print("done: ", done)
    print("pending: ", pending)

 

    Execution result:

I'm [(0,)], my pid: [43448], start time at: [Sat Dec 24 23:08:16 2022]
I'm [(1,)], my pid: [39576], start time at: [Sat Dec 24 23:08:16 2022]
I'm [(2,)], my pid: [41056], start time at: [Sat Dec 24 23:08:16 2022]
I'm [(3,)], my pid: [43448], start time at: [Sat Dec 24 23:08:17 2022]
I'm [(0,)], my pid: [43448], stop time at: [Sat Dec 24 23:08:17 2022]
done:  {<Future at 0x2a0fd08ba30 state=finished returned str>}
pending:  {<Future at 0x2a0fd0ba050 state=running>, <Future at 0x2a0fd0b9a80 state=running>, <Future at 0x2a0fd0b9c90 state=running>, <Future at 0x2a0fd0ba290 state=pending>, <Future at 0x2a0fd0ba110 state=running>, <Future at 0x2a0fd0b9f30 state=running>, <Future at 0x2a0fd0ba3b0 state=pending>, <Future at 0x2a0fd0ba1d0 state=running>, <Future at 0x2a0fd0b9de0 state=running>}
I'm [(4,)], my pid: [39576], start time at: [Sat Dec 24 23:08:17 2022]
I'm [(5,)], my pid: [41056], start time at: [Sat Dec 24 23:08:17 2022]
I'm [(1,)], my pid: [39576], stop time at: [Sat Dec 24 23:08:17 2022]
I'm [(2,)], my pid: [41056], stop time at: [Sat Dec 24 23:08:17 2022]
I'm [(6,)], my pid: [41056], start time at: [Sat Dec 24 23:08:18 2022]
I'm [(5,)], my pid: [41056], stop time at: [Sat Dec 24 23:08:18 2022]
I'm [(7,)], my pid: [39576], start time at: [Sat Dec 24 23:08:18 2022]
I'm [(8,)], my pid: [43448], start time at: [Sat Dec 24 23:08:18 2022]
I'm [(4,)], my pid: [39576], stop time at: [Sat Dec 24 23:08:18 2022]
I'm [(3,)], my pid: [43448], stop time at: [Sat Dec 24 23:08:18 2022]
I'm [(9,)], my pid: [43448], start time at: [Sat Dec 24 23:08:19 2022]
I'm [(8,)], my pid: [43448], stop time at: [Sat Dec 24 23:08:19 2022]
I'm [(6,)], my pid: [41056], stop time at: [Sat Dec 24 23:08:19 2022]
I'm [(7,)], my pid: [39576], stop time at: [Sat Dec 24 23:08:19 2022]
I'm [(9,)], my pid: [43448], stop time at: [Sat Dec 24 23:08:20 2022]

Process finished with exit code 0

 

An example of a child process propagating an exception back to the parent:

 

import traceback
from multiprocessing import Process, Pipe
from random import choice


class MyProcess(Process):
    def __init__(self, *args, **kwargs):
        super(MyProcess, self).__init__(*args, **kwargs)
        self._conn1, self._conn2 = Pipe()
        self._exception = None

    def run(self):
        try:
            super().run()
            self._conn2.send(None)
        except Exception as e:
            trace = traceback.format_exc()
            self._conn2.send((e, trace))

    @property
    def exception(self):
        if self._conn1.poll():  # True if the child sent something through the pipe
            self._exception = self._conn1.recv()
        return self._exception


def will_err_fun(a, b):
    print("must be error", a, b)
    raise choice([ValueError("error1"), IOError("error2"), AttributeError("error3")])


if __name__ == '__main__':
    processes = []
    for i in range(5):
        p = MyProcess(target=will_err_fun, name="My Process name: [{}]".format(i), args=(i, i))
        p.start()
        processes.append(p)

    for i in processes:
        i.join()

    for i in processes:
        if i.exception:
            error, trace = i.exception
            print(i.name, error, trace)
            i.terminate()
        else:
            print("mission succeed")

    Execution result:

must be error 0 0
must be error 1 1
must be error 2 2
must be error 3 3
must be error 4 4
My Process name: [0] error3 Traceback (most recent call last):
  File ...................
    raise choice([ValueError("error1"), IOError("error2"), AttributeError("error3")])
AttributeError: error3
       ...................
    
Process finished with exit code 0

The idea: subclass Process and override run() to wrap the target in try/except. Any exception, together with its formatted traceback, is sent back to the parent through a Pipe and exposed as a property on the process object. After join(), the parent reads each child's exception property to decide whether the run succeeded.
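When the pool API is acceptable, concurrent.futures already does this bookkeeping: Future.exception() returns the exception a submitted call raised, or None on success. A minimal sketch (the function names here are made up):

```python
from concurrent.futures import ProcessPoolExecutor

def risky(n):
    if n % 2:
        raise ValueError("bad n: {}".format(n))
    return n * 10

def collect_errors(nums):
    with ProcessPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(risky, n) for n in nums]
    # the with-block waits for completion; exception() is None where the call succeeded
    return [f.exception() for f in futures]

if __name__ == '__main__':
    for n, exc in zip(range(4), collect_errors(range(4))):
        print(n, "ok" if exc is None else exc)
```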

 

 

 

 

Pipe([duplex])    Creates a pipe between processes and returns a tuple (conn1, conn2) of connection objects for its two ends. Note: the pipe must be created before the Process objects. duplex: the pipe is bidirectional by default; with duplex=False, conn1 can only receive and conn2 can only send.
conn1.recv()    Receives an object sent with conn2.send(obj). Blocks until a message is available; raises EOFError if the other end is closed and nothing remains to receive.
conn1.send(obj)    Sends an object through the connection; obj may be any picklable object.
conn1.close()    Closes the connection. Called automatically when conn1 is garbage-collected.
conn1.fileno()    Returns the integer file descriptor used by the connection.
conn1.poll([timeout])    Returns True if data is available on the connection. timeout is the maximum time to wait; if omitted, the method returns immediately; if timeout is None, it waits indefinitely for data to arrive.
conn1.recv_bytes([maxlength])    Receives a complete byte message sent with send_bytes(). maxlength caps the number of bytes; a longer incoming message raises IOError and leaves the connection unreadable. Raises EOFError if the other end is closed and no data remains.
conn.send_bytes(buffer[, offset[, size]])    Sends a byte buffer through the connection; buffer is any object supporting the buffer interface, offset the byte offset into it, size the number of bytes to send. The data goes out as a single message, to be received with recv_bytes().
conn1.recv_bytes_into(buffer[, offset])    Receives a complete byte message into a writable buffer object (e.g. a bytearray); offset is where in the buffer to place it. Returns the number of bytes received; raises BufferTooShort if the message exceeds the available space.
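The core of the table above in a few lines; both ends live in one process here just to show the calls:

```python
from multiprocessing import Pipe

conn1, conn2 = Pipe()         # duplex by default: both ends can send and recv
conn2.send({"answer": 42})    # any picklable object
print(conn1.poll(1.0))        # True: a message is waiting
print(conn1.recv())           # {'answer': 42}
conn1.send_bytes(b"raw")
print(conn2.recv_bytes())     # b'raw'
conn1.close()
conn2.close()
```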

 
 

freeze_support(): a Python multiprocessing program frozen into a Windows exe can, when double-clicked, keep relaunching copies of itself until memory fills up (running under the Windows console/IDLE can also fail). Under Linux there is no such problem, and running from PyCharm is also fine. The fix is simply to call multiprocessing.freeze_support() at the top of the main guard:

if __name__ == "__main__":
    multiprocessing.freeze_support()
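A minimal sketch of where the call goes (the worker is illustrative); freeze_support() is a no-op unless the program runs from a frozen Windows executable, so it is always safe to call:

```python
import multiprocessing

def worker(n):
    print("worker", n)

if __name__ == "__main__":
    multiprocessing.freeze_support()  # must come first under the main guard
    for i in range(2):
        multiprocessing.Process(target=worker, args=(i,)).start()
```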
posted @ 2022-12-25 15:12  HiNEM