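Python Processes, Threads, and Coroutines

I. Processes

Concept: a process is one dynamic execution of a program over a set of data. In essence it is a running program (the act of running it); a program that is not running is not a process. It is an abstract concept.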

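Components:

  1. Program: the code we write, which describes what the process has to do and how it does it.

  2. Data set: the resources the program needs while it executes.

  3. Process control block (PCB): records the external characteristics of the process and describes how its execution changes over time. The operating system uses it to control and manage the process, and it is the only marker by which the system knows the process exists.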

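Explanation: each process occupies its own block of memory, so the data of one process is independent of the data of another.

Advantages: can use several CPUs at once, so several operations really run at the same time.

Disadvantages: resource-hungry (each process needs its own newly allocated memory space).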

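Constructor:

Process(group=None, target=None, name=None, args=(), kwargs={}, *, daemon=None)

  group: thread group; not implemented yet, and the library reference says it must be None.
  target: the callable to execute.
  name: the process name.
  args/kwargs: the positional and keyword arguments passed to target.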

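Instance methods:

  is_alive(): returns whether the process is running.

  join([timeout]): blocks the calling process until the process whose join() was called terminates, or until the optional timeout elapses.

  start(): starts the process; it becomes ready and waits to be scheduled by the CPU.

  run(): the method executed in the new process after start(). The default run() calls target if one was passed; a subclass can override run() instead.

  terminate(): stops the process immediately, whether or not its work is finished.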

Attributes:

  daemon: plays the same role as setDaemon for threads; must be set before start().

  name: the process name.

  pid: the process ID.

1. Two ways to create a process

a. Create a process by calling the module directly

# process module
from multiprocessing import Process


def work(name):
    print("Hello, %s" % name)


if __name__ == "__main__":
    p = Process(target=work, args=("nick",))
    p.start()
    p.join()

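b. Create a process by subclassing (recommended)

import multiprocessing


class MyProcess(multiprocessing.Process):
    def run(self):
        # CPU-bound work executed in the child process
        total = 0
        for n in range(100000000):
            total += n
        print(total)


if __name__ == '__main__':
    # Create the processes inside the guard so that child processes
    # do not re-create them when they re-import this module.
    li = [MyProcess() for _ in range(2)]

    for p in li:
        p.start()

    for p in li:
        p.join()

    print("ending")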

2. Communication between processes

Queue, from the multiprocessing module

# Inter-process communication: Queue
from multiprocessing import Queue, Process
import os, time, random


def write(q):
    print("process to write {}".format(os.getpid()))
    for value in ["A", "B", "C"]:
        print("Put {} to queue...".format(value))
        q.put(value)
        time.sleep(random.random())


def read(q):
    print("process to read {}".format(os.getpid()))
    while True:
        value = q.get(True)
        print("Get {} from queue".format(value))


if __name__ == '__main__':
    q = Queue()
    pw = Process(target=write, args=(q,))  # the child gets a copy of the handle, connected to the same underlying queue
    pr = Process(target=read, args=(q,))
    pw.start()
    pr.start()

    pw.join()
    pr.terminate()  # force the reader to stop (it runs an infinite loop)


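Pipe

# Inter-process communication: Pipe (similar to a socket)
from multiprocessing import Process, Pipe
import time, random


def worker(conn):
    time.sleep(random.random())
    for i in range(10):
        # send() has no return value, so this prints "worker send None"
        print("worker send {}".format(conn.send(i)))
    conn.send(None)  # sentinel: tells the receiver there is nothing more


def boss(conn):
    while True:
        msg = conn.recv()
        if msg is None:  # stop at the sentinel instead of looping forever
            break
        print("Boss recv {}".format(msg))


if __name__ == '__main__':
    # Pipe() returns two connection objects, one for each end
    worker_conn, boss_conn = Pipe()
    p1 = Process(target=worker, args=(worker_conn,))
    p2 = Process(target=boss, args=(boss_conn,))
    p1.start()
    p2.start()
    p1.join()
    p2.join()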


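3. Sharing data between processes

Processes do not share memory. To exchange data between two processes, use a server process (Manager) or shared memory (Value / Array):

a. Manager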

from multiprocessing import Process, Manager


def f(d,l,n):
    d["name"] = "alex"
    d[n] = "1"
    l.append(n)

if __name__ == '__main__':
    with Manager() as manager:  # context manager, like with open(...) for files
        d = manager.dict()
        l = manager.list(range(5))
        print(d,l)

        p_list = []
        for n in range(10):
            p = Process(target=f,args=(d, l, n))
            p.start()
            p_list.append(p)

        for p in p_list:
            p.join()           # this join() is required: wait for every child before reading the results

        print(d)
        print(l)

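# Why the join() is needed when sharing data (my own understanding):
# on a multi-core CPU the child processes really do run in parallel, and they do not all finish at the same moment. If the parent tries to use the shared data while a slower child is still working on it, the result is incomplete or an error occurs. So wait until the children have finished with the data before using it.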


from multiprocessing import Process, Manager

def func(n,a):
    n.value = 50
    for i in range(len(a)):
        a[i] += 10


if __name__ == '__main__':
    with Manager() as manager:
        num = manager.Value("d", 0.0)
        ints = manager.Array("i", range(10))
        p = Process(target=func,args=(num,ints))
        p.start()
        p.join()

        print(num)
        print(ints)

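Output:
Value('d', 50)
array('i', [10, 11, 12, 13, 14, 15, 16, 17, 18, 19])

# Shared memory offers two structures, Value and Array. Both are created with an internal lock by default, so single reads and writes are process-safe. A compound operation such as n.value += 1 still needs an explicit "with n.get_lock():".
# Value and Array both need the type of the stored values: 'd' is a double and 'i' is a signed int. The type codes are the ones used by the array module; a ctypes type can be passed instead.
# Value and Array objects are managed one by one in the main process, which gets scattered. Python also has the general-purpose Manager, built for sharing data, which supports many more types.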


b. Data can be stored in shared memory with Value or Array

from multiprocessing import Process, Value, Array
  
def f(n, a):
    n.value = 3.1415927
    for i in range(len(a)):
        a[i] = -a[i]
  
if __name__ == '__main__':
    num = Value('d', 0.0)
    arr = Array('i', range(10))
  
    p = Process(target=f, args=(num, arr))
    p.start()
    p.join()
  
    print(num.value)
    print(arr[:])
 
 
# Output:
3.1415927
[0, -1, -2, -3, -4, -5, -6, -7, -8, -9]


The manager returned by Manager() supports the types list, dict, Namespace, Lock, RLock, Semaphore, BoundedSemaphore, Condition, Event, Barrier, Queue, Value, and Array.

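from multiprocessing import Process, Array


def foo(temp, i):
    temp[i] = 100 + i
    for item in temp:
        print(i, '----->', item)


if __name__ == '__main__':
    temp = Array('i', [11, 22, 33, 44])
    for i in range(2):
        # pass the shared array explicitly so it also works when children are spawned
        p = Process(target=foo, args=(temp, i))
        p.start()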

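Performance summary: a server process manager is more flexible than shared memory because it can support arbitrary object types. A single manager can also be shared by processes on different computers over a network. It is, however, slower than shared memory.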

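4. Process synchronization

a. Lock

A lock guarantees data consistency. Suppose every process adds 1 to a shared variable. If one process has read the value but not yet written it back, and another process reads and writes the same variable in the meantime, the value finally written is wrong. This is when a lock is needed.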

# Why process synchronization is needed:
# data consistency
import time
from multiprocessing import Lock, Process


def run(i, lock):
    with lock:  # acquires the lock on entry and releases it on exit
        time.sleep(1)
        print(i)


if __name__ == '__main__':

    lock = Lock()

    for i in range(10):
        p = Process(target=run,args=(i,lock,))
        p.start()


Lock also implements the context manager API, so it can be used with the with statement. For more on context managers, see the notes on decorators and context in "Python 学习实践笔记".

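b. Semaphore

A Semaphore differs slightly from a Lock: it is like N locks, and a process may proceed as soon as it obtains any one of them. The total N is passed to the constructor, s = Semaphore(N). As with Lock, when the counter is 0 a process that tries to acquire blocks until the counter is greater than 0 again.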
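As a rough sketch of the idea (the worker function and the numbers 2 and 4 are assumptions chosen for illustration), four processes compete for a semaphore with two slots, so at most two of them are inside the with block at any moment:

```python
import time
from multiprocessing import Process, Semaphore


def worker(sem, i):
    """Hold one of the semaphore's slots while doing some work."""
    with sem:  # acquire on entry, release on exit
        print("worker", i, "holds a slot")
        time.sleep(0.2)


if __name__ == "__main__":
    sem = Semaphore(2)  # N = 2: at most two workers inside at once
    procs = [Process(target=worker, args=(sem, i)) for i in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```

With Semaphore(1) the behaviour would match a plain Lock.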

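5. Process pool

  If there are 50 tasks to run and the CPU has only 4 cores, creating 50 processes is unnecessary and only adds management overhead. If you want just 4 processes that take turns working through the tasks, without managing the creation and teardown of processes yourself, Pool is very useful.

  Pool is a process pool. It manages a fixed number of processes; whenever a process is idle it is given the next task, until all tasks are done.

from multiprocessing import Pool


def func(x):
    return x*x

if __name__ == '__main__':
    p_pool = Pool(4)
    result = p_pool.map(func, range(8))
    print(result)
# Pool creates 4 worker processes. They wait in the pool whether or not there is work, and start executing as soon as data arrives.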

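The example above does not show much of an effect, so next comes a pool with custom tasks and a callback.

The process pool API (there are more than these two methods):

- apply: each task is queued and run one at a time, which is effectively serial and defeats the purpose.
- apply_async: tasks run concurrently, and a callback can be set. Concurrency between processes can fairly be called parallelism, because it can use multiple CPU cores.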

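from multiprocessing import Pool
import time


def myFun(i):
    time.sleep(2)
    return i+100


def end_call(arg):
    # called in the parent process with the task's return value
    print("end_call", arg)


if __name__ == '__main__':
    p = Pool(5)

    # print(p.map(myFun, range(10)))

    for i in range(10):
        p.apply_async(func=myFun, args=(i,), callback=end_call)

    print("end")
    p.close()  # no more tasks will be submitted
    p.join()   # wait for the workers to finish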

from multiprocessing import Pool, TimeoutError
import time
import os

def f(x):
    return x*x

if __name__ == '__main__':
    # start 4 worker processes
    with Pool(processes=4) as pool:

        # prints "[0, 1, 4,..., 81]"
        print(pool.map(f, range(10)))

        # prints the same numbers in arbitrary order
        for i in pool.imap_unordered(f, range(10)):
            print(i)

        # evaluate "f(20)" asynchronously
        res = pool.apply_async(f, (20,))      # runs in only one process
        print(res.get(timeout=1))             # prints "400"

        # evaluate "os.getpid()" asynchronously
        res = pool.apply_async(os.getpid, ()) # runs in only one process
        print(res.get(timeout=1))             # prints the PID of that process

        # launching several evaluations asynchronously may use more processes
        multiple_results = [pool.apply_async(os.getpid, ()) for i in range(4)]
        print([res.get(timeout=1) for res in multiple_results])

        # make a single worker sleep for 10 seconds
        res = pool.apply_async(time.sleep, (10,))
        try:
            print(res.get(timeout=1))
        except TimeoutError:
            print("Got a multiprocessing.TimeoutError")

        print("For the moment, the pool remains available for more work")

    # exiting the with block has stopped the pool
    print("Now the pool is closed and no longer available")

(Official example)

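II. Threads

Concept: a thread is the smallest unit of work in an application; it is also called a lightweight process.

Composition: a thread is contained in a process and is the unit that actually does the work there. A thread is a single sequential flow of control within the process. One process can run several threads concurrently, each carrying out a different task.

Explanation: a thread cannot run on its own. It must live inside an application, which controls the execution of its threads. Threads can share (access) the data and resources of their process.

Advantages: shared memory; concurrency during I/O operations.

Disadvantages: "......" (in quotation marks, with all the subtlety of the Chinese language)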

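About multithreading

Multithreading resembles running several different programs at once. It has the following advantages:

- Long-running tasks in a program can be moved to the background.
- The user interface can be more appealing: for example, when the user clicks a button that triggers some processing, a progress bar can show how far along it is.
- The program may run faster.
- Threads are useful for tasks that wait, such as user input, file reads and writes, and sending or receiving network data. In these cases scarce resources such as memory can be released while waiting.

Threads still differ from processes while executing. Every thread has its own entry point, sequential execution path, and exit point. But a thread cannot run on its own; it must live inside an application, which controls the execution of its threads.

Each thread has its own set of CPU registers, called the thread's context, which reflects the state of the CPU registers when the thread last ran.

The instruction pointer and the stack pointer are the two most important registers in a thread's context. A thread always runs in the context of its process, and these addresses refer to memory in the address space of the process that owns the thread.

- A thread can be preempted (interrupted).
- A thread can be put on hold (also called sleeping) while other threads run; this is the thread yielding.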

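Threads fall into two kinds:

- Kernel threads: created and destroyed by the operating system kernel.
- User threads: implemented in the user program without kernel support.

The two thread modules commonly used in Python 3 are:

- _thread
- threading (recommended)

The thread module is deprecated; use the threading module instead. The name "thread" can therefore no longer be used in Python 3, where the module was renamed to "_thread" for compatibility.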

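There are two ways to use threads in Python: pass a function to a thread object, or wrap the thread in a class.

Python 3 supports threads through two standard library modules, _thread and threading.

_thread provides low-level, primitive threads and a simple lock. Its features are limited compared with the threading module.

Besides everything in the _thread module, the threading module provides these functions:

- threading.current_thread(): returns the current Thread object (currentThread() is the older, deprecated spelling).
- threading.enumerate(): returns a list of the running threads. Running means started and not yet finished; threads that have not started or have already terminated are excluded.
- threading.active_count(): returns the number of running threads, the same result as len(threading.enumerate()) (activeCount() is the older, deprecated spelling).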
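A small sketch of these three helpers in use. The worker function and the thread names are made up for the example, and the modern snake_case spellings are used:

```python
import threading
import time


def worker():
    time.sleep(0.2)  # stay alive long enough to be observed


def snapshot():
    """Start three threads and report what the module-level helpers see."""
    threads = [threading.Thread(target=worker, name="worker-%d" % i)
               for i in range(3)]
    for t in threads:
        t.start()
    names = [t.name for t in threading.enumerate()]   # all live threads
    count = threading.active_count()                  # how many are alive
    current = threading.current_thread().name         # the caller's thread
    for t in threads:
        t.join()
    return names, count, current


if __name__ == "__main__":
    names, count, current = snapshot()
    print(current, count, names)
```

The list returned by enumerate() includes the main thread as well as the three workers.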

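Besides these functions, the threading module provides the Thread class for working with threads. Thread has the following methods:

run(): the method that represents the thread's activity.
start(): starts the thread's activity.
join([timeout]): waits until the thread terminates. This blocks the calling thread until the thread whose join() was called ends, either normally or through an unhandled exception, or until the optional timeout elapses.
setDaemon(True): makes the thread a daemon that exits along with the main thread (must be called before start()). Newer code sets the daemon attribute or passes daemon=True to the constructor instead.
is_alive(): returns whether the thread is alive (the old isAlive() spelling was removed in Python 3.9).
getName(): returns the thread name.
setName(): sets the thread name.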
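To make join() with a timeout and is_alive() concrete, here is a short sketch. The sleep durations are arbitrary values picked for the illustration:

```python
import threading
import time

# daemon=True is the constructor form of setDaemon(True)
t = threading.Thread(target=time.sleep, args=(0.5,), daemon=True)
t.start()

t.join(timeout=0.05)                 # give up waiting after 50 ms
alive_after_timeout = t.is_alive()   # the thread is still sleeping

t.join()                             # now wait until it really finishes
alive_after_join = t.is_alive()

print(alive_after_timeout, alive_after_join)
```

join() returns None in both cases, so is_alive() is the way to tell a timeout from a finished thread.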

1. Creating threads

a. Create a thread by calling the module directly (recommended)

import threading  # thread module
import time


# target functions for the threads
def onepiece1(n):
    print("Luffy is using Gum-Gum Bazooka at %s, attack power %s" % (time.ctime(), n))
    time.sleep(3)
    print("Luffy finished the move at %s" % time.ctime())


def onepiece2(n):
    print("Enel is unleashing a lightning strike at %s, attack power %s" % (time.ctime(), n))
    time.sleep(5)
    print("Enel finished the move at %s" % time.ctime())


if __name__ == '__main__':

    thread_1 = threading.Thread(target=onepiece1, args=(10,))  # create a child thread
    thread_2 = threading.Thread(target=onepiece2, args=(9,))

    thread_1.start()
    # thread_1.join()
    thread_2.start()
    thread_2.join()  # wait for the thread to finish

    print("ending Fighting")

b. Create a thread by subclassing

Using the threading module, inherit directly from threading.Thread and override the __init__ and run methods:

import threading
import time


class MyThread(threading.Thread):
    def __init__(self, num):
        threading.Thread.__init__(self)
        self.num = num

    def run(self):  # the code each thread executes
        print("running on number:%s" % self.num)
        time.sleep(3)
        print("ending......")


if __name__ == '__main__':
    t1 = MyThread(1)  # instantiate the subclass, passing 1 as num; t1 is a thread object
    t2 = MyThread(2)
    t1.start()
    t2.start()

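2. GIL

Now that thread creation and the basic methods are covered, here is a long-standing legacy of the CPython interpreter: the global interpreter lock (GIL).

Python threads are real threads, but the interpreter has a GIL (Global Interpreter Lock) that a thread must hold before it can execute Python code. The interpreter releases the GIL automatically so that other threads get a chance to run: in Python 2 after every 100 bytecode instructions, and since Python 3.2 after a time interval (5 ms by default). The GIL effectively locks the executing code of all threads, so Python threads can only take turns: even 100 threads on a 100-core CPU use only one core.

There are, of course, other ways to improve execution efficiency; the road of technology never ends.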
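A rough way to observe this, assuming a standard CPython build with the GIL; the helper names and the loop size are arbitrary choices for the sketch. It prints the switch interval and times the same CPU-bound work run sequentially and in two threads. On such a build the two timings are usually close, because the threads take turns instead of running in parallel:

```python
import sys
import threading
import time


def count_down(n):
    """Pure-Python CPU-bound loop; it holds the GIL while it runs."""
    while n > 0:
        n -= 1


def timed(fn):
    """Return how many seconds fn() takes."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def sequential(n):
    count_down(n)
    count_down(n)


def threaded(n):
    threads = [threading.Thread(target=count_down, args=(n,)) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


if __name__ == "__main__":
    print("switch interval (s):", sys.getswitchinterval())
    n = 2_000_000
    print("sequential :", timed(lambda: sequential(n)))
    print("two threads:", timed(lambda: threaded(n)))
```

The actual numbers depend on the machine and the Python version, so none are quoted here.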

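3. Thread locks

When several threads modify the same data, the result can be unpredictable. To keep the data correct, the threads have to be synchronized.

The Lock and RLock objects of the threading module provide simple thread synchronization.

Both objects have an acquire method and a release method.

Operations on data that only one thread may touch at a time go between acquire and release.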

import threading
import time
 
num = 0
 
lock = threading.RLock()    # create the lock

def work():
    lock.acquire()  # acquire the lock
    global num
    num += 1
    time.sleep(1)
    print(num)
    lock.release()  # release the lock
 
for i in range(10):
    t = threading.Thread(target=work)
    t.start()
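The same acquire/release pairing can also be written with a with statement, which releases the lock even if the protected code raises. A minimal sketch, with function names and counts invented for the example:

```python
import threading

counter = 0
lock = threading.Lock()


def add_many(n):
    """Increment the shared counter n times, one locked step at a time."""
    global counter
    for _ in range(n):
        with lock:          # acquire() on entry, release() on exit
            counter += 1


def run(num_threads=4, n=10000):
    """Run add_many in several threads and return the final counter."""
    global counter
    counter = 0
    threads = [threading.Thread(target=add_many, args=(n,))
               for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


if __name__ == "__main__":
    print(run())
```

Because every increment happens under the lock, the result is always num_threads * n.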


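a. Deadlock and recursive locks

When threads share several resources, a deadlock arises if two threads each hold part of the resources and each waits for the other's: the system considers all those resources in use, so without outside intervention both threads wait forever.

One way to resolve this deadlock is a recursive lock: replace the separate locks with a single reentrant lock.

To let the same thread request the same resource more than once, Python provides the reentrant lock threading.RLock. Internally an RLock keeps a Lock and a counter variable; the counter records the number of acquire calls, so the resource can be acquired repeatedly. Other threads can obtain the resource only after the owning thread has released every acquire.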

import threading,time

# lock_A = threading.Lock()
# lock_B = threading.Lock()
r_lock = threading.RLock()


class Mythread(threading.Thread):

    def actionA(self):
        r_lock.acquire()
        print(self.name,time.ctime())
        time.sleep(2)
        r_lock.acquire()
        print(self.name,time.ctime())
        time.sleep(1)
        r_lock.release()
        r_lock.release()

    def actionB(self):
        r_lock.acquire()
        print(self.name,time.ctime())
        time.sleep(2)
        r_lock.acquire()
        print(self.name,time.ctime())
        time.sleep(1)
        r_lock.release()
        r_lock.release()

    def run(self):

        self.actionA()
        self.actionB()
li = []
for i in range(5):
    t = Mythread()
    t.start()
    li.append(t)

for t in li:
    t.join()

print("ending")
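The difference between the two lock types can be seen in a single thread. This sketch uses blocking=False so that the failing case returns False instead of hanging; the variable names are made up:

```python
import threading

plain = threading.Lock()
reentrant = threading.RLock()

plain.acquire()
# A second blocking acquire from the same thread would wait forever;
# blocking=False makes it return False instead.
second_plain = plain.acquire(blocking=False)
plain.release()

reentrant.acquire()
second_reentrant = reentrant.acquire(blocking=False)  # same thread: succeeds
reentrant.release()
reentrant.release()   # one release per acquire

print(second_plain, second_reentrant)
```

An RLock only helps when the same thread re-acquires the lock. Two threads taking two different locks in opposite order can still deadlock.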


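c. Semaphore: in effect, also a kind of lock

Semaphore: limits how many threads run concurrently

A semaphore controls the number of concurrent threads. BoundedSemaphore and Semaphore manage an internal counter that is decremented by each acquire() and incremented by each release().

The counter can never go below 0. When it is 0, acquire() blocks the thread until another thread calls release() (think of parking spaces).

The only difference between BoundedSemaphore and Semaphore is that the former checks on release() whether the counter would exceed its initial value, and raises an exception if so.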

import threading, time


class myThread(threading.Thread):
    def run(self):           # executed after start()
        if semaphore.acquire():  # take one of the 5 slots; up to 5 threads can be inside at once
            print(self.name)
            time.sleep(5)
            semaphore.release()


if __name__ == "__main__":
    semaphore = threading.Semaphore(5)  # how many threads may be inside at once, like the number of spaces in a parking lot

    thrs = []  # list of thread objects
    for i in range(100):  # 100 threads
        thrs.append(myThread())

    for t in thrs:
        t.start()  # start each thread
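The BoundedSemaphore check can be shown without any extra threads. In this sketch (variable names invented for the example) a plain Semaphore silently accepts one release too many, while a BoundedSemaphore raises ValueError:

```python
import threading

plain = threading.Semaphore(1)
plain.release()              # silently raises the counter from 1 to 2
slots = 0
while plain.acquire(blocking=False):
    slots += 1               # count how many acquires now succeed

bounded = threading.BoundedSemaphore(1)
try:
    bounded.release()        # would push the counter above its initial value
    over_release_detected = False
except ValueError:
    over_release_detected = True

print(slots, over_release_detected)
```

The bounded variant is useful for catching a stray release() that would otherwise let more threads in than intended.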


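d. Synchronization condition (Event)

A brief overview

An Event object implements a simple mechanism for communication between threads: it can set a signal, clear the signal, and wait for it.

1 Setting the signal

Event.set() sets the internal flag of the Event object to true. Event.is_set() reports the state of the flag (isSet() is the older spelling). After set() has been called, is_set() returns true.

2 Clearing the signal

Event.clear() resets the internal flag of the Event object to false. After clear() has been called, is_set() returns false.

3 Waiting

Event.wait() returns right away only when the internal flag is true. When the flag is false, wait() blocks until it becomes true.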

import threading, time


class Boss(threading.Thread):
    def run(self):
        print("BOSS: everyone works overtime until 22:00 tonight.")
        print(event.is_set())
        event.set()
        time.sleep(5)
        print("BOSS: <22:00> you can go home now.")
        print(event.is_set())
        event.set()


class Worker(threading.Thread):
    def run(self):
        event.wait()
        print("Worker: sigh... what a life!")
        time.sleep(1)
        event.clear()
        event.wait()
        print("Worker: OhYeah!")


if __name__ == "__main__":
    event = threading.Event()
    threads = []
    for i in range(5):
        threads.append(Worker())
    threads.append(Boss())
    for t in threads:
        t.start()
    for t in threads:
        t.join()


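An Event contains an internal flag, initially false.
set() sets it to true;
clear() resets it to false;
is_set() checks the state of the flag;
the most important method is wait(timeout=None), which blocks the current thread until the flag becomes true or the timeout expires. If the flag is already true, wait() returns immediately.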
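The return value of wait() tells the two cases apart. A short sketch, with delays chosen arbitrarily and threading.Timer standing in for "some other thread that sets the flag":

```python
import threading

event = threading.Event()

first = event.wait(timeout=0.05)         # flag is false: times out, returns False

timer = threading.Timer(0.1, event.set)  # another thread sets the flag shortly
timer.start()
second = event.wait(timeout=5)           # returns True as soon as set() is called
timer.join()

event.clear()
third = event.is_set()                   # False again after clear()

print(first, second, third)
```

Checking the return value avoids mistaking a timeout for the signal.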

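4. A multithreading workhorse: queue

A plain list is not a safe structure for passing work between threads, which is the motivation for a dedicated module: queue.

Python's queue module provides synchronized, thread-safe queue classes: the FIFO (first in, first out) queue Queue, the LIFO (last in, first out) queue LifoQueue, and the priority queue PriorityQueue.

These queues implement the locking primitives internally, so they can be used directly from multiple threads and can serve to synchronize threads.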

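Common methods of the queue module:

- q = queue.Queue(maxsize=0) # creates a FIFO queue; maxsize sets the capacity, and 0 means unlimited.
- q.join() # blocks until every item that was put has been fetched and marked with task_done().
- q.qsize() # returns the size of the queue (not reliable).
- q.empty() # returns True when the queue is empty, otherwise False (not reliable).
- q.full() # returns True when the queue is full, otherwise False (not reliable).
- q.put(item, block=True, timeout=None) # puts item at the tail of the queue; item is required. block defaults to True, meaning that when the queue is full the call waits for a free slot. With block=False the call does not wait and raises queue.Full if the queue is full. The optional timeout is how long to block; if no slot becomes free within that time, queue.Full is raised.
- q.get(block=True, timeout=None) # removes and returns the item at the head of the queue. block defaults to True, meaning that the call blocks when the queue is empty. With block=False it does not block and raises queue.Empty if the queue is empty. The optional timeout is how long to block; if the queue is still empty after that time, queue.Empty is raised.
- q.put_nowait(item) # equivalent to put(item, block=False)
- q.get_nowait() # equivalent to get(block=False)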
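The non-blocking calls and the join()/task_done() pairing from the list above can be sketched as follows. The worker function and the item values are invented for the illustration:

```python
import queue
import threading

q = queue.Queue(maxsize=2)
q.put_nowait("a")
q.put_nowait("b")
try:
    q.put_nowait("c")        # queue is full: raises queue.Full
    overflow = False
except queue.Full:
    overflow = True

processed = []


def worker():
    while True:
        item = q.get()       # blocks while the queue is empty
        processed.append(item)
        q.task_done()        # exactly one task_done() per get()


threading.Thread(target=worker, daemon=True).start()
q.join()                     # returns once both items are marked done

try:
    q.get_nowait()           # queue is empty now: raises queue.Empty
    underflow = False
except queue.Empty:
    underflow = True

print(overflow, processed, underflow)
```

The worker is a daemon thread so that its endless loop does not keep the program alive.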

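import queue

# There are three kinds of queue
# First in, first out
qu = queue.Queue()

qu.put("alex")
qu.put(123)
qu.put({"age":18})

while not qu.empty():  # stop once drained; get() on an empty queue would block forever
    print(qu.get())
    print("--------")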


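# Last in, first out (uses the import queue from above)
qu = queue.LifoQueue()

qu.put("alex")
qu.put(123)
qu.put({"age":18})

while not qu.empty():  # stop once drained; get() on an empty queue would block forever
    print(qu.get())
    print("--------")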

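# Priority queue with a size limit
q = queue.PriorityQueue(3)  # set the capacity

q.put([1, "alex"])
q.put([3, 123])
q.put([2, {"age":18}])
# q.put([4,456])  # putting more than the capacity also blocks (waits)

# while True:
#     print(q.get()[1])  # get() waits once there is nothing left to fetch
#     print("--------")

print(q.qsize())  # number of items currently in the queue
print(q.empty())  # whether the queue is empty
print(q.full())   # whether the queue is full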
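One caveat with the [priority, item] entries above: if two entries share a priority, Python falls back to comparing the items, and comparing two dicts raises TypeError. A common workaround, sketched here with an insertion counter as tie-breaker (the counter is an addition for this example, not part of the original code):

```python
import itertools
import queue

pq = queue.PriorityQueue()
order = itertools.count()   # tie-breaker: insertion order

entries = [(2, {"age": 18}), (1, "alex"), (2, {"age": 30}), (3, 123)]
for priority, item in entries:
    # The counter is compared before the item, so the dicts are never compared.
    pq.put((priority, next(order), item))

result = []
while not pq.empty():
    priority, _, item = pq.get()
    result.append(item)

print(result)
```

Entries with equal priority then come out in the order they were put in.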


# Example
import queue
import threading
import time

go = False  # exit flag: worker threads stop once this becomes True


class MyThread(threading.Thread):
    def __init__(self, threadID, name, q):
        threading.Thread.__init__(self)
        self.threadID = threadID
        self.name = name
        self.q = q

    def run(self):
        print("Starting thread: {}".format(self.name))
        process_data(self.name, self.q)
        print("Exiting thread: {}".format(self.name))


def process_data(thread_name, q):
    while not go:
        queue_lock.acquire()        # acquire the lock
        if not work_queue.empty():  # true while the queue still holds items
            data = q.get()          # take an item from the queue (FIFO)
            queue_lock.release()    # release the lock
            print("{} processing {}".format(thread_name, data))
        else:
            queue_lock.release()
        time.sleep(1)

thread_list = ["Thread-1", "Thread-2", "Thread-3"]
name_list = ["one", "two", "three", "four", "five"]
queue_lock = threading.Lock()  # synchronization lock

work_queue = queue.Queue(10)
threads = []
threads_ID = 1

# Create the new threads
for t in thread_list:
    thread = MyThread(threads_ID, t, work_queue)  # create the thread
    thread.start()          # start the thread
    threads.append(thread)  # keep the thread object in the list
    threads_ID += 1         # increment the ID

# Fill the queue
queue_lock.acquire()
for name in name_list:
    work_queue.put(name)  # put an item into the queue
queue_lock.release()

# Wait for the queue to drain; the loop ends once empty() returns True
while not work_queue.empty():
    pass

# Flip the flag to tell the threads to exit
go = True

# Wait for all threads to finish
for t in threads:
    t.join()
print("Exiting main thread.")

5. The producer-consumer model

import queue
import threading

que = queue.Queue(10)

def s(i):
    que.put(i)
    # print("size:", que.qsize())

def x(i):
    g = que.get()  # get() takes no item argument; it returns the next item
    print("get:", g)

for i in range(1, 13):
    t = threading.Thread(target=s, args=(i,))
    t.start()

for i in range(1, 11):
    t = threading.Thread(target=x, args=(i,))
    t.start()

print("size:", que.qsize())

# Output (the exact order can vary between runs):
get: 1
get: 2
get: 3
get: 4
get: 5
get: 6
get: 7
get: 8
get: 9
get: 10
size: 2


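Real life is full of production and consumption.

What is the producer-consumer model?

At work you may run into this situation: one module produces data, and another module processes it (module is meant broadly here: a class, function, thread, process, and so on). The module that produces data is called the producer, and the module that processes it is called the consumer. Put a buffer between the two and think of it as a warehouse: the producer stocks the warehouse with goods and the consumer takes goods out of it. That is the producer-consumer model.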

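Advantages of the producer-consumer model

1. Decoupling

Suppose the producer and the consumer are two classes. If the producer calls a method of the consumer directly, the producer depends on the consumer (that is, they are coupled). A later change to the consumer's code may then affect the producer. If both depend on a buffer instead and not directly on each other, the coupling is reduced accordingly.

For example, suppose we go to post a letter. Without a mailbox (the buffer), you have to hand the letter directly to the postman. Some will say that handing it to the postman is simple enough. It is not: you have to know who the postman is before you can give him the letter (if you go by the uniform alone and someone is impersonating him, you are in trouble). This creates a dependency between you and the postman (strong coupling between producer and consumer). If the postman is replaced one day, you have to get to know the new one (a change in the consumer forces a change in the producer's code). A mailbox, by contrast, is fairly stable, so depending on it costs little (weak coupling with the buffer).

2. Support for concurrency

Producer and consumer are two independent concurrent entities connected by the buffer. The producer only has to drop data into the buffer and can move on to producing the next item, while the consumer only has to take data out of the buffer. Neither blocks because of the other's processing speed.

Continuing the example: without a mailbox we would have to wait at the post office until the postman comes back to hand him the letter, unable to do anything else in the meantime (the producer blocks); or the postman would have to go door to door asking who has a letter to send (the consumer polls).

3. Balancing production and consumption

The buffer has another benefit. When data is produced at an uneven rate, the buffer shows its value: when data is produced quickly and the consumer cannot keep up, the unprocessed data waits in the buffer; when production slows down, the consumer gradually works through it.

To reuse the example once more: suppose the postman can carry only 1000 letters at a time. If Valentine's Day (or Christmas) brings more than 1000 cards to send, the mailbox buffer comes in handy: the letters the postman cannot take stay in the mailbox until his next round.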

That concludes the explanation of the producer-consumer model; here it is in code.

import time, random
import queue, threading

q = queue.Queue()


def Producer(name):
    count = 0
    while count < 10:
        print("making..... (making a baozi)")
        time.sleep(5)
        q.put(count)
        print('Producer %s has produced %s baozi..' % (name, count))
        count += 1
        q.join()  # block until a consumer calls task_done() for this baozi
        print("ok......")


def Consumer(name):
    count = 0
    while count < 10:
        time.sleep(random.randrange(4))  # random pause of 0 to 3 seconds
        data = q.get()
        print("eating.......")
        time.sleep(4)  # eating takes 4 seconds
        print('\033[32;1mConsumer %s has eat %s baozi...\033[0m' % (name, data))
        q.task_done()  # signal the producer, which then prints "ok"
        count += 1


p1 = threading.Thread(target=Producer, args=('A',))
# Only 10 baozi are made in total, so some consumers end up waiting in get() forever.
# daemon=True keeps them from blocking program exit once the producer is done.
c1 = threading.Thread(target=Consumer, args=('B',), daemon=True)
c2 = threading.Thread(target=Consumer, args=('C',), daemon=True)
c3 = threading.Thread(target=Consumer, args=('D',), daemon=True)

p1.start()
c1.start()
c2.start()
c3.start()
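An alternative way to shut consumers down cleanly is a sentinel value: the producer puts one "no more items" marker per consumer, and each consumer exits when it sees one. This is a sketch under that assumption; SENTINEL and the function names are invented for the example and are not part of the code above:

```python
import queue
import threading

SENTINEL = None   # hypothetical marker meaning "no more items"


def producer(q, n_items, n_consumers):
    for i in range(n_items):
        q.put(i)               # blocks while the buffer is full
    for _ in range(n_consumers):
        q.put(SENTINEL)        # one sentinel per consumer


def consumer(q, eaten, lock):
    while True:
        item = q.get()
        if item is SENTINEL:   # producer is finished: leave the loop
            break
        with lock:
            eaten.append(item)


def run(n_items=10, n_consumers=3):
    q = queue.Queue(maxsize=5)     # bounded buffer between the two sides
    eaten, lock = [], threading.Lock()
    threads = [threading.Thread(target=producer,
                                args=(q, n_items, n_consumers))]
    threads += [threading.Thread(target=consumer, args=(q, eaten, lock))
                for _ in range(n_consumers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(eaten)


if __name__ == "__main__":
    print(run())
```

Every thread ends on its own here, so no daemon flag and no fixed per-consumer count are needed.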


6. Thread pools

# Custom thread pool (1)
import queue
import threading
import time


class ThreadPool:

    def __init__(self, max_num=20):
        # fill the queue with max_num tokens; each token is the Thread class itself
        self.queue = queue.Queue(max_num)
        for i in range(max_num):
            self.queue.put(threading.Thread)

    def get_thread(self):
        # blocks when all tokens are taken, which caps the number of live threads
        return self.queue.get()

    def add_thread(self):
        # hand a token back when a task finishes
        self.queue.put(threading.Thread)


def func(pool, n):
    time.sleep(1)
    print(n)
    pool.add_thread()


p = ThreadPool(10)
for i in range(1, 100):
    thread = p.get_thread()
    t = thread(target=func, args=(p, i,))
    t.start()


# Thread pool (2)
import queue
import threading
import contextlib
import time

StopEvent = object()  # sentinel that tells a worker thread to exit

class Threadpool:

    def __init__(self, max_num=10):
        self.q = queue.Queue()
        self.max_num = max_num

        self.terminal = False
        self.generate_list = []     # threads that have been created
        self.free_list = []         # created threads that are currently idle

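    def run(self, func, args, callback=None):
        """
        Submit one task to the pool
        :param func: the task function
        :param args: arguments for the task function
        :param callback: called after the task succeeds or fails, with two arguments: 1. the task's execution status; 2. the task's return value (default None, meaning no callback is run)
        :return: None; the task is only queued here
        """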
        if len(self.free_list) == 0 and len(self.generate_list) < self.max_num:
            self.generate_thread()
        w = (func, args, callback,)
        self.q.put(w)

    def generate_thread(self):
        """
        Create one worker thread
        """
        t = threading.Thread(target=self.call)
        t.start()

    def call(self):
        """
        Loop: fetch task functions from the queue and execute them
        """
        current_thread = threading.current_thread()    # the current thread object
        self.generate_list.append(current_thread)

        event = self.q.get()
        while event != StopEvent:

            func, arguments, callback = event
            try:
                result = func(*arguments)
                status = True
            except Exception as e:
                status = False
                result = e

            if callback is not None:
                try:
                    callback(status, result)
                except Exception as e:
                    pass

            if self.terminal:
                event = StopEvent
            else:
                with self.worker_state(self.free_list, current_thread):
                    event = self.q.get()
                # self.free_list.append(current_thread)
                # event = self.q.get()
                # self.free_list.remove(current_thread)

        else:
            self.generate_list.remove(current_thread)

    def close(self):
        """
        Stop all threads once every queued task has run
        """
        num = len(self.generate_list)
        while num:
            self.q.put(StopEvent)
            num -= 1

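    def terminate(self):
        """
        Stop the threads whether or not tasks remain
        """
        self.terminal = True
        while self.generate_list:
            self.q.put(StopEvent)
        # drain what is left; empty() only tests the queue, it does not clear it
        while not self.q.empty():
            self.q.get_nowait()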

    @contextlib.contextmanager      # lets the method be used in a with statement
    def worker_state(self, frelist, val):
        """
        Track which threads are currently waiting for a task
        """
        frelist.append(val)
        try:
            yield
        finally:
            frelist.remove(val)


def work(i):
    time.sleep(1)
    print(i)

pool = Threadpool()
for item in range(50):
    pool.run(func=work, args=(item,))
pool.close()
# pool.terminate()
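For comparison, the standard library already ships a thread pool, concurrent.futures.ThreadPoolExecutor. The sketch below is an alternative to the hand-written pools above, not a rewrite of them; the work function and the sizes are made up for the example:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def work(i):
    time.sleep(0.1)   # stand-in for an I/O-bound task
    return i * i


def run(n_tasks=20, max_workers=10):
    """Run work(0..n_tasks-1) on a pool of at most max_workers threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in submission order, whichever task finishes first
        return list(pool.map(work, range(n_tasks)))


if __name__ == "__main__":
    print(run())
```

Leaving the with block waits for all submitted tasks, which plays the role of close() in the custom pool.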


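III. Coroutines

1. Introduction:

  A coroutine is also called a micro-thread. Technically, "a coroutine is a function whose execution you can pause"; if you think of it as "like a generator", you have the right idea. Thread and process operations are triggered by the program through system interfaces and carried out by the operating system, whereas coroutine switching is controlled by the programmer.

  Why coroutines exist: in a multithreaded application the CPU switches between threads by time slicing, and each switch costs time (saving the state and resuming later). Coroutines use a single thread and define, within that thread, the order in which blocks of code run.

  When to use coroutines: when a program has many operations that do not need the CPU (I/O).

Advantages:

- Advantage 1: very high execution efficiency. Switching between subroutines is controlled by the program itself rather than being a thread switch, so there is no thread-switching overhead. Compared with multithreading, the more threads there would be, the clearer the performance advantage of coroutines.
- Advantage 2: no locking mechanism is needed. There is only one thread, so there are no conflicting simultaneous writes to a variable. Shared resources are managed without locks by simply checking state, which makes execution much more efficient than with threads.

Since coroutines run in one thread, how can they use a multi-core CPU? The simplest way is multiple processes plus coroutines, which uses all the cores and keeps the efficiency of coroutines, for very high performance.

This builds on generators.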
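To make "a function you can pause" concrete, here is a tiny generator-based coroutine. The running-average task is just an example picked for this sketch: the function pauses at each yield and resumes when a value is sent in.

```python
def running_average():
    """Generator-based coroutine: pauses at yield until a value is sent in."""
    total, count, average = 0.0, 0, None
    while True:
        value = yield average   # hand back the current average, then wait
        total += value
        count += 1
        average = total / count


avg = running_average()
next(avg)                        # run to the first yield (prime the coroutine)
results = [avg.send(v) for v in (10, 20, 30)]
print(results)
```

All local state (total, count) survives between the pauses, which is what distinguishes this from an ordinary function call.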

2. Installation

# install
pip install gevent

# import the module
import gevent

import time


def consumer(name):
    print("--->ready to eat baozi........")
    while True:
        new_baozi = yield  # yield switches context; the baozi is passed in through send()
        print("[%s] is eating baozi %s" % (name, new_baozi))
        # time.sleep(1)


def producer():
    # advance each generator to its first yield so that it can accept send()
    next(con)
    next(con2)
    n = 0
    while 1:
        time.sleep(1)
        print("\033[32;1m[producer]\033[0m is making baozi %s and %s" % (n, n+1))
        con.send(n)  # tell the consumer that a baozi is ready
        con2.send(n+1)

        n += 2


if __name__ == '__main__':
    con = consumer("c1")
    con2 = consumer("c2")
    producer()


# greenlet
from greenlet import greenlet
 
def test1():
    print(11)
    gr2.switch()
    print(22)
    gr2.switch()
 
def test2():
    print(33)
    gr1.switch()
    print(44)
 
gr1 = greenlet(test1)
gr2 = greenlet(test2)
gr1.switch()
 
# Output:
11
33
22
44


3. gevent

# gevent
import gevent
 
def foo():
    print("Running in foo")
    gevent.sleep(0)
    print("Explicit context switch to foo again")

def bar():
    print("Explicit context to bar")
    gevent.sleep(0)
    print("Implicit context switch back to bar")

gevent.joinall([
    gevent.spawn(foo),
    gevent.spawn(bar),
])

# Output:
Running in foo
Explicit context to bar
Explicit context switch to foo again
Implicit context switch back to bar


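# Note: without monkey patching, requests.get() blocks the whole thread,
# so the greenlets in this version run one after another.
import gevent
import requests, time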

start_time = time.time()


def get_url(url):
    print("get: {}".format(url))
    resp = requests.get(url)
    data = resp.text
    print(len(data),url)

# get_url('https://www.python.org/')
# get_url('https://www.yahoo.com/')
# get_url('https://www.baidu.com/')
# get_url('https://www.sina.com.cn/')
# get_url('http://www.xiaohuar.com/')

gevent.joinall(
    [
        gevent.spawn(get_url, 'https://www.python.org/'),
        gevent.spawn(get_url, 'https://www.yahoo.com/'),
        gevent.spawn(get_url, 'https://www.baidu.com/'),
        gevent.spawn(get_url, 'https://www.sina.com.cn/'),
        gevent.spawn(get_url,'http://www.xiaohuar.com/')
    ]
)


print(time.time()-start_time)


# Switch automatically on I/O
from gevent import monkey
monkey.patch_all()
import gevent
import requests

def f(url):
    print("GET: %s" % url)
    resp = requests.get(url)
    data = len(resp.text)
    print(url, data)

gevent.joinall([
    gevent.spawn(f, 'https://www.python.org/'),
    gevent.spawn(f, 'https://www.yahoo.com/'),
    gevent.spawn(f, 'https://github.com/'),
])


4. Coroutine summary:

1. No thread-switching overhead
2. No need for locks

Posted 2017-06-01 19:16 by 红领巾下的大刀疤
