The Road to Python [9] - Processes, Threads, and Coroutines (migrated)

[Repost]

Processes and Threads

What is a process?

An executing instance of a program is called a process.

Each process provides the resources needed to execute a program. A process has a virtual address space, executable code, open handles to system objects, a security context, a unique process identifier, environment variables, a priority class, minimum and maximum working set sizes, and at least one thread of execution. Each process is started with a single thread, often called the primary thread, but can create additional threads from any of its threads.

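A program cannot run on its own. Only after it is loaded into memory and the operating system allocates resources to it can it run, and a program in execution is called a process. The difference between a program and a process: a program is a collection of instructions, the static description of what will run; a process is one execution of a program, a dynamic concept.

With multiprogramming, several programs can be loaded into memory at the same time and, under the operating system's scheduling, execute concurrently. This design greatly improves CPU utilization. Processes give each user the impression of having the CPU to themselves; the process abstraction was introduced precisely to make multiprogramming on a CPU possible.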

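If we have processes, why do we need threads?

Processes have many advantages: they provide multiprogramming, give each of us the feeling of owning our own CPU and other resources, and improve machine utilization. Many people then wonder: if processes are so good, why do we need threads at all? Look closely and you will see that processes still have shortcomings, mainly these two:

A process can only do one thing at a time; if you want it to do two or more things at once, it is helpless.
If a process blocks during execution, for example while waiting for input, the whole process is suspended, and even work that does not depend on that input cannot proceed.

Take QQ chat as an example. If QQ, as a single process, could only do one thing at a time, how could it listen for keyboard input, listen for messages other people send you, and display those messages on screen, all at the same time? You might say: doesn't the operating system do time-sharing? But time-sharing happens between different processes: the OS works on your QQ task for a while, then switches to your Word document. Within each CPU time slice given to QQ, QQ can still only do one thing.

To put it more plainly: an operating system is like a factory with many workshops. Each workshop makes a different product and corresponds to one process. Your factory is poor and short on power, so only one workshop can be powered at a time. To keep every workshop producing, the factory electrician powers them in turns. But when it is the QQ workshop's turn, it turns out there is only one worker, so output is very low. How do you fix that? Right, you guessed it: add more workers so that several of them work in parallel. Each of those workers is a thread!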

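What is a thread?

A thread is the smallest unit of execution that the operating system can schedule. It is contained within a process and is the unit that actually does the work there. A thread is a single sequential flow of control within a process; a process can run several threads concurrently, each performing a different task.

How do processes and threads differ?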

Threads share the address space of the process that created them; processes have their own address space.
Threads have direct access to the data segment of their process; processes have their own copy of the data segment of the parent process.
Threads can directly communicate with other threads of their process; processes must use interprocess communication to communicate with sibling processes.
New threads are easily created; new processes require duplication of the parent process.
Threads can exercise considerable control over threads of the same process; processes can only exercise control over child processes.
Changes to the main thread (cancellation, priority change, etc.) may affect the behavior of the other threads of the process; changes to the parent process do not affect child processes.

Python GIL (Global Interpreter Lock)

In CPython, the global interpreter lock, or GIL, is a mutex that prevents multiple native threads from executing Python bytecodes at once. This lock is necessary mainly because CPython’s memory management is not thread-safe. (However, since the GIL exists, other features have grown to depend on the guarantees that it enforces.)

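The core point: no matter how many threads you start or how many CPUs you have, CPython will calmly allow only one thread to execute Python bytecode at any given moment. This applies only to CPython; other Python implementations do not necessarily have a GIL.

First, be clear that the GIL is not a feature of the Python language; it is a concept introduced by the implementation of the Python interpreter CPython. Compare C++: C++ is a language (syntax) standard, and different compilers such as GCC, Intel C++, and Visual C++ turn it into executable code. Python is the same: the same code can run on different Python implementations such as CPython, PyPy, and Jython, and Jython, for example, has no GIL. But because CPython is the default implementation in most environments, many people equate CPython with Python and take the GIL for a defect of the Python language. So, to be clear: the GIL is not a feature of Python, and Python as a language does not depend on it.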
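The GIL only prevents two threads from executing Python bytecode at the same instant. Blocking calls such as time.sleep() and socket or file I/O release it while they wait, which is why threads still pay off for I/O-bound work. A minimal sketch, using time.sleep() as a stand-in for real I/O (fake_io and run_in_threads are illustrative names):

```python
import threading
import time


def fake_io(seconds):
    # time.sleep() releases the GIL, just as a blocking socket read would
    time.sleep(seconds)


def run_in_threads(n_threads, seconds):
    """Start n_threads sleepers at once and return the wall-clock time taken."""
    threads = [threading.Thread(target=fake_io, args=(seconds,))
               for _ in range(n_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == '__main__':
    elapsed = run_in_threads(5, 0.2)
    # the five waits overlap, so this is close to 0.2 s rather than 1.0 s
    print("5 x 0.2 s of waiting took %.2f s" % elapsed)
```

Replacing time.sleep() with a pure-Python CPU loop would not show the same overlap under CPython; that case calls for multiprocessing.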

The Python threading module

There are two ways to start a thread:

Calling Thread directly

import threading
import time
import os
import ctypes
 
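def sayhi(num): # the function each thread will run
 
    print("running on number:%s" %num)
 
    time.sleep(3)
 
if __name__ == '__main__':
    for i in range(10):
        t1 = threading.Thread(target=sayhi,args=(i,)) # create a thread instance
        t1.start() # start the thread
        print(t1.name) # thread name (getName() is deprecated since Python 3.10)
        print("thread parent id:",os.getpid()) # the process ID, shared by all threads
        # Linux x86-64 only: syscall 186 is gettid(). This line runs in the main
        # thread, so it prints the main thread's ID, which equals the PID
        print("thread child id:",ctypes.CDLL('libc.so.6').syscall(186))

Sample output (produced under Python 2, hence the tuple-style lines; output from different threads interleaves):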

('thread parent id:', 11301)
('thread child id:', 11301)
Thread-7running on number:6

('thread parent id:', 11301)
('thread child id:', 11301)
Thread-8running on number:7
('th
read parent id:', 11301)
('thread child id:', 11301)
running on number:8Thread-9
('thread parent id:', 11301)
('thread child id:', 11301)
running on number:9
 Thread-10
('thread parent id:', 11301)
('thread child id:', 11301)
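The ctypes call above is tied to Linux on x86-64 (syscall 186 is gettid there), and because it runs in the main thread it reports the main thread's ID. On Python 3.8 and later, threading.get_native_id() returns the OS-level ID of whichever thread calls it, on all major platforms. A minimal sketch; the Barrier only keeps all workers alive at the same time so that no ID can be reused (collect_native_ids is an illustrative name):

```python
import os
import threading


def collect_native_ids(n):
    """Return (main_native_id, worker_native_ids) for n concurrently running threads."""
    ids = []
    ids_lock = threading.Lock()
    barrier = threading.Barrier(n)  # hold every worker until all n are running

    def worker():
        tid = threading.get_native_id()  # ID of *this* thread (Python 3.8+)
        with ids_lock:
            ids.append(tid)
        barrier.wait()

    threads = [threading.Thread(target=worker) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return threading.get_native_id(), ids


if __name__ == '__main__':
    main_id, worker_ids = collect_native_ids(3)
    print("pid:", os.getpid(), "main thread:", main_id, "workers:", worker_ids)
```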

Subclassing Thread

import threading
import time
 
 
class MyThread(threading.Thread):
    def __init__(self,num):
        threading.Thread.__init__(self)
        self.num = num
 
    def run(self): # the code each thread will run
 
        print("running on number:%s" %self.num)
 
        time.sleep(3)
 
if __name__ == '__main__':
 
    t1 = MyThread(1)
    t2 = MyThread(2)
    t1.start()
    t2.start()


Join & Daemon

#!/usr/bin/env python
# set coding: utf-8
__author__ = "richardzgt"

import threading
import time

class MyThread(threading.Thread):
    def __init__(self,n,sleep_time):
        super(MyThread,self).__init__()
        self.n = n
        self.sleep_time = sleep_time

    def run(self):
        print("run task",self.n)
        time.sleep(self.sleep_time)
        print("task done",self.n)



t1 = MyThread("t1",1)
t2 = MyThread("t2",3)

t1.start()
t2.start()

t1.join()
t2.join()   # wait for both child threads to finish

print("main thread continuing......")


#!/usr/bin/env python
# set coding: utf-8
__author__ = "richardzgt"
# daemon

import threading
import time





def run(sleep_time):
    print("start task ",sleep_time)
    time.sleep(sleep_time)
    print("task done" ,sleep_time)

t_job = []
for i in range(10):
    t = threading.Thread(target=run,args=(i,))
    t.daemon = True # daemon thread: killed automatically when the main thread exits (setDaemon() is deprecated since Python 3.10)
    t.start()
    t_job.append(t)


time.sleep(5)  ## after 5 s the main thread exits; daemon threads still sleeping (tasks 6-9) are killed before printing "task done"
print("main thread done........")


Note: Daemon threads are abruptly stopped at shutdown. Their resources (such as open files, database transactions, etc.) may not be released properly. If you want your threads to stop gracefully, make them non-daemonic and use a suitable signalling mechanism such as an Event.
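Following that advice, a minimal sketch of a non-daemon worker that stops cleanly once an Event is set. worker and ticks are illustrative names, and appending "cleaned up" stands in for real cleanup such as closing files:

```python
import threading
import time


def worker(stop_event, ticks):
    # loop until asked to stop; wait() doubles as a sleep that set() interrupts
    while not stop_event.is_set():
        ticks.append(time.monotonic())
        stop_event.wait(0.05)
    ticks.append("cleaned up")  # stand-in for closing files, committing, etc.


if __name__ == '__main__':
    stop = threading.Event()
    ticks = []
    t = threading.Thread(target=worker, args=(stop, ticks))  # non-daemon
    t.start()
    time.sleep(0.2)
    stop.set()   # ask the worker to finish
    t.join()     # wait until it really has finished
    print("worker stopped after", len(ticks) - 1, "ticks; last entry:", ticks[-1])
```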

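Thread locks (mutexes)

A process can start several threads, and they share the parent process's memory, which means every thread can access the same data. What happens if two threads try to modify the same data at the same time?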

import time
import threading
 
def addNum():
    global num # every thread uses this global variable
    print('--get num:',num )
    time.sleep(1)
    num  -=1 # subtract 1 from the shared variable
 
num = 100  # the shared variable
thread_list = []
for i in range(100):
    t = threading.Thread(target=addNum)
    t.start()
    thread_list.append(t)
 
for t in thread_list: # wait for all threads to finish
    t.join()
 
 
print('final num:', num )


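Normally the final num should be 0, but if you run this several times on Python 2.7 you will find the printed result is not always 0. Why does it differ from run to run? Simple: suppose threads A and B both want to subtract 1 from num. Because they run concurrently, both may read the initial value num=100 before either writes back. A computes 99, B also computes 99, and when both results are assigned to num, the value ends up 99 instead of 98. The fix is also simple: before modifying shared data, a thread takes a lock, so that any other thread that wants to modify the data has to wait until the first one has finished and released the lock.

*Note: on Python 3.x you will rarely, if ever, see a wrong result here. This is not because Python 3 adds a lock automatically. The Python 3 GIL switches threads on a time interval (5 ms by default, see sys.getswitchinterval()) instead of every 100 bytecode instructions as in Python 2, so the race window is hit far less often. num -= 1 is still not atomic, so use a lock anyway.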
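The race exists because num -= 1 is not a single step: CPython compiles it into separate load, subtract and store instructions, and a thread switch can land between the load and the store. The dis module makes this visible. Opcode names differ between versions (for example INPLACE_SUBTRACT before 3.11 and BINARY_OP from 3.11 on), so this sketch only relies on the load and the store:

```python
import dis

num = 100


def decrement():
    global num
    num -= 1  # looks like one operation...


# ...but it compiles to several bytecode instructions
opnames = [ins.opname for ins in dis.get_instructions(decrement)]
print(opnames)
```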

Version with a lock

import threading
import time


def run(n):
    global  num
    lock.acquire()      # only one thread can hold the lock at a time
    print("start task and lock acquire",n)
    num += 1
    time.sleep(1)
    print("task done and lock release" ,n)
    lock.release()


lock = threading.Lock()
num = 0
t_job = []
for i in range(3):
    t = threading.Thread(target=run,args=(i,))
    t.start()
    t_job.append(t)

for each in t_job:
    each.join()

print("main thread done........")
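The acquire()/release() pair can also be written as a with statement, which releases the lock even if the body raises. A minimal sketch (count_with_lock is an illustrative name) in which several threads increment one counter and the total always comes out exact:

```python
import threading

counter = 0
counter_lock = threading.Lock()


def add_many(times):
    global counter
    for _ in range(times):
        # same as acquire()/release(), but the lock is also released on exceptions
        with counter_lock:
            counter += 1


def count_with_lock(n_threads=8, times=10000):
    global counter
    counter = 0
    threads = [threading.Thread(target=add_many, args=(times,))
               for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


if __name__ == '__main__':
    print(count_with_lock())  # 80000
```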


GIL VS Lock

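Python already has the GIL to ensure that only one thread executes at a time, so why do we still need a lock here? Note that this lock is a user-level lock and has nothing to do with the GIL: the GIL can switch threads between any two bytecode instructions, so it cannot stop a multi-step update such as num -= 1 from being interrupted halfway.

If user programs already have their own locks, why does CPython still need the GIL? Mainly to reduce development complexity. For example, when you write Python you don't worry about freeing memory, because the interpreter reclaims it for you. You can picture the interpreter as having its own thread that wakes up every so often to check which memory can be freed. Your program's threads and the interpreter's thread run concurrently: suppose your thread deletes a variable, and at the moment the garbage collector is clearing that memory, another thread assigns a new value to the not-yet-cleared memory; the newly assigned data could then be wiped out. To avoid problems like this, the interpreter takes a simple, blunt approach and adds a lock: while one thread runs, nobody else may touch anything. This can be considered a legacy of early Python versions. (In practice CPython frees most objects through reference counting rather than a separate collector thread, but the hazard is the same: without the GIL, concurrent updates to reference counts could corrupt memory.)

RLock (reentrant lock)

Put simply, it lets a section that already holds the lock take it again: the thread that owns an RLock can acquire it again without blocking, as long as it releases it the same number of times. With a plain Lock, run3 below would deadlock when it calls run1.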

import threading,time
 
def run1():
    global num
    print("grab the first part data")
    lock.acquire()
    num +=1
    lock.release()
    return num
def run2():
    global  num2
    print("grab the second part data")
    lock.acquire()
    num2+=1
    lock.release()
    return num2
def run3():
    lock.acquire()   # outer acquire; run1 and run2 acquire the same lock again
    res = run1()
    print('--------between run1 and run2-----')
    res2 = run2()
    lock.release()
    print(res,res2)
 
 
if __name__ == '__main__':
 
    num,num2 = 0,0
    lock = threading.RLock()  # with threading.Lock() this program would hang
    threads = []
    for i in range(10):
        t = threading.Thread(target=run3)
        t.start()
        threads.append(t)

    for t in threads:  # wait for the threads instead of busy-polling active_count()
        t.join()
    print('----all threads done---')
    print(num,num2)
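The difference between Lock and RLock can be checked directly with a non-blocking second acquire() from the same thread. A minimal sketch (try_reacquire is an illustrative helper):

```python
import threading


def try_reacquire(lock):
    """Acquire lock, then try to acquire it again from the same thread without blocking."""
    lock.acquire()
    try:
        again = lock.acquire(blocking=False)  # a blocking attempt on a Lock would deadlock
        if again:
            lock.release()  # undo the inner acquire
        return again
    finally:
        lock.release()


print("Lock: ", try_reacquire(threading.Lock()))   # False
print("RLock:", try_reacquire(threading.RLock()))  # True
```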


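Semaphore

A mutex lets only one thread modify data at a time, whereas a semaphore lets up to a fixed number of threads in at once. Think of a restroom with three stalls: at most three people can use it at the same time, and anyone else has to wait until someone comes out.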

#!/usr/bin/env python
# set coding: utf-8
__author__ = "richardzgt"


import threading,time

def run(n):
    semaphore.acquire()
    time.sleep(1)
    print("run the thread: ",n)
    semaphore.release()

semaphore = threading.BoundedSemaphore(5)  # at most 5 threads may run at the same time

threads = []
for i in range(10):
    t = threading.Thread(target=run,args=(i,))
    t.start()
    threads.append(t)

for t in threads:  # wait for every thread instead of busy-polling active_count()
    t.join()
print("------ all threads done -------")
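To see the limit in action, this sketch records the highest number of threads inside the semaphore at once, using a small lock-protected counter (LIMIT, active and peak are illustrative names). It also shows what "Bounded" adds: a release() without a matching acquire() raises ValueError instead of silently raising the limit.

```python
import threading
import time

LIMIT = 3
slots = threading.BoundedSemaphore(LIMIT)
state_lock = threading.Lock()
active = 0
peak = 0


def use_slot():
    global active, peak
    with slots:  # at most LIMIT threads get past this line at once
        with state_lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.05)
        with state_lock:
            active -= 1


threads = [threading.Thread(target=use_slot) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("peak concurrency:", peak)

# every slot has been returned, so one more release() is an error
over_release_rejected = False
try:
    slots.release()
except ValueError:
    over_release_rejected = True
print("over-release rejected:", over_release_rejected)
```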


Timer

This class represents an action that should be run only after a certain amount of time has passed

Timers are started, as with threads, by calling their start() method. The timer can be stopped (before its action has begun) by calling the cancel() method. The interval the timer will wait before executing its action may not be exactly the same as the interval specified by the user.

from threading import Timer

def hello():
    print("hello, world")
 
t = Timer(30.0, hello)
t.start()  # after 30 seconds, "hello, world" will be printed
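A short sketch of cancel(): two timers are started with the same short interval and the second is cancelled before it fires, so only the first callback runs. Passing fired.append with args is just a convenient way to record which timer fired:

```python
import threading

fired = []

t1 = threading.Timer(0.05, fired.append, args=("first",))
t2 = threading.Timer(0.05, fired.append, args=("second",))
t1.start()
t2.start()
t2.cancel()   # cancelled before its interval elapsed, so its action never runs
t1.join()
t2.join()
print(fired)  # ['first']
```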

Events

An event is a simple synchronization object; the event represents an internal flag, and threads can wait for the flag to be set, or set or clear the flag themselves.

event = threading.Event()

# a client thread can wait for the flag to be set
event.wait()

# a server thread can set or reset it
event.set()
event.clear()
If the flag is set, the wait method doesn’t do anything.
If the flag is cleared, wait will block until it becomes set again.
Any number of threads may wait for the same event.
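These three rules can be checked in a few lines. A minimal sketch with one waiting thread; wait() with a timeout returns the flag's value, so False means it timed out:

```python
import threading

ready = threading.Event()
results = []


def waiter():
    # blocks until ready.set() is called (or 2 s pass)
    results.append(ready.wait(timeout=2))


timed_out = ready.wait(timeout=0.01)  # flag not set yet
print(timed_out)         # False

t = threading.Thread(target=waiter)
t.start()
ready.set()              # wakes every thread blocked in wait()
t.join()
print(results)           # [True]

ready.clear()            # back to the blocking state
print(ready.is_set())    # False
```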

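Event lets two or more threads interact. Below is a traffic-light example: one thread acts as the traffic light, and several threads act as cars that follow the rule "stop on red, go on green".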

import threading,time
import random
def light():
    if not event.is_set():
        event.set() # wait() no longer blocks: green light
    count = 0
    while True:
        if count < 10:
            print('\033[42;1m--green light on---\033[0m')
        elif count <13:
            print('\033[43;1m--yellow light on---\033[0m')
        elif count <20:
            if event.is_set():
                event.clear()
            print('\033[41;1m--red light on---\033[0m')
        else:
            count = 0
            event.set() # switch back to green
        time.sleep(1)
        count +=1
def car(n):
    while 1:
        time.sleep(random.randrange(10))
        if  event.is_set(): # green light
            print("car [%s] is running.." % n)
        else:
            print("car [%s] is waiting for the red light.." %n)
if __name__ == '__main__':
    event = threading.Event()
    Light = threading.Thread(target=light)
    Light.start()
    for i in range(3):
        t = threading.Thread(target=car,args=(i,))
        t.start()


The queue module

queue is especially useful in threaded programming when information must be exchanged safely between multiple threads.

class queue.Queue(maxsize=0) # first in, first out

class queue.LifoQueue(maxsize=0) # last in, first out
class queue.PriorityQueue(maxsize=0) # entries are retrieved in priority order, lowest first

exception queue.Empty: Exception raised when non-blocking get() (or get_nowait()) is called on a Queue object which is empty.

exception queue.Full: Exception raised when non-blocking put() (or put_nowait()) is called on a Queue object which is full.
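A short sketch of the three queue classes and of queue.Empty: the same items go into each queue and come out in a different order. drain() is an illustrative helper that uses get_nowait() and stops at queue.Empty:

```python
import queue


def drain(q):
    """Remove and return everything currently in q, in the order get() yields it."""
    items = []
    while True:
        try:
            items.append(q.get_nowait())  # raises queue.Empty instead of blocking
        except queue.Empty:
            return items


fifo, lifo, prio = queue.Queue(), queue.LifoQueue(), queue.PriorityQueue()
for item in (3, 1, 2):
    fifo.put(item)
    lifo.put(item)
    # lowest value comes out first; (priority, data) tuples are a common pattern
    prio.put((item, "task %d" % item))

print(drain(fifo))  # [3, 1, 2]  first in, first out
print(drain(lifo))  # [2, 1, 3]  last in, first out
print(drain(prio))  # [(1, 'task 1'), (2, 'task 2'), (3, 'task 3')]
```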

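Queue.qsize(): Return the approximate size of the queue.

Queue.empty(): Return True if the queue is empty.

Queue.full(): Return True if the queue is full.

Queue.put(item, block=True, timeout=None): Put item into the queue. If optional args block is true and timeout is None (the default), block if necessary until a free slot is available. If timeout is a positive number, it blocks at most timeout seconds and raises the Full exception if no free slot was available within that time. Otherwise (block is false), put an item on the queue if a free slot is immediately available, else raise the Full exception (timeout is ignored in that case).

Queue.put_nowait(item): Equivalent to put(item, False).

Queue.get(block=True, timeout=None): Remove and return an item from the queue. If optional args block is true and timeout is None (the default), block if necessary until an item is available. If timeout is a positive number, it blocks at most timeout seconds and raises the Empty exception if no item was available within that time. Otherwise (block is false), return an item if one is immediately available, else raise the Empty exception (timeout is ignored in that case).

Queue.get_nowait(): Equivalent to get(False).

Two methods are offered to support tracking whether enqueued tasks have been fully processed by daemon consumer threads.

Queue.task_done(): Indicate that a formerly enqueued task is complete. For each get() used to fetch a task, a subsequent call to task_done() tells the queue that the processing on the task is complete. If a join() is currently blocking, it will resume when all items have been processed (meaning that a task_done() call was received for every item that had been put() into the queue). Raises a ValueError if called more times than there were items placed in the queue.

Queue.join(): Block until every item in the queue has been retrieved and processed (task_done() has been called for each one).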
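task_done() and join() are easiest to understand together. In this minimal sketch (square_all and worker are illustrative names), daemon worker threads call task_done() once per item, and the main thread's q.join() returns only after every item put() into the queue has been processed:

```python
import queue
import threading


def worker(q, results, results_lock):
    while True:
        item = q.get()
        try:
            with results_lock:
                results.append(item * item)
        finally:
            q.task_done()  # exactly one task_done() per get(), even if processing fails


def square_all(numbers, n_workers=3):
    q = queue.Queue()
    results = []
    results_lock = threading.Lock()
    for _ in range(n_workers):
        # daemon workers block on q.get() forever, so they must not keep the program alive
        threading.Thread(target=worker, args=(q, results, results_lock),
                         daemon=True).start()
    for n in numbers:
        q.put(n)
    q.join()  # returns once task_done() has been called for every item
    return sorted(results)


print(square_all(range(5)))  # [0, 1, 4, 9, 16]
```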

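The producer-consumer model

In concurrent programming, the producer-consumer pattern solves a large share of common concurrency problems. It balances the work rate of producer threads and consumer threads to improve the program's overall data-processing throughput.

Why use the producer-consumer pattern

In the world of threads, a producer is a thread that produces data and a consumer is a thread that consumes it. In multithreaded programs, if the producer is fast and the consumer slow, the producer has to wait for the consumer before it can produce more. Likewise, if the consumer is faster than the producer, the consumer has to wait for the producer. The producer-consumer pattern was introduced to solve this.

What is the producer-consumer pattern

The producer-consumer pattern uses a container to remove the tight coupling between producers and consumers. They do not communicate directly but through a blocking queue: after producing data, the producer does not wait for a consumer to process it; it simply puts the data into the queue. The consumer does not ask the producer for data; it takes it straight from the queue. The blocking queue acts as a buffer that balances the processing capacity of both sides.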

#!/usr/bin/env python
# set coding: utf-8
__author__ = "richardzgt"


import  threading,time
import queue


q = queue.Queue(maxsize=10)


def Producer():
    count = 1
    while True:
        q.put("g%s" % count)
        print("producer queue g%s" %count)
        count += 1
        time.sleep(1)

def Consumer():
    while True:
        good = q.get()
        print("Consumer queue ",good)
        time.sleep(5)



p = threading.Thread(target=Producer)
c = threading.Thread(target=Consumer)

p.start()
c.start()


import time,random
import queue,threading
q = queue.Queue()
def Producer(name):
    count = 0
    while count <20:
        time.sleep(random.randrange(3))
        q.put(count)
        print('Producer %s has produced %s baozi..' %(name, count))
        count +=1
def Consumer(name):
    count = 0
    while count <20:
        time.sleep(random.randrange(4))
        if not q.empty():
            data = q.get()
            print(data)
            print('\033[32;1mConsumer %s has eaten %s baozi...\033[0m' %(name, data))
        else:
            print("-----no baozi anymore----")
        count +=1
p1 = threading.Thread(target=Producer, args=('A',))
c1 = threading.Thread(target=Consumer, args=('B',))
p1.start()
c1.start()
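Both examples above either run forever or poll q.empty(). A common alternative is to let q.get() block and have the producer send a sentinel value when it is done. A minimal sketch, assuming a single consumer (with several consumers, put one sentinel per consumer); STOP and run_pipeline are illustrative names:

```python
import queue
import threading

STOP = None  # sentinel: tells the consumer that no more items are coming


def producer(q, n):
    for i in range(n):
        q.put(i)      # blocks while the queue is full, throttling the producer
    q.put(STOP)


def consumer(q, eaten):
    while True:
        item = q.get()  # blocks while the queue is empty; no need for q.empty()
        if item is STOP:
            break
        eaten.append(item)


def run_pipeline(n=20, maxsize=5):
    q = queue.Queue(maxsize=maxsize)
    eaten = []
    p = threading.Thread(target=producer, args=(q, n))
    c = threading.Thread(target=consumer, args=(q, eaten))
    p.start()
    c.start()
    p.join()
    c.join()
    return eaten


print(run_pipeline())  # the 20 items 0..19, in order
```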


A Condition variable lets a thread stop and wait until other threads have made some "condition" true, and only then continue.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
'''
Created on Aug 27, 2015

@author: Administrator
'''

import threading
import time
 
# the product
product = None
# the condition variable
con = threading.Condition()
 
# producer
def produce():
    global product
    
    with con:
        while True:
            if product is None:
                print('produce...')
                product = 'anything'
                
                # tell the consumer a product is ready
                con.notify()
            
            # wait for a notification (the lock is released while waiting)
            con.wait()
            time.sleep(2)
 
# consumer
def consume():
    global product
    
    with con:
        while True:
            if product is not None:
                print('consume...')
                product = None
                
                # tell the producer the product is gone
                con.notify()
            
            # wait for a notification (the lock is released while waiting)
            con.wait()
            time.sleep(2)
 
t1 = threading.Thread(target=produce)
t2 = threading.Thread(target=consume)
t2.start()
t1.start()
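The example above runs forever and depends on strictly alternating notify() and wait() calls. Since Python 3.2, Condition.wait_for(predicate) waits until the predicate is true and re-checks it after every wakeup, which is the usual guard against being woken when the condition does not hold. A minimal sketch of the same one-slot idea that terminates (slot, produce_n and consume_n are illustrative names):

```python
import threading

cond = threading.Condition()
slot = []       # a one-item "shelf" shared by producer and consumer
consumed = []


def produce_n(n):
    for i in range(n):
        with cond:
            cond.wait_for(lambda: not slot)  # wait until the shelf is empty
            slot.append(i)
            cond.notify_all()


def consume_n(n):
    for _ in range(n):
        with cond:
            cond.wait_for(lambda: slot)      # wait until something is on the shelf
            consumed.append(slot.pop())
            cond.notify_all()


threads = [threading.Thread(target=produce_n, args=(5,)),
           threading.Thread(target=consume_n, args=(5,))]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(consumed)  # [0, 1, 2, 3, 4]
```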


Multiprocessing

multiprocessing is a package that supports spawning processes using an API similar to the threading module. The multiprocessing package offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads. Due to this, the multiprocessing module allows the programmer to fully leverage multiple processors on a given machine. It runs on both Unix and Windows.

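Separate processes give true parallelism across CPU cores, which CPython threads cannot provide because of the GIL. Use multiprocessing for CPU-bound work.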

import time
import multiprocessing
def f(name):
    time.sleep(2)
    print('hello', name)
 
if __name__ == '__main__':
    p = multiprocessing.Process(target=f, args=('bob',))
    p.start()
    p.join()
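Pool.map() is the simplest way to spread CPU-bound work across processes. A minimal sketch, where cpu_heavy is an illustrative pure-Python workload; the if __name__ == '__main__' guard matters wherever processes are started with spawn (the default on Windows and macOS), because the workers import the main module:

```python
import multiprocessing


def cpu_heavy(n):
    # pure-Python arithmetic holds the GIL, so threads could not run it in parallel
    return sum(i * i for i in range(n))


def run_parallel(inputs, processes=4):
    # each input is handed to a worker process; results come back in input order
    with multiprocessing.Pool(processes=processes) as pool:
        return pool.map(cpu_heavy, inputs)


if __name__ == '__main__':
    print(run_parallel([10 ** 5] * 4))
```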

Checking that the processes really are separate

from multiprocessing import Process
import os
 
def info(title):
    print(title)
    print('module name:', __name__)
    print('parent process:', os.getppid())
    print('process id:', os.getpid())
    print("\n\n")
 
def f(name):
    info('\033[31;1mfunction f\033[0m')
    print('hello', name)
 
if __name__ == '__main__':
    p_list = []
    info('main process line')
    for i in range(10):
        p = Process(target=f, args=('bob%s'% i,))
        p.start()
        p_list.append(p)
        
    for j in p_list:
        j.join()
Sample output (excerpt, produced under Python 2; prints from different processes interleave):
('module name:', '__main__')
('parent process:', 10340)
('process[31;1mfunction f[0m[31;1mfunction f[0m
('module name:', '__main__')
('parent process:', 10340)
('process id:', 10353)

('hello', 'bob9')
[31;1mfunction f[0m
('module name:', '__main__')
('parent process:', 10340)
('process id:', 10352)

('hello', 'bob8')

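Combining multiprocessing with requests to call a service concurrently (a common pattern)

import multiprocessing
import requests

url="http://10.0.0.219:4444"

def get_data(url,i,q):
    try:
        r = requests.get(url, timeout=10)
        q.put("try count:[%s];result [%s]" % (i,r.status_code))
    except requests.RequestException as e:
        # always put something, so the parent's q.get() below cannot hang
        q.put("try count:[%s];error [%s]" % (i,e))

if __name__ == '__main__':
    q = multiprocessing.Queue()
    procs = []
    for i in range(100):
        p = multiprocessing.Process(target=get_data,args=(url,i,q))
        p.start()
        procs.append(p)

    # read one result per process; checking q.empty() right after start()
    # is unreliable and usually prints nothing
    for _ in procs:
        print(q.get())
    for p in procs:
        p.join()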


>>try count:[0];result [200]
try count:[2];result [503]
try count:[3];result [503]
try count:[6];result [503]
try count:[4];result [503]
try count:[5];result [503]
.........

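Process pools

A process pool maintains a set of worker processes. When you submit a task, a process is taken from the pool; if none is available, the program waits until one becomes free.

A pool has two methods for submitting a single call:

apply: blocks until the call has finished and returns its result
apply_async: returns immediately with an AsyncResult and can invoke a callback when the result is ready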

from  multiprocessing import Process,Pool
import time
 
def Foo(i):
    time.sleep(2)
    return i+100
 
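def Bar(arg):  # callback, runs in the parent process
    print('-->exec done:',arg)
 
if __name__ == '__main__':  # required where processes are spawned (Windows, macOS)
    pool = Pool(5)
 
    for i in range(10):
        pool.apply_async(func=Foo, args=(i,),callback=Bar)
        #pool.apply(func=Foo, args=(i,))
 
    print('end')
    pool.close()
    pool.join() # wait for the pool's processes to finish; without this the program exits right away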

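pool.apply_async(func[, args[, kwds[, callback]]]): its advantages are that it does not block and that it accepts a callback. Collecting results with print([res.get() for res in result]), by contrast, blocks until each result is ready.
Note that args must be a tuple, e.g. args=('test',), just as with threading.Thread; kwds takes a dict of keyword arguments.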
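A minimal sketch contrasting the two methods, with an illustrative add_100 function: apply() returns the value directly, while apply_async() returns an AsyncResult, and its callback runs in a result-handling thread of the parent process before get() on that result returns:

```python
from multiprocessing import Pool


def add_100(i):
    return i + 100


def main():
    collected = []
    with Pool(processes=3) as pool:
        # apply(): blocks until this single call has finished, then returns its value
        first = pool.apply(add_100, args=(1,))

        # apply_async(): returns an AsyncResult immediately
        pending = [pool.apply_async(add_100, args=(i,), callback=collected.append)
                   for i in range(5)]
        values = [r.get(timeout=10) for r in pending]  # get() blocks per result
    return first, values, sorted(collected)


if __name__ == '__main__':
    print(main())  # (101, [100, 101, 102, 103, 104], [100, 101, 102, 103, 104])
```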

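Typical usage:

import re
import logging
from multiprocessing import Pool
from pyzabbix import ZabbixMetric, ZabbixSender  # from the py-zabbix package

logger = logging.getLogger(__name__)
# WebStatus, get_zbx_items and ZBX_HOST are the author's own helpers (not shown)

def worker(item):
    value = 0  # default, so the return below still works if an exception occurs
    try:
        reg_func = re.split(r'[\[\]]',item)
        assert len(reg_func) > 1, "regex error for %s" % item
        pname,func = reg_func[0].split('.') # 'webCheck.HttpCode'
        url = str(reg_func[1])
        ws = WebStatus(url)
        ws.get()
        value = ws.request_value()[func] if ws.request_value()[func] else 0
    except Exception as e:
        logger.error(str(e))
    return {'item':item,'value':value}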


def callback_zbx_sender(kwargs):
    logger.debug(kwargs)
    packet = [ ZabbixMetric(ZBX_HOST, kwargs['item'], kwargs['value']) ]
    result = ZabbixSender(use_config=True,zabbix_port=10051).send(packet)
    logger.debug(result)
    if result.failed:
        logger.error(result)


def run():
    pool = Pool(processes=10)
    ret_zbx_items = get_zbx_items()
    for item in ret_zbx_items:
        pool.apply_async(func=worker,args=(item,),callback=callback_zbx_sender)
    pool.close()
    pool.join()

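To use a pool inside a class, first define a module-level proxy function. Even this does not always work, especially when a queue is involved. The instance itself is pickled and sent to the worker processes, so attributes that cannot be pickled, such as the Pool, have to be left out of the pickled state via __getstate__, as Example 1 does.

Example 1

import time
from multiprocessing import Pool

# proxy function: a module-level function that calls the method on the instance sent to the worker
def run(cls_instance, i):
    return cls_instance.func(i)


class Runner(object):
    def __init__(self):
        self.pool = Pool(processes=2)
        for i in range(2):
            self.pool.apply_async(run, (self, i),callback=self.callback)
        self.pool.close()
        self.pool.join()

    def func(self, i):
        print(i)
        time.sleep(i)
        return i+10

    def callback(self,v):
        print("get return===",v)

    def __getstate__(self):
        # a Pool cannot be pickled, so drop it from the state sent to the workers
        self_dict = self.__dict__.copy()
        del self_dict['pool']
        return self_dict

    def __setstate__(self, state):
        self.__dict__.update(state)

if __name__ == '__main__':
    runner = Runner()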

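Example 2

import os
import logging
import multiprocessing

logger = logging.getLogger(__name__)

# cls is an instance of the author's own class (not shown) that provides
# get_salt_token() and salt_local()
def multiSalt(cls_instance,token,i):
    logger.info("starting subprocess %s" % os.getpid())
    return cls_instance.salt_local(token,**i)

def multiRun_2(cls,t_list):
    result = []
    token = cls.get_salt_token()
    p = multiprocessing.Pool(5)
    for i in t_list:
        result.append(p.apply_async(multiSalt,(cls,token,i)))
    p.close()
    p.join()
    # The collection step below is not strictly needed: passing callback=
    # to apply_async would do the same job
    for ret_task in result:
        task_ret = ret_task.get()
        ret_task_id = task_ret.get('taskid')
        # merge each result into the task with the same taskid
        for task in t_list:
            if ret_task_id == task.get('taskid'):
                task.update(task_ret)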

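Inter-process communication

Processes do not share memory, so to exchange data between two processes you can use the following:

Queues

Usage is much the same as the queue used with threading.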

from multiprocessing import Process, Queue
 
def f(q):
    q.put([42, None, 'hello'])
 
if __name__ == '__main__':
    q = Queue()
    p = Process(target=f, args=(q,))
    p.start()
    print(q.get())    # prints "[42, None, 'hello']"
    p.join()

Pipes

The Pipe() function returns a pair of connection objects connected by a pipe which by default is duplex (two-way). For example:

from multiprocessing import Process, Pipe
 
def f(conn):
    conn.send([42, None, 'hello'])
    conn.close()
 
if __name__ == '__main__':
    parent_conn, child_conn = Pipe()
    p = Process(target=f, args=(child_conn,))
    p.start()
    print(parent_conn.recv())   # prints "[42, None, 'hello']"
    p.join()

The two connection objects returned by Pipe() represent the two ends of the pipe. Each connection object has send() and recv() methods (among others). Note that data in a pipe may become corrupted if two processes (or threads) try to read from or write to the same end of the pipe at the same time. Of course there is no risk of corruption from processes using different ends of the pipe at the same time.

Managers

A manager lets processes share objects such as dicts and lists.

A manager object returned by Manager() controls a server process which holds Python objects and allows other processes to manipulate them using proxies.

A manager returned by Manager() will support types list, dict, Namespace, Lock, RLock, Semaphore, BoundedSemaphore, Condition, Event, Barrier, Queue, Value and Array. For example,

from multiprocessing import Process, Manager
 
def f(d, l):
    d[1] = '1'
    d['2'] = 2
    d[0.25] = None
    l.append(1)
    print(l)
 
if __name__ == '__main__':
    with Manager() as manager:
        d = manager.dict()
 
        l = manager.list(range(5))
        p_list = []
        for i in range(10):
            p = Process(target=f, args=(d, l))
            p.start()
            p_list.append(p)
        for res in p_list:
            res.join()
 
        print(d)
        print(l)

Process synchronization

Without using the lock output from the different processes is liable to get all mixed up.

from multiprocessing import Process, Lock
 
def f(l, i):
    l.acquire()
    try:
        print('hello world', i)
    finally:
        l.release()
 
if __name__ == '__main__':
    lock = Lock()
 
    for num in range(10):
        Process(target=f, args=(lock, num)).start()
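Locks protect shared data across processes too, not just output. multiprocessing.Value puts a single C value in shared memory and, by default, pairs it with a lock returned by get_lock(). A minimal sketch of a counter incremented from several processes (add_many and main are illustrative names):

```python
from multiprocessing import Process, Value


def add_many(counter, times):
    for _ in range(times):
        # "+=" on a shared Value is a read-modify-write, so hold its lock
        with counter.get_lock():
            counter.value += 1


def main(n_procs=4, times=1000):
    counter = Value('i', 0)  # a C int in shared memory, created with its own lock
    procs = [Process(target=add_many, args=(counter, times)) for _ in range(n_procs)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return counter.value


if __name__ == '__main__':
    print(main())  # 4000
```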

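Coroutines

A coroutine, also called a micro-thread or fiber, is in one sentence a lightweight thread that lives in user space.

A coroutine has its own register context and stack. When the scheduler switches away from a coroutine, its register context and stack are saved elsewhere; when it switches back, the saved context and stack are restored. Therefore:

A coroutine keeps the state of its previous call (a particular combination of all its local state). Each time it is re-entered, it resumes the state of the previous call; put differently, it continues from the position in the logical flow where it last left off.

Advantages of coroutines:

No thread context-switch overhead
No overhead for atomic-operation locking and synchronization
- "Atomic operations need no synchronization." An atomic operation is one that the thread scheduler cannot interrupt: once started, it runs to the end without any context switch to another thread. It may consist of one step or several, but their order cannot be disturbed and they cannot be cut so that only part of them runs. Being indivisible as a whole is the core of atomicity.
Easy switching of control flow, which simplifies the programming model
High concurrency, high scalability, low cost: a single CPU can easily support tens of thousands of coroutines, which makes them well suited to highly concurrent workloads.

Disadvantages:

They cannot use multiple cores: a coroutine scheduler is essentially single-threaded, so it cannot use several cores of a CPU at once. To run on multiple CPUs, coroutines must be combined with processes. Most everyday applications don't need that unless they are CPU-bound.
A blocking operation (such as blocking I/O) blocks the entire program

Example: coroutines implemented with yield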

import time
import queue
def consumer(name):
    print("--->starting eating baozi...")
    while True:
        new_baozi = yield
        print("[%s] is eating baozi %s" % (name,new_baozi))
        #time.sleep(1) 
 
def producer():
 
    r = con.__next__()
    r = con2.__next__()
    n = 0
    while n < 5:
        n +=1
        con.send(n)
        con2.send(n)
        print("\033[32;1m[producer]\033[0m is making baozi %s" %n )
 
 
if __name__ == '__main__':
    con = consumer("c1")
    con2 = consumer("c2")
    p = producer()
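A generator keeps its local variables between resumptions, which is the "state of the previous call" described above, and send() both delivers a value and returns the next yielded one. A minimal sketch using an illustrative running-average coroutine:

```python
def averager():
    """Coroutine that keeps a running average; its state survives between send() calls."""
    total = 0.0
    count = 0
    average = None
    while True:
        value = yield average   # pause here; resume when send() delivers a value
        total += value
        count += 1
        average = total / count


avg = averager()
next(avg)            # "prime" the generator: run it up to the first yield
print(avg.send(10))  # 10.0
print(avg.send(20))  # 15.0
print(avg.send(60))  # 30.0
```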

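Four points define a coroutine:

It achieves concurrency within a single thread
Shared data can be modified without locks
The user program itself keeps the context stacks of its multiple control flows
When one coroutine hits an I/O operation, it switches to another coroutine automatically

By these four criteria, the coroutine we just built with yield does not fully qualify, because it lacks one feature: switching automatically when it hits I/O.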
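The third point above is easy to see with plain generators: each paused generator is a saved context, and a small loop decides which one to resume next. A minimal sketch of such a round-robin scheduler (task and run_all are illustrative names); switching happens only at explicit yield statements, so nothing here switches automatically on I/O:

```python
from collections import deque


def task(name, steps, log):
    for i in range(steps):
        log.append("%s:%d" % (name, i))
        yield  # hand control back to the scheduler


def run_all(tasks):
    """Tiny round-robin scheduler: resume each generator in turn until all finish."""
    ready = deque(tasks)
    while ready:
        current = ready.popleft()
        try:
            next(current)        # run until its next yield
        except StopIteration:
            continue             # finished: drop it
        ready.append(current)    # not finished: back of the line


log = []
run_all([task("A", 2, log), task("B", 3, log)])
print(log)  # ['A:0', 'B:0', 'A:1', 'B:1', 'B:2']
```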

Greenlet

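greenlet is a coroutine module implemented in C. Compared with Python's built-in yield, it lets you switch freely between arbitrary functions without first declaring them as generators.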

# -*- coding:utf-8 -*-
 
 
from greenlet import greenlet
 
 
def test1():
    print(12)
    gr2.switch()
    print(34)
    gr2.switch()
 
 
def test2():
    print(56)
    gr1.switch()
    print(78)
 
 
gr1 = greenlet(test1)
gr2 = greenlet(test2)
gr1.switch()

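This does feel even simpler than generators, but it still doesn't solve one problem: switching automatically when an I/O operation is encountered, right?

Gevent

Gevent is a third-party library that makes it easy to write concurrent synchronous or asynchronous code. The main building block in gevent is the Greenlet, a lightweight coroutine plugged into Python as a C extension module. All greenlets run inside the operating-system process of the main program, but they are scheduled cooperatively.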

import gevent
 
def func1():
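    print('\033[31;1mLi Chuang is hanging out with Haitao...\033[0m')
    gevent.sleep(2)
    print('\033[31;1mLi Chuang goes back to hanging out with Haitao...\033[0m')
 
def func2():
    print('\033[32;1mLi Chuang switched to hanging out with Hailong...\033[0m')
    gevent.sleep(1)
    print('\033[32;1mLi Chuang is done with Haitao and back with Hailong...\033[0m')
 
 
gevent.joinall([
    gevent.spawn(func1),
    gevent.spawn(func2),
    #gevent.spawn(func3),
])

Output:

Li Chuang is hanging out with Haitao...
Li Chuang switched to hanging out with Hailong...
Li Chuang is done with Haitao and back with Hailong...
Li Chuang goes back to hanging out with Haitao...

Performance difference between synchronous and asynchronous execution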

import gevent
 
def task(pid):
    """
    Some non-deterministic task
    """
    gevent.sleep(0.5)
    print('Task %s done' % pid)
 
def synchronous():
    for i in range(10):
        task(i)
 
def asynchronous():
    threads = [gevent.spawn(task, i) for i in range(10)]
    gevent.joinall(threads)
 
print('Synchronous:')
synchronous()
 
print('Asynchronous:')
asynchronous()

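The important part of the program above is gevent.spawn, which wraps the task function in a Greenlet. The list of created greenlets is stored in threads and passed to gevent.joinall, which blocks the current flow while it runs all the given greenlets; execution continues only after every greenlet has finished. Run synchronously, the 0.5 s sleeps happen one after another and take several seconds in total; run as greenlets, they overlap and the whole batch finishes in about 0.5 s.

Switching tasks automatically on blocking I/O

Note that the monkey patch is required; without it gevent cannot see the I/O waits inside urlopen. Call monkey.patch_all() as early as possible, before other modules are imported.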

from gevent import monkey; monkey.patch_all()
import gevent
from  urllib.request import urlopen
 
def f(url):
    print('GET: %s' % url)
    resp = urlopen(url)
    data = resp.read()
    print('%d bytes received from %s.' % (len(data), url))
 
gevent.joinall([
        gevent.spawn(f, 'https://www.python.org/'),
        gevent.spawn(f, 'https://www.yahoo.com/'),
        gevent.spawn(f, 'https://github.com/'),
])

Handling many sockets concurrently in a single thread with gevent

Server side

from gevent import monkey
monkey.patch_all()  # patch the standard library before anything else uses it

import socket
import gevent
 
 
def server(port):
    s = socket.socket()
    s.bind(('0.0.0.0', port))
    s.listen(500)
    while True:
        cli, addr = s.accept()
        # one greenlet per client connection
        gevent.spawn(handle_request, cli)
 
 
 
def handle_request(conn):
    try:
        while True:
            data = conn.recv(1024)
            print("recv:", data)
            if not data:
                # the client closed the connection: stop instead of looping forever
                conn.shutdown(socket.SHUT_WR)
                break
            conn.send(data)
 
    except Exception as  ex:
        print(ex)
    finally:
        conn.close()
if __name__ == '__main__':
    server(8001)


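Demo 2

import gevent
import gevent.socket as socket
from gevent.pool import Pool


class SocketPool(object):
    """A socket server that handles each client in a greenlet from a bounded pool."""
    def __init__(self, host='127.0.0.1', port=5431, size=100):
        self.pool = Pool(size)
        self.server = socket.socket(socket.AF_INET,socket.SOCK_STREAM)
        self.server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        self.server.bind((host, port))  # bind() takes a single (host, port) tuple
        self.server.listen(1024)

    def listen(self):
        while True:
            conn, addr = self.server.accept()  # don't shadow the socket module
            self.add_handler(conn)

    def handle(self, conn):
        data = conn.recv(1024)
        print(data)
        conn.close()

    def add_handler(self, conn):
        if self.pool.full():
            raise Exception("At maximum pool size")
        else:
            self.pool.spawn(self.handle, conn)

    def shutdown(self):
        self.pool.kill()


if __name__ == '__main__':
    SocketPool().listen()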


Client side

import socket
import threading

def sock_conn():

    client = socket.socket()

    client.connect(("localhost",8001))
    count = 0
    while True:
        #msg = input(">>:").strip()
        #if len(msg) == 0:continue
        client.send( ("hello %s" %count).encode("utf-8"))

        data = client.recv(1024)

        print("[%s]recv from server:" % threading.get_ident(),data.decode()) # the echoed reply
        count +=1
    client.close()


for i in range(100):
    t = threading.Thread(target=sock_conn)
    t.start()


This opens 100 concurrent socket connections.

Subprocesses inside coroutines

import gevent
from gevent.subprocess import Popen, PIPE

def cron():
    while True:
        print("cron")
        gevent.sleep(0.2)

g = gevent.spawn(cron)
# the cron greenlet keeps printing while the subprocess runs; it is killed once communicate() returns
sub = Popen(['sleep 1; uname'], stdout=PIPE, shell=True)
out, err = sub.communicate()
g.kill()
print(out.rstrip())

Queues in coroutines

import gevent
from gevent.queue import Queue
import random

tasks = Queue()

def worker(n):
    while not tasks.empty():
        task = tasks.get()
        print('Worker %s got task %s' % (n, task))
        gevent.sleep(random.randint(1,3))

    print('Quitting time!')

def boss():
    for i in range(1,25):
        tasks.put_nowait(i)

gevent.spawn(boss).join()

gevent.joinall([
    gevent.spawn(worker, 'steve'),
    gevent.spawn(worker, 'john'),
    gevent.spawn(worker, 'nancy'),
])

Pipes with coroutines

import gevent
from multiprocessing import Process, Pipe
from gevent.socket import wait_read, wait_write

# To Process
a, b = Pipe()

# From Process
c, d = Pipe()

def relay():
    for i in range(10):
        msg = b.recv()
        gevent.sleep(1)
        c.send(msg + " in " + str(i))

def put_msg():
    for i in range(10):
        wait_write(a.fileno())
        a.send('hi')

def get_msg():
    for i in range(10):
        wait_read(d.fileno())
        print(d.recv())

if __name__ == '__main__':
    proc = Process(target=relay)
    proc.start()

    g1 = gevent.spawn(get_msg)
    g2 = gevent.spawn(put_msg)
    gevent.joinall([g1, g2], timeout=10)

Asynchronous results in coroutines

import gevent
from gevent.event import AsyncResult
a = AsyncResult()

def setter():
    """
    After 3 seconds set the result of a.
    """
    gevent.sleep(3)
    a.set('Hello!')

def waiter():
    """
    After 3 seconds the get call will unblock after the setter
    puts a value into the AsyncResult.
    """
    print(a.get())

gevent.joinall([
    gevent.spawn(setter),
    gevent.spawn(waiter),
])

Building a WSGI server with gevent (built-in module)

from gevent.pywsgi import WSGIServer  # gevent.wsgi was an alias, removed in gevent 1.3

def application(environ, start_response):
    status = '200 OK'
    body = b'<p>Hello World</p>'  # WSGI response bodies must be bytes on Python 3

    headers = [
        ('Content-Type', 'text/html')
    ]

    start_response(status, headers)
    return [body]

WSGIServer(('', 8000), application).serve_forever()

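Using coroutines across multiple processes

#encoding=utf-8
'''
Demonstrates how to use gevent with multiple processes.
1. Combining gevent with multiprocessing causes many problems,
  so the worker processes are started directly with subprocess.Popen.
  They share no data through fork, run completely independently,
  and communicate over sockets.
2. multiprocessing.Event cannot be used to synchronize the processes,
  because wait() blocks the thread and no other coroutine can run.
  gevent.event.Event() cannot be used either: after it is shared with a
  child through multiprocessing.Process, a set() in the parent is never
  seen by a wait() in the child.
3. A child process cannot ignore Ctrl+C with
  signal.signal(signal.SIGINT, signal.SIG_IGN), so if the master is not
  started in the background, Ctrl+C stops both the master and the workers
  and they cannot shut down gracefully.
4. The master and the workers communicate and synchronize through
  gevent.socket. When a worker sees the master close the connection
  (it reads zero bytes), it exits gracefully; in effect the master sends
  a message telling the worker to quit.
5. Start the master in the background with "nohup gevent-multil-process.py &"
  (nohup can be omitted while testing). Stop the master with kill pid;
  the master intercepts SIGTERM, notifies the workers and waits for them
  to exit.
'''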
import gevent
import gevent.socket as socket
from gevent.event import Event
import os
import sys
import subprocess
import signal
import time
url = ('localhost', 8888)

class Worker(object):
    '''
    Code run in a worker process. It starts one coroutine to talk to the
    master: receiving task assignments, the exit signal (a zero-byte read),
    and reporting task progress. The main coroutine then waits for the stop
    signal and ends the process (stop_event syncs the coroutines).
    '''
    def __init__(self, url):
        self.url = url
        self.stop_event = Event()
        gevent.spawn(self.communicate)
        try:
            self.stop_event.wait()
        except KeyboardInterrupt:
            pass

        print('worker(%s):will stop' % os.getpid())
    def exec_task(self, task):
        print('worker(%s):execute task:%s' % (os.getpid(), task.rstrip('\n')))
        # time.sleep(10)  # uncomment to test a blocking task
    def communicate(self):
        print('worker(%s):started' % os.getpid())
        client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        client.connect(self.url)
        fp = client.makefile()  # text-mode file; readline() returns '' once the master closes
        while True:
            line = fp.readline()
            if not line:
                self.stop_event.set()
                break
            # run each task in its own coroutine so the communication loop never blocks
            gevent.spawn(self.exec_task, line)

class Master():
    '''
    Code run in the master process. It starts a separate coroutine that
    listens on a port for workers to connect and communicate, launches the
    worker processes with subprocess.Popen, and registers a SIGTERM handler
    so that killing the master tells the workers to exit. The main coroutine
    waits for the stop event and then exits.
    '''
    def __init__(self, url):
        self.url = url
        self.workers = []
        self.stop_event = Event()

        gevent.spawn(self.communicate)
        gevent.sleep(0) # let the communicate coroutine run first, otherwise the workers start before the server listens

        self.process = [subprocess.Popen((sys.executable, sys.argv[0], 'worker'))
            for i in range(3)] # start 3 workers (e.g. multiprocessing.cpu_count() - 1)

        # gevent.signal_handler() needs gevent >= 1.5; older releases call it gevent.signal()
        gevent.signal_handler(signal.SIGTERM, self.stop) # intercept kill
        gevent.signal_handler(signal.SIGINT, self.ctrlC) # intercept Ctrl+C

        gevent.spawn(self.test) # test: hand out tasks

        self.stop_event.wait()

    def communicate(self):
        print('master(%s):started' % os.getpid())
        server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        server.bind(self.url)
        server.listen(1024)
        while True:
            worker, addr = server.accept()
            print('master(%s):new worker' % os.getpid())
            self.workers.append(worker)

    def stop(self):
        print('master stop')
        for worker in self.workers:
            worker.close()
        for p in self.process:
            p.wait()
        self.stop_event.set()

    def ctrlC(self):
        print('master stop from ctrl +c')
        for worker in self.workers:
            worker.close()
        for p in self.process:
            p.wait()
        gevent.sleep(0)
        self.stop_event.set()

    def test(self):
        import random
        while True:
            if not self.workers:
                gevent.sleep(1)
                continue
            task = str(random.randint(100,10000))
            worker = random.choice(self.workers)
            worker.send((task + '\n').encode())  # sockets send bytes on Python 3
            gevent.sleep(1)

if len(sys.argv) == 1:
    Master(url)
else:
    Worker(url)


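Homework:

Exercise: build a simple chat room

Exercise: a simple batch host-management tool

Requirements:

Hosts are organized into groups
The host configuration file is parsed with configparser
Commands and file transfers can be run in batch, with results returned in real time, using the following formats:
1. batch_run -h h1,h2,h3 -g web_clusters,db_servers -cmd "df -h"
2. batch_scp -h h1,h2,h3 -g web_clusters,db_servers -action put -local test.py -remote /tmp/
Hosts may have different usernames, passwords, and ports
Remote commands are executed with the paramiko module
Batch commands must run concurrently with multiprocessing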

posted @ 2017-10-31 14:00 richardzgt


This blog has been migrated and closed.
