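Contents

1. Processes and threads: definitions and differences

2. The Python GIL (Global Interpreter Lock)

3. Multithreading

   1. Syntax
   2. join
   3. Daemon threads
   4. Thread locks: Lock, RLock, and semaphores
   5. Events
   6. Queues
   7. The producer-consumer model

4. Multiprocessing

   1. Syntax
   2. Inter-process communication
   3. Process locks
   4. Process pools

5. Coroutines

   1. Writing a coroutine by hand
   2. Ready-made coroutine modules (greenlet, gevent)
   3. Coroutine examples (simple crawler, socket server)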

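I. Processes and threads

Process: a program cannot run by itself. It runs only after it has been loaded into memory and the operating system has allocated resources to it, and a program in execution is called a process. A process bundles the system resources the program uses (memory, open files, and so on) and manages them as one unit. To execute on the CPU, a process must contain at least one thread.

Thread: the smallest unit the operating system schedules. It is a sequence of instructions.

Differences:

Threads share the memory space of their process; each process has its own independent memory space.
Threads of the same process can communicate with each other directly; two processes must communicate through an intermediary.
A new thread is cheap to create; a new process requires duplicating its parent process. (Threads therefore start faster, but that says nothing about how fast they run.)
A thread can control and operate on the other threads of the same process; a process can only operate on its child processes.
Changes to the main thread may affect the other threads of the same process; changes to a parent process do not affect its child processes.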
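The first difference can be checked with a small experiment. The sketch below is illustrative and not part of the original article; the names `data`, `append_item`, and `run_in_thread` are made up for the example. An append made from another thread is visible to the caller, while the same append made in a child process changes only the child's copy of the list.

```python
import threading
from multiprocessing import Process

data = []  # module-level list; every thread in this process sees the same object


def append_item(value):
    data.append(value)


def run_in_thread(value):
    """Append from another thread, then return a snapshot of the list."""
    t = threading.Thread(target=append_item, args=(value,))
    t.start()
    t.join()
    return list(data)


if __name__ == "__main__":
    print(run_in_thread("from thread"))  # the thread's change is visible here

    p = Process(target=append_item, args=("from process",))
    p.start()
    p.join()
    print(data)  # the child appended to its own copy, so this list is unchanged
```

Run it as a script file so that the child process can locate `append_item`.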

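II. The Python GIL (Global Interpreter Lock)

GIL (Global Interpreter Lock)

In CPython, no matter how many cores the machine has, only one thread executes Python bytecode at any given moment. What looks like parallel execution is really the interpreter switching rapidly between threads.

CPython threads are native OS threads created through the operating system's C-level thread interface, so the interpreter does not control how they are scheduled. Without the GIL, several threads could modify the same interpreter data at once and corrupt it.

PyPy also has a GIL. Its large speed-up comes from its JIT (Just-In-Time) compiler, not from removing the lock.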
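A rough way to observe the GIL is to time the same CPU-bound function run twice in a row and then in two threads. This is only a sketch with hypothetical names (`count_down`, `timed`, `sequential`, `threaded`). On a standard CPython build the two timings are usually close to each other, but the numbers depend on the machine and the interpreter build, so none are shown here.

```python
import threading
import time


def count_down(n):
    """Pure-Python busy loop: CPU-bound, no I/O."""
    while n > 0:
        n -= 1
    return n


def timed(fn):
    """Return how many seconds fn() takes."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def sequential(n):
    count_down(n)
    count_down(n)


def threaded(n):
    threads = [threading.Thread(target=count_down, args=(n,)) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


if __name__ == "__main__":
    N = 2_000_000
    print("sequential:  %.3fs" % timed(lambda: sequential(N)))
    print("two threads: %.3fs" % timed(lambda: threaded(N)))
```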

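III. Multithreading (threading)

When to use Python threads (on CPython they amount to context switching on a single core)

I/O operations spend most of their time waiting and use little CPU; computation uses the CPU.
Python threads suit I/O-bound tasks and do not suit CPU-bound tasks.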
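The I/O-bound case can be sketched as follows, assuming `time.sleep` is an acceptable stand-in for waiting on a socket or disk. The helper names `fake_io` and `run_concurrently` are hypothetical. Because a sleeping thread releases the GIL, five waits overlap instead of adding up.

```python
import threading
import time


def fake_io(delay, results, index):
    time.sleep(delay)          # stands in for waiting on a socket or disk
    results[index] = "done"


def run_concurrently(count, delay):
    """Start all threads first, then wait for all of them."""
    results = [None] * count
    threads = [
        threading.Thread(target=fake_io, args=(delay, results, i))
        for i in range(count)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = run_concurrently(5, 0.2)
    # The waits overlap, so the total is well under 5 * 0.2 seconds.
    print(results, "elapsed: %.2fs" % elapsed)
```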

1. Syntax

import threading

def run(n):
    print("task ", n)

for i in range(5):
    # Create the thread. args must be a tuple such as (a,); the trailing comma is required.
    t = threading.Thread(target=run, args=("t-%s" % i,))
    t.start()  # start the thread

2. join

import threading, time

def run(n):
    print("task ", n)
    time.sleep(1)

for i in range(5):
    # Create the thread. args must be a tuple such as (a,).
    t = threading.Thread(target=run, args=("t-%s" % i,))
    t.start()  # start the thread
    t.join()   # wait for this thread to finish before continuing, so the program runs serially

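3. Daemon threads

The main thread waits for all non-daemon threads to finish: at exit there is an implicit join() on each of them.
The main thread does not wait for daemon threads. A daemon thread exists to serve the main thread, and it ends when the main thread ends.

import threading, time

def run(n):
    time.sleep(1)
    print("task ", n)

for i in range(5):
    # Create the thread. args must be a tuple such as (a,).
    t = threading.Thread(target=run, args=("t-%s" % i,))
    t.daemon = True  # mark as a daemon thread before start(); setDaemon() is deprecated since Python 3.10
    t.start()  # start the thread
# No non-daemon thread is left, so the program exits before any task prints.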
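A typical use is a background helper that should never keep the program alive. The sketch below is illustrative; `heartbeat` and `start_background` are made-up names. It passes `daemon=True` to the `Thread` constructor, which has the same effect as setting the `daemon` attribute before `start()`.

```python
import threading
import time


def heartbeat(iterations):
    """Background loop that the program should not wait for."""
    for _ in range(iterations):
        time.sleep(0.05)


def start_background(iterations=1000):
    # daemon=True at construction is equivalent to setting t.daemon before start()
    t = threading.Thread(target=heartbeat, args=(iterations,), daemon=True)
    t.start()
    return t


if __name__ == "__main__":
    t = start_background()
    print("daemon:", t.daemon, "alive:", t.is_alive())
    # The script ends here; the interpreter exits without waiting for the daemon thread.
```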

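4. Thread locks: Lock, RLock, and semaphores

User-level locks

The GIL protects only the interpreter's own state. It does not lock the data your threads share, so add your own lock where one is needed. (Python 3 makes such races harder to trigger, but the lock is still required; the official documentation never says it can be omitted.)

See the "GIL VS Lock" section of http://www.cnblogs.com/alex3714/articles/5230609.html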

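RLock (re-entrant lock)

Use an RLock when code that already holds a lock calls other code that acquires the same lock. The lock keeps track of its owning thread and of how many times it has been acquired, so each nested acquire is matched with its own release.

import threading

def run1():
    global num
    print("grab the first part data")
    lock.acquire()
    num += 1
    lock.release()
    return num

def run2():
    global num2
    print("grab the second part data")
    lock.acquire()
    num2 += 1
    lock.release()
    return num2

def run3():
    lock.acquire()  # outer acquire; run1 and run2 acquire the same lock again
    res = run1()
    print('--------between run1 and run2-----')
    res2 = run2()
    lock.release()
    print(res, res2)

num, num2 = 0, 0
lock = threading.RLock()  # re-entrant lock; a plain Lock would deadlock inside run1
for i in range(1):
    t = threading.Thread(target=run3)
    t.start()
while threading.active_count() != 1:
    print(threading.active_count())
else:
    print('----all threads done---')
    print(num, num2)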
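The difference between the two lock types can be shown without risking a deadlock by asking for the second acquire in non-blocking mode. This is a small sketch; the helper name `can_reacquire` is hypothetical.

```python
import threading


def can_reacquire(lock):
    """Return True if the current thread can take `lock` a second time."""
    lock.acquire()
    try:
        # blocking=False returns immediately instead of waiting forever
        got_it = lock.acquire(blocking=False)
        if got_it:
            lock.release()  # undo the second acquire
        return bool(got_it)
    finally:
        lock.release()      # undo the first acquire


if __name__ == "__main__":
    print("Lock: ", can_reacquire(threading.Lock()))   # a plain Lock refuses
    print("RLock:", can_reacquire(threading.RLock()))  # an RLock allows it
```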

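Thread lock (mutex)

A process can start many threads, and they all share the memory of that process. Every thread can therefore reach the same data, and if two threads modify the same data at the same time the result may be wrong.

import threading
import time

def run(n):
    global num  # every thread works on the same global variable
    lock.acquire()  # lock before modifying the data
    num += 1
    time.sleep(1)  # sleeping while holding the lock makes the 50 threads run one after another
    lock.release()  # unlock

# num += 1 is a read, an add, and a write. The GIL does not make that sequence atomic, so a user-level lock is needed.
lock = threading.Lock()
num = 0
t_objs = []  # holds the thread objects
for i in range(50):
    t = threading.Thread(target=run, args=("t-%s" % i,))
    t.start()
    t_objs.append(t)  # do not join here, or later threads could not start; collect them in a list first
for t in t_objs:  # loop over the thread objects and wait for all of them to finish
    t.join()
print("----------all threads have finished...", threading.current_thread(), threading.active_count())
print("num:", num)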
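The acquire/release pair is often written as a `with` block, which releases the lock even if the body raises an exception. A minimal sketch of that idiom, with made-up names (`counter`, `counter_lock`, `add_many`, `run`):

```python
import threading

counter = 0
counter_lock = threading.Lock()


def add_many(times):
    global counter
    for _ in range(times):
        with counter_lock:      # acquired here, released when the block exits
            counter += 1


def run(thread_count, times):
    """Increment the shared counter from several threads and return its value."""
    threads = [
        threading.Thread(target=add_many, args=(times,))
        for _ in range(thread_count)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


if __name__ == "__main__":
    print(run(8, 10_000))
```

With the lock in place, no increment is lost regardless of how the threads interleave.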

Semaphore

A mutex lets only one thread modify the data at a time, whereas a Semaphore lets a fixed number of threads do so at once. It can be used, for example, to limit how many connections a socketserver accepts at the same time.

import threading, time

def run(n):
    semaphore.acquire()
    time.sleep(1)
    print("run the thread: %s\n" % n)
    semaphore.release()

if __name__ == '__main__':
    semaphore = threading.BoundedSemaphore(5)  # at most 5 threads may run at the same time
    for i in range(22):
        t = threading.Thread(target=run, args=(i,))
        t.start()
    while threading.active_count() != 1:
        pass  # print(threading.active_count())
    else:
        print('----all threads done---')
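To confirm that the semaphore really caps concurrency, one can record the largest number of workers seen inside the guarded section. This is an illustrative sketch; `LIMIT`, `state_lock`, `running`, `max_running`, `worker`, and `run` are names invented for it.

```python
import threading
import time

LIMIT = 3
semaphore = threading.BoundedSemaphore(LIMIT)
state_lock = threading.Lock()   # protects the two counters below
running = 0
max_running = 0


def worker():
    global running, max_running
    with semaphore:             # at most LIMIT threads get past this line
        with state_lock:
            running += 1
            max_running = max(max_running, running)
        time.sleep(0.05)        # simulated work
        with state_lock:
            running -= 1


def run(thread_count):
    """Run the workers and return the peak number that ran at once."""
    threads = [threading.Thread(target=worker) for _ in range(thread_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return max_running


if __name__ == "__main__":
    print("peak concurrency:", run(10), "limit:", LIMIT)
```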

5. Events

An Event lets two or more threads coordinate through a shared flag.

import time
import threading

event = threading.Event()  # create the event

def lighter():
    count = 0
    event.set()  # set the flag: green light
    while True:
        if 5 < count < 10:  # switch to red
            event.clear()  # clear the flag
            print("\033[41;1mred light is on....\033[0m\r")
        elif count >= 10:
            event.set()  # back to green
            count = 0
        else:
            print("\033[42;1mgreen light is on....\033[0m\r")
        time.sleep(1)
        count += 1

def car(name):
    while True:
        if event.is_set():  # flag set means green light
            print("[%s] running..." % name)
            time.sleep(1)
        else:
            print("[%s] sees red light , waiting...." % name)
            event.wait()  # returns at once if the flag is set; otherwise blocks until set() is called
            print("\033[34;1m[%s] green light is on, start going...\033[0m" % name)

light = threading.Thread(target=lighter)
light.start()
car1 = threading.Thread(target=car, args=("Tesla",))
car1.start()
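A smaller, terminating variant may make the flag semantics easier to see. In this sketch (the names `run`, `waiter`, `ready`, and `log` are hypothetical) one thread blocks in `wait()` with a timeout until the main thread calls `set()`.

```python
import threading


def run():
    ready = threading.Event()
    log = []

    def waiter():
        # wait() returns True once the flag is set, or False if the timeout expires
        if ready.wait(timeout=2):
            log.append("go")
        else:
            log.append("timed out")

    t = threading.Thread(target=waiter)
    t.start()
    log.append("setting flag")
    ready.set()     # wakes the waiting thread
    t.join()
    return log


if __name__ == "__main__":
    print(run())
```

The waiter can only append after the flag has been set, so "setting flag" always comes first in the log.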

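6. Queues

Classes

class queue.Queue(maxsize=0)  # first in, first out; maxsize is the capacity of the queue
class queue.LifoQueue(maxsize=0)  # last in, first out
class queue.PriorityQueue(maxsize=0)  # items are stored with a priority

Purpose

Decoupling parts of a program
Improving efficiency

Difference from a list

An item taken from a queue is removed from it.

Example 1

import queue

q = queue.LifoQueue()  # last in, first out
q.put(1)  # add items to the queue
q.put(2)
q.put(3)
print('size', q.qsize())  # number of items in the queue
print(q.get())  # get(self, block=True, timeout=None) blocks by default when the queue is empty
print(q.get())
print(q.get())
try:
    print(q.get_nowait())  # the queue is empty now, so this raises queue.Empty
except queue.Empty:
    print("queue is empty")

Example 2

import queue

q = queue.PriorityQueue()  # the entry with the lowest priority value is retrieved first
q.put((-1, "chenronghua"))
q.put((3, "hanyang"))
q.put((10, "alex"))
q.put((6, "wangsen"))
print(q.get())
print(q.get())
print(q.get())
print(q.get())

Queue.task_done()

A consumer calls task_done() after it has finished processing an item it fetched with get(). Queue.join() blocks until every item that was put into the queue has been marked done in this way.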
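How `task_done()` and `join()` fit together can be sketched as follows. The function name `process_all` and the doubling "work" are placeholders chosen for the example.

```python
import queue
import threading


def process_all(items):
    """Double every item in a worker thread and wait until all are processed."""
    q = queue.Queue()
    results = []

    def worker():
        while True:
            item = q.get()
            results.append(item * 2)
            q.task_done()          # exactly one task_done() per get()

    # Daemon thread: it loops forever, so it must not keep the program alive.
    threading.Thread(target=worker, daemon=True).start()
    for item in items:
        q.put(item)
    q.join()                       # returns once every item has been marked done
    return results


if __name__ == "__main__":
    print(process_all([1, 2, 3]))
```

With a single worker and a FIFO queue the results come back in the order the items were put.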

7. The producer-consumer model

Purpose

Decoupling: the producer can be changed without affecting the consumers.

import threading, time
import queue

q = queue.Queue(maxsize=10)

def Producer(name):
    count = 1
    while True:
        q.put("bone%s" % count)  # blocks while the queue is full
        print("produced bone", count)
        count += 1
        time.sleep(0.1)

def Consumer(name):
    # while q.qsize() > 0:
    while True:
        print("[%s] got [%s] and ate it..." % (name, q.get()))
        time.sleep(1)

p = threading.Thread(target=Producer, args=("Alex",))
c = threading.Thread(target=Consumer, args=("ChengRonghua",))
c1 = threading.Thread(target=Consumer, args=("王森",))
p.start()
c.start()
c1.start()
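The example above runs forever. One common way to let the threads stop cleanly is to have the producer put one sentinel per consumer at the end. The sketch below assumes that convention; `STOP`, `producer`, `consumer`, and `run` are illustrative names.

```python
import queue
import threading

STOP = object()  # sentinel that tells a consumer to quit


def producer(q, count, consumers):
    for n in range(1, count + 1):
        q.put("bone%s" % n)
    for _ in range(consumers):
        q.put(STOP)              # one sentinel per consumer


def consumer(q, eaten):
    while True:
        item = q.get()
        if item is STOP:
            break
        eaten.append(item)       # list.append is atomic in CPython


def run(count=10, consumers=2):
    q = queue.Queue(maxsize=5)   # small queue, so the producer blocks when it is full
    eaten = []
    threads = [
        threading.Thread(target=consumer, args=(q, eaten))
        for _ in range(consumers)
    ]
    threads.append(threading.Thread(target=producer, args=(q, count, consumers)))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return eaten


if __name__ == "__main__":
    print(sorted(run()))
```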

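IV. Multiprocessing (multiprocessing)

CPython's workaround for using multiple CPU cores

Start several processes. Each process has its own interpreter and its own GIL, so processes do not block one another, and as many of them can run truly in parallel as the machine has CPU cores. Every process has at least one thread.

Drawback: processes have separate memory, so by default they cannot share data.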

1. Syntax

from multiprocessing import Process
import os

def info(title):
    print(title)
    print('module name:', __name__)
    print('parent process:', os.getppid())  # os.getppid(): pid of the parent process
    print('process id:', os.getpid())  # os.getpid(): pid of this process
    print("\n\n")

def f(name):
    info('\033[31;1mcalled from child process function f\033[0m')
    print('hello', name)

if __name__ == '__main__':
    info('\033[32;1mmain process line\033[0m')
    p = Process(target=f, args=('bob',))  # create the child process
    p.start()  # start the child; every process is started by a parent process
    # p.join()

# Sample output (the pids differ from run to run):
# main process line
# module name: __main__
# parent process: 9752
# process id: 10908
# called from child process function f
# module name: __mp_main__
# parent process: 10908
# process id: 11224
# hello bob

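2. Inter-process communication

Processes do not share memory. To exchange data between two processes, use one of the following:

Process Queue

Usage is much like queue in threading. The main process creates a Queue and passes it to the child process it starts. The two processes do not share one object in memory: each holds its own handle to the same underlying channel, and items are pickled on put() and unpickled on get(), which keeps both sides in step.

from multiprocessing import Process, Queue
# import queue  # a thread queue is shared directly between threads, but cannot be passed to a process

def f(qq):
    print("in child:", qq.qsize())  # qsize() is approximate and raises NotImplementedError on macOS
    qq.put([42, None, 'hello'])

if __name__ == '__main__':
    q = Queue()  # only a multiprocessing Queue can be passed to a child process
    q.put("test123")
    p = Process(target=f, args=(q,))  # the queue must be passed in; the child gets its own handle
    p.start()
    p.join()
    print("444", q.get())  # get() waits for the data; get_nowait() could raise queue.Empty if it has not arrived yet
    print("444", q.get())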

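Pipe

As with a socket, the two communicating processes hold a pair of connected endpoints and send messages to each other.

Note: each send is delivered as a separate message. Consecutive sends are never merged, and each recv returns exactly one of them.

from multiprocessing import Process, Pipe

def f(conn):
    conn.send([42, None, 'hello from child'])
    conn.send([42, None, 'hello from child2'])
    print("from parent:", conn.recv())
    conn.close()

if __name__ == '__main__':
    parent_conn, child_conn = Pipe()  # create a connected pair, like the two ends of a tin-can telephone
    p = Process(target=f, args=(child_conn,))  # hand one end to the process to talk to
    p.start()
    print(parent_conn.recv())  # prints "[42, None, 'hello from child']"
    print(parent_conn.recv())  # prints "[42, None, 'hello from child2']"
    parent_conn.send("张洋, how are you?")  # received and printed by the child
    p.join()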
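The note about message boundaries can be checked inside a single process, since both ends of a `Pipe()` are ordinary connection objects. A minimal sketch, with a hypothetical helper name `round_trip` and small messages that fit in the pipe's buffer:

```python
from multiprocessing import Pipe


def round_trip(messages):
    """Send each message separately and receive them one by one."""
    parent_conn, child_conn = Pipe()
    for msg in messages:
        child_conn.send(msg)       # each send() pickles one whole object
    # One recv() per send(): the messages come back unmerged and in order.
    received = [parent_conn.recv() for _ in messages]
    parent_conn.close()
    child_conn.close()
    return received


if __name__ == "__main__":
    print(round_trip([[42, None, "hello"], "second message"]))
```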

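Manager

A Manager works like a higher-level Queue: create the shared data type you need directly, and the copying to other processes and the merging of changes happen behind the scenes. Each single method call on a shared object is carried out inside the manager's server process, so calls such as append do not corrupt the data; compound read-then-write updates still need an explicit lock.

from multiprocessing import Process, Manager
import os

def f(d, l):
    d[os.getpid()] = os.getpid()
    l.append(os.getpid())
    print(l)

if __name__ == '__main__':
    with Manager() as manager:
        d = manager.dict()  # a dict that can be shared and passed between processes
        l = manager.list(range(5))  # a list that can be shared and passed between processes
        p_list = []
        for i in range(10):
            p = Process(target=f, args=(d, l))
            p.start()
            p_list.append(p)
        for res in p_list:  # wait for the results
            res.join()
        print(d)
        print(l)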

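3. Process locks

The parent process creates the lock and passes it to the child processes.

Why it exists: a resource shared by several processes needs a lock. For example, several processes share one screen, and locking around output keeps their lines from being mixed together.

from multiprocessing import Process, Lock

def f(l, i):
    l.acquire()
    print('hello world', i)
    l.release()

if __name__ == '__main__':
    lock = Lock()
    for num in range(100):
        Process(target=f, args=(lock, num)).start()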

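4. Process pools

Threads cost very little in system resources, so the number of threads usually does not need a limit. The main harm of too many threads is that the CPU switches context too often and the system slows down.

Processes, by contrast, are expensive, and too many of them can bring the system down. A process pool limits how many processes run at the same time.

from multiprocessing import Process, Pool, freeze_support
import time
import os

def Foo(i):
    time.sleep(2)
    print("in process", os.getpid())
    return i + 100

def Bar(arg):
    print('-->exec done:', arg, os.getpid())

if __name__ == '__main__':  # required on Windows
    # freeze_support()
    pool = Pool(processes=3)  # at most 3 worker processes run at the same time
    print("main process", os.getpid())
    for i in range(10):
        # callback: after a worker finishes Foo, the main process runs Bar on the result.
        # Work that is the same for every task (such as using one database connection)
        # can be done once in the main process instead of in every worker.
        pool.apply_async(func=Foo, args=(i,), callback=Bar)
        # pool.apply(func=Foo, args=(i,))  # serial (synchronous)
        # pool.apply_async(func=Foo, args=(i,))  # parallel (asynchronous)
    print('end')
    pool.close()  # close() must be called before join()
    pool.join()  # wait for the workers to finish; without this the program exits at once and the tasks never complete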
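When the results are simply collected in order, `Pool.map` inside a `with` block is a shorter alternative to `apply_async` plus `close()` and `join()`. This is a sketch with a made-up worker function `square`, not the article's own code.

```python
from multiprocessing import Pool


def square(x):
    """Worker function; it must be defined at module level so workers can import it."""
    return x * x


if __name__ == "__main__":
    # map() blocks until every result is back, so the pool can safely be
    # torn down when the with-block exits.
    with Pool(processes=3) as pool:
        print(pool.map(square, range(10)))
```

`map` returns the results in the order of the inputs, whichever worker computed them.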

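V. Coroutines

A coroutine is also called a micro-thread or fiber. It is a lightweight thread that lives in user space.

A coroutine has its own register context and stack. When the scheduler switches away from a coroutine, its register context and stack are saved elsewhere; when it switches back, they are restored.

Coroutines give concurrency within a single thread. One thread handles many connections: when one of them reaches an I/O operation, the thread switches to another connection and asks the OS to notify it through a callback when the I/O completes, so that it can switch back. Until that notification arrives it keeps cycling through the other connections.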

1. Writing a coroutine by hand

import time

def consumer(name):  # a generator: it produces values on demand, keeps only its current state, and is advanced with next()
    print("--->starting eating baozi...")
    while True:
        new_baozi = yield  # pause here until a value is sent in
        print("[%s] is eating baozi %s" % (name, new_baozi))
        # time.sleep(1)

def producer():
    next(con)  # advance each generator to its first yield
    next(con2)
    n = 0
    while n < 5:
        n += 1
        con.send(n)  # send a value into the generator
        con2.send(n)
        time.sleep(1)
        print("\033[32;1m[producer]\033[0m is making baozi %s" % n)

if __name__ == '__main__':
    con = consumer("c1")
    con2 = consumer("c2")
    producer()
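`send()` also returns whatever the generator yields next, so data can flow in both directions. The running-average generator below is a common illustration of this, written here as a sketch; the name `averager` is not from the original article.

```python
def averager():
    """Generator-based coroutine: receives numbers, yields the running average."""
    total = 0.0
    count = 0
    average = None
    while True:
        value = yield average    # hand back the current average, wait for the next number
        total += value
        count += 1
        average = total / count


if __name__ == "__main__":
    avg = averager()
    next(avg)                    # prime the generator: run it to the first yield
    print(avg.send(10))          # 10.0
    print(avg.send(20))          # 15.0
    avg.close()
```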

2. Ready-made coroutine modules (greenlet, gevent)

greenlet (switching between coroutines by hand)

from greenlet import greenlet

def test1():
    print(12)
    gr2.switch()
    print(34)
    gr2.switch()

def test2():
    print(56)
    gr1.switch()
    print(78)

gr1 = greenlet(test1)  # create a coroutine; it does not run until it is switched to
gr2 = greenlet(test2)
gr1.switch()  # switch to gr1 by hand

gevent (a wrapper around greenlet that switches automatically)

import gevent

def foo():
    print('Running in foo')
    gevent.sleep(2)  # gevent's own sleep, simulating an I/O wait (control passes to another coroutine)
    print('Explicit context switch to foo again')

def bar():
    print('Explicit context to bar')
    gevent.sleep(1)
    print('Implicit context switch back to bar')

def func3():
    print("running func3 ")
    gevent.sleep(0)
    print("running func3 again ")

# create the coroutines and wait for all of them
gevent.joinall([
    gevent.spawn(foo),
    gevent.spawn(bar),
    gevent.spawn(func3),
])
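For readers without gevent installed, the same switching pattern can be sketched with the standard library's asyncio. Note that asyncio is a different library from gevent and is swapped in here purely for illustration; the names `task`, `main`, and `run` are made up. Each `await asyncio.sleep(...)` plays the role of `gevent.sleep(...)`: it hands control back to the event loop, which resumes whichever coroutine is ready.

```python
import asyncio


async def task(name, delay, events):
    events.append("%s start" % name)
    await asyncio.sleep(delay)        # yields control to the event loop
    events.append("%s resumed" % name)


async def main():
    events = []
    # gather() schedules the three coroutines in the order given
    await asyncio.gather(
        task("foo", 0.2, events),
        task("bar", 0.1, events),
        task("func3", 0, events),
    )
    return events


def run():
    return asyncio.run(main())


if __name__ == "__main__":
    for line in run():
        print(line)
```

All three coroutines start before any of them resumes, and they resume in order of their delays: func3, then bar, then foo.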

3. Coroutine examples (simple crawler, socket server)

Simple crawler example

By default gevent cannot tell that urllib, socket, and similar modules are doing I/O. They have to be patched with gevent.monkey.patch_all().

from gevent import monkey
monkey.patch_all()  # patch the blocking I/O calls of the standard library (socket and others) so gevent can switch on them; call it before the other imports

from urllib import request  # urllib: a simple module for fetching pages
import gevent, time

def f(url):
    print('GET: %s' % url)
    resp = request.urlopen(url)
    data = resp.read()
    print('%d bytes received from %s.' % (len(data), url))

urls = ['https://www.python.org/',
        'https://www.yahoo.com/',
        'https://github.com/']

time_start = time.time()
for url in urls:
    f(url)
print("synchronous cost", time.time() - time_start)

async_time_start = time.time()
gevent.joinall([gevent.spawn(f, url) for url in urls])
print("asynchronous cost", time.time() - async_time_start)

Socket server example

import gevent
from gevent import socket, monkey
monkey.patch_all()  # patch the blocking I/O calls of the standard library so gevent can switch on them

def server(port):
    s = socket.socket()
    s.bind(('0.0.0.0', port))
    s.listen(500)
    while True:
        cli, addr = s.accept()
        gevent.spawn(handle_request, cli)  # one coroutine per client connection

def handle_request(conn):
    try:
        while True:
            data = conn.recv(1024)
            if not data:  # an empty read means the client closed the connection
                conn.shutdown(socket.SHUT_WR)
                break
            print("recv:", data)
            conn.send(data)  # echo the data back
    except Exception as ex:
        print(ex)
    finally:
        conn.close()

if __name__ == '__main__':
    server(8001)

References:

http://www.cnblogs.com/alex3714/articles/5230609.html

http://www.cnblogs.com/alex3714/articles/5248247.html

posted on 2017-12-20 23:01 by 大小孩
