Playing with Python Concurrency (2): Threads

I. Introduction

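In the previous post we covered processes, which suit CPU-bound tasks. This post covers threads, which suit I/O-bound tasks.
First, we need to be clear about what processes and threads are and how they differ. We covered this in the previous post, but the concept is important enough to reinforce here.
1. A process is the smallest unit of resource allocation, while a thread is the smallest unit of program execution (the smallest unit the operating system schedules onto a CPU). A program has at least one process, and a process has at least one thread.
2. Each process has its own independent address space. Every time a process starts, the system allocates an address space for it and builds tables to maintain its code, stack, and data segments, which is very expensive. Threads share their process's data and use the same address space, so switching between threads costs the CPU far less than switching between processes, and creating a thread is also much cheaper than creating a process.
3. Communication between threads is easier: threads in the same process share global variables, static variables, and other data, whereas processes must communicate through inter-process communication mechanisms such as queues (Queue), pipes, or shared memory. The hard part of writing multithreaded programs is handling synchronization and mutual exclusion correctly.
4. Multi-process programs are more robust, however. A fatal error in one thread (for example, a crash inside a C extension) brings down the whole process together with all of its threads, whereas a process dying does not affect other processes, because each has its own address space. (In Python, an uncaught exception in a non-main thread only ends that thread; it does not terminate the process.)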
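Points 2 and 3 can be seen directly in a minimal sketch, assuming a standard CPython build: several threads started by one process report the same os.getpid(), each has its own threading.get_ident(), and all of them append to the very same list because they share the process's memory. The helper names record_identity and run_threads are hypothetical; the threading.Barrier only keeps every thread alive at the same moment, so that no thread identifier can be recycled during the demo.

```python
import os
import threading


def record_identity(barrier: threading.Barrier, results: list, lock: threading.Lock) -> None:
    # Wait until every thread has started, so all of them are alive at the same time
    # (a finished thread's identifier may be reused by a thread created later)
    barrier.wait()
    with lock:
        # 'results' is the same list object in every thread: the address space is shared
        results.append((os.getpid(), threading.get_ident()))


def run_threads(num_threads: int = 4) -> list:
    results = []
    lock = threading.Lock()
    barrier = threading.Barrier(num_threads)
    threads = [
        threading.Thread(target=record_identity, args=(barrier, results, lock))
        for _ in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == '__main__':
    records = run_threads()
    print('process IDs seen by the threads:', {pid for pid, _ in records})
    print('distinct thread identifiers:', len({tid for _, tid in records}))
```

Every thread reports the main program's process ID, while the number of distinct thread identifiers equals the number of threads.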

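With the difference between processes and threads clear, we now need to talk about the notorious GIL.
GIL stands for Global Interpreter Lock. It is a mutex inside the Python interpreter (CPython) that the bytecode evaluation loop uses to control which thread is currently allowed to execute Python bytecode.
Every thread must acquire the GIL before it can execute, which guarantees that only one thread runs Python bytecode at any given moment, even on a multi-core CPU. Developers writing single-threaded programs never notice the GIL, but it can become a performance bottleneck in CPU-bound multithreaded code. Blocking I/O calls and time.sleep() release the GIL while they wait, so other threads can run in the meantime.
As a result, even on a multi-core CPU, Python multithreading only pays off for I/O-bound threads. CPU-bound threads cannot run in parallel, and GIL contention can make the multithreaded version no faster, or even slower, than running the same work serially.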
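The behavior above can be observed with a rough timing sketch, assuming a standard (GIL-enabled) CPython build. In this sketch, io_task stands in for I/O using time.sleep, which releases the GIL, and cpu_task is a pure-Python loop, which holds the GIL. Both names, the timing helpers, and the task sizes are illustrative assumptions. Exact numbers depend on the machine, so no output is shown.

```python
import sys
import threading
import time


def io_task(seconds: float) -> None:
    # Stand-in for blocking I/O: time.sleep releases the GIL while waiting
    time.sleep(seconds)


def cpu_task(n: int) -> int:
    # Pure-Python arithmetic loop: the thread holds the GIL while running it
    total = 0
    for i in range(n):
        total += i
    return total


def run_serial(func, args_list) -> float:
    """Run func once per argument tuple, one after another; return elapsed seconds."""
    start = time.perf_counter()
    for args in args_list:
        func(*args)
    return time.perf_counter() - start


def run_threaded(func, args_list) -> float:
    """Run func once per argument tuple, each in its own thread; return elapsed seconds."""
    start = time.perf_counter()
    threads = [threading.Thread(target=func, args=args) for args in args_list]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == '__main__':
    # Interval after which CPython asks the running thread to release the GIL (seconds)
    print('switch interval:', sys.getswitchinterval())
    io_jobs = [(0.5,)] * 4
    cpu_jobs = [(2_000_000,)] * 4
    print(f'I/O-bound: serial {run_serial(io_task, io_jobs):.2f}s, threaded {run_threaded(io_task, io_jobs):.2f}s')
    print(f'CPU-bound: serial {run_serial(cpu_task, cpu_jobs):.2f}s, threaded {run_threaded(cpu_task, cpu_jobs):.2f}s')
```

The I/O-bound threaded run should take roughly the time of a single sleep. The CPU-bound threaded run is expected to take about as long as the serial one, or longer.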

II. Threads (Thread)

1. First, let's create a task function.

import os
import random
import threading
import time


def expensive_function(n: int):
    # Print the current process ID, parent process ID, and thread ID
    print(f'[PID = {os.getpid()}] (Parent process is {os.getppid()}) [TID = {threading.get_ident()}] Executing with n = {n}...')
    sleep_time = random.randint(1, 5)
    time.sleep(sleep_time)
    print(f'[PID = {os.getpid()}] [TID = {threading.get_ident()}] n = {n} Done, slept {sleep_time}s')


if __name__ == '__main__':
    expensive_function(3)

Output:

[PID = 18272] (Parent process is 5060) [TID = 16436] Executing with n = 3...
[PID = 18272] [TID = 16436] n = 3 Done, slept 4s

As the output shows, the current process ID is 18272, the parent process ID is 5060, and the current thread ID (as returned by threading.get_ident()) is 16436.
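The TID printed above comes from threading.get_ident(). The Python documentation describes this as an identifier with no direct meaning that may be recycled after the thread exits. The operating system's thread ID is available separately through threading.get_native_id() (Python 3.8+). The following small sketch prints both, plus the thread name, from the main thread and from one extra thread. describe_current_thread is a hypothetical helper.

```python
import os
import threading


def describe_current_thread() -> dict:
    """Return the identifiers of the calling thread (hypothetical helper)."""
    return {
        'pid': os.getpid(),                       # same for every thread in the process
        'ident': threading.get_ident(),           # Python-level identifier; may be reused after the thread exits
        'native_id': threading.get_native_id(),   # ID assigned by the OS kernel (Python 3.8+)
        'name': threading.current_thread().name,  # 'MainThread' in the main thread
    }


if __name__ == '__main__':
    print(describe_current_thread())
    worker = threading.Thread(target=lambda: print(describe_current_thread()), name='worker-1')
    worker.start()
    worker.join()
```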

2. Run the task function serially, one call after another.

import os
import random
import threading
import time


def expensive_function(n: int):
    # Print the current process ID, parent process ID, and thread ID
    print(f'[PID = {os.getpid()}] (Parent process is {os.getppid()}) [TID = {threading.get_ident()}] Executing with n = {n}...')
    sleep_time = random.randint(1, 5)
    time.sleep(sleep_time)
    print(f'[PID = {os.getpid()}] [TID = {threading.get_ident()}] n = {n} Done, slept {sleep_time}s')


def serial_execute():
    for i in range(5):
        expensive_function(i)


if __name__ == '__main__':
    start = time.time()
    serial_execute()
    end = time.time()
    print(f'Total time: {end - start}')

Output:

[PID = 8992] (Parent process is 5060) [TID = 14120] Executing with n = 0...
[PID = 8992] [TID = 14120] n = 0 Done, slept 3s
[PID = 8992] (Parent process is 5060) [TID = 14120] Executing with n = 1...
[PID = 8992] [TID = 14120] n = 1 Done, slept 4s
[PID = 8992] (Parent process is 5060) [TID = 14120] Executing with n = 2...
[PID = 8992] [TID = 14120] n = 2 Done, slept 3s
[PID = 8992] (Parent process is 5060) [TID = 14120] Executing with n = 3...
[PID = 8992] [TID = 14120] n = 3 Done, slept 5s
[PID = 8992] (Parent process is 5060) [TID = 14120] Executing with n = 4...
[PID = 8992] [TID = 14120] n = 4 Done, slept 4s
Total time: 19.039063692092896

Because the calls run serially, the program has only one thread and the thread ID never changes. Each loop iteration starts only after the previous call has finished, so the total time is almost exactly the sum of the five sleeps (3 + 4 + 3 + 5 + 4 = 19 s).

3. Create a thread pool.

import os
import random
import threading
import time
from concurrent import futures


def expensive_function(n: int):
    # Print the current process ID, parent process ID, and thread ID
    print(f'[PID = {os.getpid()}] (Parent process is {os.getppid()}) [TID = {threading.get_ident()}] Executing with n = {n}...')
    sleep_time = random.randint(1, 5)
    time.sleep(sleep_time)
    print(f'[PID = {os.getpid()}] [TID = {threading.get_ident()}] n = {n} Done, slept {sleep_time}s')


def execute_with_pool():
    tasks = [i for i in range(10)]
    print('Starting thread pool')
    pool = futures.ThreadPoolExecutor(max_workers=5)
    # map() submits every task right away; an exception raised in a task only surfaces when the returned iterator is consumed
    pool.map(expensive_function, tasks)
    pool.shutdown()  # waits for all submitted tasks to finish (wait=True by default)
    # The following also works
    # with futures.ThreadPoolExecutor(max_workers=5) as pool:
    #     pool.map(expensive_function, tasks)


if __name__ == '__main__':
    start = time.time()
    execute_with_pool()
    end = time.time()
    print(f'Total time: {end - start}')

Output:

Starting thread pool
[PID = 11056] (Parent process is 5060) [TID = 6636] Executing with n = 0...
[PID = 11056] (Parent process is 5060) [TID = 2700] Executing with n = 1...
[PID = 11056] (Parent process is 5060) [TID = 3424] Executing with n = 2...
[PID = 11056] (Parent process is 5060) [TID = 12340] Executing with n = 3...
[PID = 11056] (Parent process is 5060) [TID = 1588] Executing with n = 4...
[PID = 11056] [TID = 3424] n = 2 Done, slept 1s
[PID = 11056] (Parent process is 5060) [TID = 3424] Executing with n = 5...
[PID = 11056] [TID = 6636] n = 0 Done, slept 1s
[PID = 11056] (Parent process is 5060) [TID = 6636] Executing with n = 6...
[PID = 11056] [TID = 12340] n = 3 Done, slept 2s
[PID = 11056] (Parent process is 5060) [TID = 12340] Executing with n = 7...
[PID = 11056] [TID = 2700] n = 1 Done, slept 3s
[PID = 11056] (Parent process is 5060) [TID = 2700] Executing with n = 8...
[PID = 11056] [TID = 12340] n = 7 Done, slept 1s
[PID = 11056] (Parent process is 5060) [TID = 12340] Executing with n = 9...
[PID = 11056] [TID = 2700] n = 8 Done, slept 1s
[PID = 11056] [TID = 6636] n = 6 Done, slept 3s
[PID = 11056] [TID = 12340] n = 9 Done, slept 1s
[PID = 11056] [TID = 1588] n = 4 Done, slept 5s
[PID = 11056] [TID = 3424] n = 5 Done, slept 5s
Total time: 6.022863864898682

As the output shows, the program immediately starts 5 different threads, all in the same process. As soon as the thread with TID = 3424 finishes n = 2, it picks up the task n = 5, and so on. The total of about 6 s is well below the 23 s the ten sleeps would take serially. It is determined by how the tasks are spread over the 5 workers: n = 5 starts at about t = 1 s and sleeps 5 s.
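To check the roughly 6 s total, the following sketch replays the sleep times from the output above through 5 workers. It assumes that tasks are handed out in submission order to whichever worker becomes free first, which is how the executor's work queue is consumed by idle threads. Thread start-up and scheduling overhead are ignored, and simulate_pool is a hypothetical helper, not part of concurrent.futures.

```python
import heapq


def simulate_pool(durations, max_workers):
    """Return the total time to run the tasks on max_workers workers (hypothetical helper).

    Tasks are handed out in submission order to whichever worker becomes free first;
    thread start-up and scheduling overhead are ignored.
    """
    free_at = [0] * max_workers     # time at which each worker is next available
    heapq.heapify(free_at)
    for d in durations:
        start = heapq.heappop(free_at)      # earliest-available worker takes the next task
        heapq.heappush(free_at, start + d)  # ...and stays busy until start + d
    return max(free_at)


if __name__ == '__main__':
    # Sleep times for n = 0..9, read off the output above
    sleeps = [1, 3, 1, 2, 5, 5, 3, 1, 1, 1]
    print(simulate_pool(sleeps, max_workers=5))  # 6
    print(sum(sleeps))                           # 23, the serial total
```

With max_workers=10 the same function returns 5, the longest single sleep. This matches the one-thread-per-task version that follows.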
Here we used the public futures.ThreadPoolExecutor API to create the thread pool directly. Can we manage the threads ourselves instead?

import os
import random
import threading
import time


def expensive_function(n: int):
    # Print the current process ID, parent process ID, and thread ID
    print(f'[PID = {os.getpid()}] (Parent process is {os.getppid()}) [TID = {threading.get_ident()}] Executing with n = {n}...')
    sleep_time = random.randint(1, 5)
    time.sleep(sleep_time)
    print(f'[PID = {os.getpid()}] [TID = {threading.get_ident()}] n = {n} Done, slept {sleep_time}s')


def execute_with_raw_pool():
    # Strictly speaking not a pool: one new thread is created per task and none is reused
    tasks, raw_threads_pool = [i for i in range(10)], []
    print('Starting threads')
    for i in tasks:
        t = threading.Thread(target=expensive_function, args=(i, ))
        t.start()
        raw_threads_pool.append(t)
    for t in raw_threads_pool:
        t.join()  # wait for every thread to finish

    # This also works
    # tasks = [i for i in range(10)]
    # print('Starting threads')
    # raw_threads_pool = [threading.Thread(target=expensive_function, args=(tasks[i], )) for i in range(10)]
    # for t in raw_threads_pool:
    #     t.start()
    # for t in raw_threads_pool:
    #     t.join()


if __name__ == '__main__':
    start = time.time()
    execute_with_raw_pool()
    end = time.time()
    print(f'Total time: {end - start}')

Output:

Starting threads
[PID = 31400] (Parent process is 6840) [TID = 24664] Executing with n = 0...
[PID = 31400] (Parent process is 6840) [TID = 28000] Executing with n = 1...
[PID = 31400] (Parent process is 6840) [TID = 34808] Executing with n = 2...
[PID = 31400] (Parent process is 6840) [TID = 32532] Executing with n = 3...
[PID = 31400] (Parent process is 6840) [TID = 31960] Executing with n = 4...
[PID = 31400] (Parent process is 6840) [TID = 9920] Executing with n = 5...
[PID = 31400] (Parent process is 6840) [TID = 31420] Executing with n = 6...
[PID = 31400] (Parent process is 6840) [TID = 29456] Executing with n = 7...
[PID = 31400] (Parent process is 6840) [TID = 34084] Executing with n = 8...
[PID = 31400] (Parent process is 6840) [TID = 3068] Executing with n = 9...
[PID = 31400] [TID = 34808] n = 2 Done, slept 1s
[PID = 31400] [TID = 31960] n = 4 Done, slept 1s
[PID = 31400] [TID = 31420] n = 6 Done, slept 1s
[PID = 31400] [TID = 28000] n = 1 Done, slept 2s
[PID = 31400] [TID = 34084] n = 8 Done, slept 2s
[PID = 31400] [TID = 32532] n = 3 Done, slept 3s
[PID = 31400] [TID = 3068] n = 9 Done, slept 3s
[PID = 31400] [TID = 24664] n = 0 Done, slept 4s
[PID = 31400] [TID = 9920] n = 5 Done, slept 5s
[PID = 31400] [TID = 29456] n = 7 Done, slept 5s
Total time: 5.054566144943237

As the output shows, the program immediately starts 10 different threads (all in the same process), one per task. Because they all sleep concurrently, the total time is roughly the longest single sleep (5 s); the extra ~0.05 s is the overhead of the rest of the code. Strictly speaking, this is not a thread pool: no thread is reused, and the number of threads grows with the number of tasks.
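A real pool of our own can be built from a fixed number of worker threads that pull tasks from a queue.Queue until they receive a stop marker. The following is a minimal sketch under these assumptions: tasks are hashable and unique (they are used as result keys), and the names run_with_simple_pool and _pool_worker are hypothetical. There is also no error handling, so a task that raises will end its worker.

```python
import queue
import threading

_STOP = object()  # hypothetical sentinel telling a worker to exit


def _pool_worker(tasks: queue.Queue, func, results: dict, lock: threading.Lock) -> None:
    while True:
        item = tasks.get()          # blocks until a task (or the stop marker) arrives
        if item is _STOP:
            break
        value = func(item)
        with lock:                  # protect the shared results dict
            results[item] = value


def run_with_simple_pool(func, items, max_workers: int = 5) -> dict:
    tasks = queue.Queue()
    results = {}
    lock = threading.Lock()
    # A fixed number of threads is created once and reused for every task
    workers = [
        threading.Thread(target=_pool_worker, args=(tasks, func, results, lock))
        for _ in range(max_workers)
    ]
    for w in workers:
        w.start()
    for item in items:
        tasks.put(item)
    for _ in workers:
        tasks.put(_STOP)            # stop markers go after all real tasks (FIFO order)
    for w in workers:
        w.join()
    return results
```

Calling run_with_simple_pool(expensive_function, range(10), max_workers=5) should produce output similar to the ThreadPoolExecutor version, with only five distinct TIDs being reused.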

4. Shared memory between threads.

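Each process has its own independent address space, which the system allocates whenever a process starts. Threads share their process's data and use the same address space, so communication and memory sharing between threads are much easier than between processes.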

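import threading
import time


# Shared memory
def add_and_read_key(d, key):
    # Sharing memory between threads is much easier than between processes, but race conditions still cannot be ruled out
    print(f'add key {key}')
    d[key] = key
    print(f'dict is {d}')  # The printed dict is not fully deterministic: it depends on how the threads are scheduled (race condition)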


def share_with_thread():
    a = {'name': 'threads'}
    t1 = threading.Thread(target=add_and_read_key, args=(a, 't1'))
    t2 = threading.Thread(target=add_and_read_key, args=(a, 't2'))

    t1.start()
    t2.start()

    t1.join()
    t2.join()


if __name__ == '__main__':
    start = time.time()
    share_with_thread()
    end = time.time()
    print(f'Total time: {end - start}')

Output:

add key t1
dict is {'name': 'threads', 't1': 't1'}
add key t2
dict is {'name': 'threads', 't1': 't1', 't2': 't2'}
Total time: 0.0009992122650146484
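The dictionary example above happens to print sensible results, but a read-modify-write sequence shows the race more clearly. In this sketch, each thread reads a shared counter, yields with time.sleep(0), and then writes back the value it read plus one. unsafe_increment and run_unsafe are hypothetical names. Another thread can update the counter in between, so increments can get lost. The final value is usually below num_threads * times, but the exact number varies from run to run and a lost update is not guaranteed.

```python
import threading
import time


def unsafe_increment(counter: dict, times: int) -> None:
    for _ in range(times):
        current = counter['value']      # 1. read the shared value
        time.sleep(0)                   # 2. give other threads a chance to run in between
        counter['value'] = current + 1  # 3. write back, possibly overwriting another thread's update


def run_unsafe(num_threads: int = 4, times: int = 1000) -> int:
    counter = {'value': 0}
    threads = [
        threading.Thread(target=unsafe_increment, args=(counter, times))
        for _ in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter['value']


if __name__ == '__main__':
    print(f'expected {4 * 1000}, got {run_unsafe()}')
```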

Because of race conditions, a program may produce unexpected results once the code becomes more complex. To prevent this, let's look at synchronization between threads.

5. Synchronization between threads.

Synchronization between threads is usually implemented with a lock (threading.Lock).

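import threading
import time


# Synchronization
# Lock
def lock_add_and_read_key(lock: threading.Lock, d, key):
    with lock:  # Acquire the lock: no other thread can enter a block guarded by this same lock at the same time, which avoids the race condition; if the lock is already held, other threads block here until it is released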
        print(f'add key {key}')
        d[key] = key
        print(f'dict is {d}')


def access_share_with_thread_lock():
    a = {'name': 'threads'}
    lock = threading.Lock()
    t1 = threading.Thread(target=lock_add_and_read_key, args=(lock, a, 't1'))
    t2 = threading.Thread(target=lock_add_and_read_key, args=(lock, a, 't2'))

    t1.start()
    t2.start()

    t1.join()
    t2.join()


if __name__ == '__main__':
    start = time.time()
    access_share_with_thread_lock()
    end = time.time()
    print(f'Total time: {end - start}')

Output:

add key t1
dict is {'name': 'threads', 't1': 't1'}
add key t2
dict is {'name': 'threads', 't1': 't1', 't2': 't2'}
Total time: 0.00099945068359375
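Applying the same lock pattern to a read-modify-write counter makes the result deterministic. The sequence is the same as in an unsynchronized counter: read, yield with time.sleep(0), then write back. The difference is that it runs inside with lock, so no other thread can interleave. safe_increment and run_safe are hypothetical names used only for this sketch.

```python
import threading
import time


def safe_increment(counter: dict, lock: threading.Lock, times: int) -> None:
    for _ in range(times):
        with lock:  # the read, the yield, and the write form one critical section
            current = counter['value']
            time.sleep(0)  # other threads may run, but cannot enter this block
            counter['value'] = current + 1


def run_safe(num_threads: int = 4, times: int = 1000) -> int:
    counter = {'value': 0}
    lock = threading.Lock()
    threads = [
        threading.Thread(target=safe_increment, args=(counter, lock, times))
        for _ in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter['value']


if __name__ == '__main__':
    print(run_safe())  # 4000
```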

6. Communication between threads.

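As mentioned above, communication and memory sharing between threads are much easier than between processes. Threads can communicate directly through shared memory, but in real projects it is better to use a queue (queue.Queue), just as with processes. Here we also wrap the queue access in a lock to guard against race conditions. Note that queue.Queue is already thread-safe, so this lock is redundant and is kept only for illustration. The task itself must run outside the lock; otherwise the threads would execute one at a time.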

import os
import queue
import random
import threading
import time


def expensive_function(n: int):
    # Print the current process ID, parent process ID, and thread ID
    print(f'[PID = {os.getpid()}] (Parent process is {os.getppid()}) [TID = {threading.get_ident()}] Executing with n = {n}...')
    sleep_time = random.randint(1, 5)
    time.sleep(sleep_time)
    print(f'[PID = {os.getpid()}] [TID = {threading.get_ident()}] n = {n} Done, slept {sleep_time}s')


def thread_worker_func(q: queue.Queue, lock: threading.Lock):
    # queue.Queue is already thread-safe; the lock is redundant and only illustrates guarding shared access
    with lock:
        arg = q.get()
    # Call the task function outside the lock: holding the lock here would make the threads run one at a time
    expensive_function(arg)


def communicate_between_threads():
    q = queue.Queue()
    lock = threading.Lock()
    threads_pool = []
    for i in range(10):
        q.put(i)
        t = threading.Thread(target=thread_worker_func, args=(q, lock))
        threads_pool.append(t)
        print(f'Starting thread {i}')
        t.start()
    # Wait until all child threads have finished
    for t in threads_pool:
        t.join()


if __name__ == '__main__':
    start = time.time()
    communicate_between_threads()
    end = time.time()
    print(f'Total time: {end - start}')
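Because queue.Queue does its own locking, producer and consumers can also communicate without any explicit Lock. The following is a minimal producer/consumer sketch under these assumptions: the hypothetical names STOP, consumer and square_all are used, and squaring stands in for real work. Results travel back through a second queue, Queue.task_done()/Queue.join() let the producer wait until every item has been handled, and one stop marker per worker shuts the threads down.

```python
import queue
import threading

STOP = object()  # hypothetical stop marker; any unique object works


def consumer(tasks: queue.Queue, results: queue.Queue) -> None:
    while True:
        item = tasks.get()              # blocks until an item is available
        try:
            if item is STOP:
                return                  # stop marker: leave the loop
            results.put(item * item)    # send the result back through another queue
        finally:
            tasks.task_done()           # tell tasks.join() this item has been handled


def square_all(numbers, num_workers: int = 3) -> list:
    tasks, results = queue.Queue(), queue.Queue()
    workers = [threading.Thread(target=consumer, args=(tasks, results)) for _ in range(num_workers)]
    for w in workers:
        w.start()
    for n in numbers:
        tasks.put(n)                    # producer side: no lock needed, Queue is thread-safe
    tasks.join()                        # wait until every queued number has been processed
    for _ in workers:
        tasks.put(STOP)                 # one stop marker per worker
    for w in workers:
        w.join()
    # All workers have exited, so draining the results queue is safe now
    return sorted(results.get() for _ in range(results.qsize()))


if __name__ == '__main__':
    print(square_all(range(10)))  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```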