C++ Concurrency In Action Notes (1) - Fine-Grained Locks

References:

C++ Concurrency In Action, 2nd edition, Chapter 6

Test environment:

system: centos 8.1 / arch: x86_64 / kernel: 4.18.0 / g++: 8.5.0

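1. Overview

The discussion below uses a thread-safe queue (an "asynchronous queue" shared by producer and consumer threads) as its running example.
Note that, for ease of understanding, the code here is shorter than the original in the book: the smart pointers and the exception-safety handling have been left out.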

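2. Critical Sections and Locks

2.1 Coarse-Grained Locks

In a multithreaded program, several threads often read and write the same memory region, or execute the same piece of code on shared data, at the same time; such a region of code is called a critical section.
If threads access a critical section concurrently without synchronization and at least one of them writes, the result is a data race, and in C++ a data race is undefined behavior.
Locks exist to make threads take turns in a critical section: the thread that holds the lock has exclusive access, and any other thread that wants to enter must wait until the lock is released.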
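A minimal sketch of a lock-protected critical section, using only the standard library: several threads increment one shared counter, and a std::mutex ensures that only one thread at a time executes the increment. The names `counter`, `add_many` and `run_counter_demo` are illustrative and do not come from the book.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

long counter = 0;          // shared data: the critical section is ++counter
std::mutex counter_mutex;  // protects counter

// Every increment happens while holding the lock, so no two threads
// execute ++counter at the same time (no data race).
void add_many(int iterations) {
  for (int i = 0; i < iterations; ++i) {
    std::lock_guard<std::mutex> lock(counter_mutex);
    ++counter;
  }
}

// Hypothetical driver: runs n_threads workers concurrently and
// returns the final counter value.
long run_counter_demo(int n_threads, int iterations) {
  assert(n_threads > 0 && iterations >= 0);
  counter = 0;
  std::vector<std::thread> workers;
  for (int t = 0; t < n_threads; ++t) {
    workers.emplace_back(add_many, iterations);
  }
  for (auto& w : workers) {
    w.join();
  }
  return counter;
}
```

Without the lock_guard, the concurrent `++counter` would be a data race and the program would have undefined behavior; with it, the result is always `n_threads * iterations`.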
Consider the following version of a thread-safe queue:

#include <mutex>
#include <queue>
#include <utility>

template <class T>
class MyQueue {
public:
  void push(T val) {
    // A single mutex guards the whole std::queue.
    std::lock_guard<std::mutex> queue_lock(queue_mutex);
    queue.push(std::move(val));
  }
  // Non-blocking pop: returns false if the queue is empty.
  bool pop(T& val) {
    std::lock_guard<std::mutex> queue_lock(queue_mutex);
    if (queue.empty()) {
      return false;
    }
    val = std::move(queue.front());
    queue.pop();
    return true;
  }
private:
  std::queue<T> queue;
  std::mutex queue_mutex;
};

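Here the whole queue uses one single mutex to serialize all access to the std::queue object: while one thread is pushing, no other thread can push or pop; they all have to wait until the push is finished. A lock like this can be called a coarse-grained lock.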
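A usage sketch of the coarse-grained queue, assuming consumers simply poll the non-blocking pop() and yield when it returns false: two producers and two consumers share one queue, and every push and pop serializes on queue_mutex. `run_coarse_demo` is a hypothetical driver, not part of the book's code.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Coarse-grained queue from above: one mutex guards the whole std::queue.
template <class T>
class MyQueue {
public:
  void push(T val) {
    std::lock_guard<std::mutex> queue_lock(queue_mutex);
    queue.push(std::move(val));
  }
  bool pop(T& val) {
    std::lock_guard<std::mutex> queue_lock(queue_mutex);
    if (queue.empty()) return false;
    val = std::move(queue.front());
    queue.pop();
    return true;
  }
private:
  std::queue<T> queue;
  std::mutex queue_mutex;
};

// Each producer pushes 1..n; consumers poll until every item is consumed.
// Returns the sum of all consumed values.
long long run_coarse_demo(int producers, int consumers, int n) {
  assert(producers > 0 && consumers > 0 && n > 0);
  MyQueue<int> q;
  const long long total_items = static_cast<long long>(producers) * n;
  std::atomic<long long> popped{0};
  std::atomic<long long> sum{0};
  std::vector<std::thread> threads;
  for (int p = 0; p < producers; ++p) {
    threads.emplace_back([&q, n] {
      for (int i = 1; i <= n; ++i) q.push(i);
    });
  }
  for (int c = 0; c < consumers; ++c) {
    threads.emplace_back([&] {
      int v = 0;
      while (popped.load() < total_items) {
        if (q.pop(v)) {
          sum += v;
          ++popped;
        } else {
          std::this_thread::yield();  // queue momentarily empty: retry
        }
      }
    });
  }
  for (auto& t : threads) t.join();
  return sum.load();
}
```

Correctness is easy to see here because there is only one lock, but it is also the bottleneck: a producer and a consumer never run inside the queue at the same time.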

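2.2 Optimizing the Coarse-Grained Lock

With a coarse-grained lock, one thread calling push and another calling pop cannot run concurrently. If push and pop could proceed in parallel, concurrency would improve, since the pushing and popping threads would no longer wait for each other.
In the code above, an important reason for the single lock is that std::queue is the underlying container: it is one shared object, so every concurrent access to it has to go through the same lock.
To improve the queue, we first need to look at how a queue is structured internally. (std::queue itself is a container adapter that, by default, wraps std::deque.) A simple and effective implementation is a linked list, where push and pop usually touch only the tail and head pointers respectively (insert at the tail, remove from the head):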

  head                        tail
   |                           |
   V                           V
+------+      +------+      +------+
+ node + ---> + node + ---> + node +
+------+      +------+      +------+

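With separate head and tail pointers, it seems that push and pop no longer need to access the same data object, the std::queue (this is not quite accurate; see the analysis below).
Going further, we can give head and tail their own independent locks so that push and pop can run concurrently. Such split locks can be called fine-grained locks.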
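To make the std::queue point concrete: std::queue is only an adapter, and its container can be swapped for std::list, which is a linked list. The sketch below checks the default container type and drives a list-backed queue; `ListQueue` and `fifo_roundtrip` are hypothetical names. Swapping the container does not by itself allow separate head and tail locks, though: push_back and pop_front are non-const calls on the same std::list object, which the standard library does not make safe to run concurrently (in practice both also update shared bookkeeping such as the constant-time size()). That is why a custom list is needed.

```cpp
#include <cassert>
#include <deque>
#include <list>
#include <queue>
#include <type_traits>

// std::queue is a container adapter; its default container is std::deque.
static_assert(std::is_same<std::queue<int>::container_type,
                           std::deque<int>>::value,
              "default container of std::queue is std::deque");

// The same adapter over a doubly linked list: push adds at the back,
// pop removes from the front, matching the head/tail picture above.
using ListQueue = std::queue<int, std::list<int>>;

// Pushes 1..n, pops everything back, and returns how many values came out
// (each one is checked to arrive in FIFO order).
int fifo_roundtrip(int n) {
  ListQueue q;
  for (int i = 1; i <= n; ++i) q.push(i);
  int count = 0;
  for (int expected = 1; !q.empty(); ++expected) {
    assert(q.front() == expected);
    q.pop();
    ++count;
  }
  return count;
}
```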

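3. Fine-Grained Locks

3.1 The Dummy Node

Consider the following single-threaded queue implemented as a linked list (insert at the tail, remove from the head):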

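#include <utility>

// Single-threaded linked-list queue: push at the tail, pop from the head.
template <class T>
class MyQueue {
public:
  MyQueue(): head(nullptr), tail(nullptr) {}
  ~MyQueue() {
    while (head) {
      Node* next = head->next;
      delete head;
      head = next;
    }
  }
  MyQueue(const MyQueue&) = delete;
  MyQueue& operator=(const MyQueue&) = delete;
  bool pop(T& val) {
    if (!head) {
      return false;  // empty list
    }
    val = std::move(head->val);
    if (head == tail) {
      tail = nullptr;  // removing the last node: tail must be reset too
    }
    Node* old_head = head;
    head = head->next;
    delete old_head;
    return true;
  }
  void push(T val) {
    Node* node = new Node(std::move(val));
    if (tail) {
      tail->next = node;
    } else {
      head = node;  // empty list: push has to update head as well
    }
    tail = node;
  }
private:
  struct Node {
    Node(T _val): val(std::move(_val)), next(nullptr) {}
    T val;
    Node* next;
  };
  Node* head;
  Node* tail;
};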
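A quick single-threaded check of the list-based queue, using the class above (copied, with a destructor, so the block compiles on its own); `drain_and_reuse` is a hypothetical helper. Reusing the queue after draining it exercises the empty-list special cases in push and pop: pop resets tail when the last node goes, and push sets head when the list is empty.

```cpp
#include <cassert>
#include <utility>
#include <vector>

template <class T>
class MyQueue {
public:
  MyQueue(): head(nullptr), tail(nullptr) {}
  ~MyQueue() {
    while (head) {
      Node* next = head->next;
      delete head;
      head = next;
    }
  }
  MyQueue(const MyQueue&) = delete;
  MyQueue& operator=(const MyQueue&) = delete;
  bool pop(T& val) {
    if (!head) return false;
    val = std::move(head->val);
    if (head == tail) tail = nullptr;  // last node: touch tail too
    Node* old_head = head;
    head = head->next;
    delete old_head;
    return true;
  }
  void push(T val) {
    Node* node = new Node(std::move(val));
    if (tail) tail->next = node;
    else head = node;                  // empty list: touch head too
    tail = node;
  }
private:
  struct Node {
    Node(T _val): val(std::move(_val)), next(nullptr) {}
    T val;
    Node* next;
  };
  Node* head;
  Node* tail;
};

// Pushes 1..3, drains the queue, then reuses it once more.
std::vector<int> drain_and_reuse() {
  MyQueue<int> q;
  for (int i = 1; i <= 3; ++i) q.push(i);
  std::vector<int> out;
  int v = 0;
  while (q.pop(v)) out.push_back(v);  // FIFO order: 1, 2, 3
  q.push(4);                          // head and tail were both reset to nullptr
  if (q.pop(v)) out.push_back(v);
  assert(!q.pop(v));                  // empty again
  return out;
}
```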

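As shown above, both push and pop have to access both the head and the tail pointer (to handle the empty list). In a multithreaded version, push and pop would therefore each have to lock both the head mutex and the tail mutex, which, just like the coarse-grained lock, lets only one operation run at a time.
To avoid this, we add a dummy node: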

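#include <mutex>
#include <utility>

template <class T>
class MyQueue {
private:
  struct Node {
    Node(): next(nullptr) {}
    T val;
    Node* next;
  };
public:
  // head and tail start out pointing at the same dummy node.
  MyQueue(): head(new Node()), tail(head) {}
  ~MyQueue() {
    while (head) {
      Node* next = head->next;
      delete head;
      head = next;
    }
  }
  MyQueue(const MyQueue&) = delete;
  MyQueue& operator=(const MyQueue&) = delete;
  bool pop(T& val) {
    std::lock_guard<std::mutex> head_lock(head_mutex);
    if (head == get_tail()) {
      return false;  // only the dummy node is left: the queue is empty
    }
    val = std::move(head->val);
    Node* old_head = head;
    head = head->next;
    delete old_head;
    return true;
  }
  void push(T val) {
    Node* dummy = new Node();  // allocate the new dummy outside the lock
    std::lock_guard<std::mutex> tail_lock(tail_mutex);
    tail->val = std::move(val);  // the old dummy becomes a real data node
    tail->next = dummy;
    tail = dummy;
  }
private:
  // Holds tail_mutex only long enough to read tail.
  Node* get_tail() {
    std::lock_guard<std::mutex> tail_lock(tail_mutex);
    return tail;
  }
  Node* head;  // declared before tail: tail(head) relies on this order
  Node* tail;
  std::mutex head_mutex;
  std::mutex tail_mutex;
};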

The structure of the queue with a dummy node is shown below.

When MyQueue is constructed, head and tail both point to the same dummy node:
     head     tail
      |        |
      V        V
   +-------------+
   +    dummy    +
   +-------------+

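tail always points to the dummy node, while head points to the oldest element, i.e. the next one to be popped (the state after two elements have been pushed):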
    head                         tail
     |                            |
     V                            V
  +------+      +------+      +-------+
  + node + ---> + node + ---> + dummy +
  +------+      +------+      +-------+

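With the dummy node added:

push no longer accesses the head pointer (an empty list needs no special handling), so it does not need the head mutex; the tail mutex is enough.
pop is optimized as well: it holds the tail mutex only briefly, inside get_tail().
So push and pop compete only for the tail mutex, and because pop holds it just long enough to read tail, push and pop can in effect run concurrently.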
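A sketch of the dummy-node queue under concurrent use, assuming one producer thread and one consumer that polls the non-blocking pop(). The class is the one above (copied so the block stands alone); `run_spsc_demo` is a hypothetical driver. With one producer and one consumer, FIFO order must be preserved, which the consumer checks as it goes.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <utility>

template <class T>
class MyQueue {
  struct Node {
    Node(): next(nullptr) {}
    T val;
    Node* next;
  };
public:
  MyQueue(): head(new Node()), tail(head) {}
  ~MyQueue() {
    while (head) {
      Node* next = head->next;
      delete head;
      head = next;
    }
  }
  MyQueue(const MyQueue&) = delete;
  MyQueue& operator=(const MyQueue&) = delete;
  bool pop(T& val) {
    std::lock_guard<std::mutex> head_lock(head_mutex);
    if (head == get_tail()) return false;  // only the dummy is left
    val = std::move(head->val);
    Node* old_head = head;
    head = head->next;
    delete old_head;
    return true;
  }
  void push(T val) {
    Node* dummy = new Node();
    std::lock_guard<std::mutex> tail_lock(tail_mutex);  // push never takes head_mutex
    tail->val = std::move(val);
    tail->next = dummy;
    tail = dummy;
  }
private:
  Node* get_tail() {
    std::lock_guard<std::mutex> tail_lock(tail_mutex);
    return tail;
  }
  Node* head;
  Node* tail;
  std::mutex head_mutex;
  std::mutex tail_mutex;
};

// A producer thread pushes 0..n-1 while this thread pops concurrently.
// Returns how many items were received.
int run_spsc_demo(int n) {
  MyQueue<int> q;
  std::thread producer([&q, n] {
    for (int i = 0; i < n; ++i) q.push(i);
  });
  int received = 0;
  int v = 0;
  while (received < n) {
    if (q.pop(v)) {
      assert(v == received);  // single producer and consumer: FIFO order holds
      ++received;
    } else {
      std::this_thread::yield();  // queue momentarily empty
    }
  }
  producer.join();
  return received;
}
```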

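3.2 Lock Ordering in pop

Note that in pop it is essential to lock head_mutex first and only then tail_mutex (inside get_tail()). Suppose pop were implemented like this instead: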

  // WRONG: tail is read before head_mutex is held, so old_tail can go stale.
  bool pop(T& val) {
    const Node* old_tail = get_tail();
    std::lock_guard<std::mutex> head_lock(head_mutex);
    if (head == old_tail) {
      return false;
    }
    val = std::move(head->val);
    Node* old_head = head;
    head = head->next;
    delete old_head;
    return true;
  }

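Then the following interleaving is possible (for example, starting from an empty queue):

Threads A and B call pop at the same time as thread C calls push; all three first compete for tail_mutex.
A gets tail_mutex first and reads the tail pointer (the current dummy, so to A the queue looks empty), and is then descheduled.
C gets tail_mutex next and completes its push: the old dummy becomes a data node and tail moves to a new dummy.
B then gets tail_mutex and reads the tail pointer as updated by C.
B gets head_mutex before A, pops the element, and advances head to the new dummy; the node it deletes is exactly the one A still holds as old_tail.
A is finally rescheduled and gets head_mutex. head (the new dummy) differs from A's stale old_tail, so the if condition is false even though the queue is empty; A then moves out of the dummy's val, sets head to the dummy's next (nullptr) and deletes the dummy, which is undefined behavior.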
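The interleaving above can be replayed deterministically in a single thread. This is a hypothetical, simplified model: the `Node` struct and `replay_stale_tail` are illustrative, the three threads' steps are written out inline, and nodes live in a local array instead of being deleted, so comparing against A's stale pointer stays well-defined here (unlike in the real race).

```cpp
#include <cassert>

// Simplified node of the dummy-node queue (illustrative only).
struct Node {
  int val = 0;
  Node* next = nullptr;
};

// Replays, starting from an empty queue: A reads tail, C pushes, B pops,
// A checks. Returns true if A's stale check says "not empty" while the
// queue is in fact empty.
bool replay_stale_tail() {
  Node nodes[2];           // [0]: initial dummy, [1]: dummy allocated by C's push
  Node* head = &nodes[0];
  Node* tail = &nodes[0];  // empty queue: head == tail

  // A: snapshots tail BEFORE locking head_mutex (the buggy order).
  Node* a_old_tail = tail;

  // C: push(42). The old dummy receives the value; nodes[1] becomes the dummy.
  tail->val = 42;
  tail->next = &nodes[1];
  tail = &nodes[1];

  // B: reads the fresh tail, locks head_mutex and pops the only element.
  assert(head != tail);
  int b_val = head->val;
  head = head->next;       // the real pop would also delete nodes[0] here
  assert(b_val == 42);

  // A: finally locks head_mutex and compares head with its stale snapshot.
  bool really_empty = (head == tail);
  bool a_sees_element = (head != a_old_tail);
  // The real pop would now move from the dummy's val and set head to nullptr.
  return really_empty && a_sees_element;
}
```

Taking head_mutex before calling get_tail() closes this window: while A holds head_mutex no other pop can advance head, so the tail A reads cannot fall behind head.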

3.3 Implementing wait_pop

The version with a blocking wait is as follows:

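#include <condition_variable>
#include <mutex>
#include <utility>

template <class T>
class MyQueue {
private:
  struct Node {
    Node(): next(nullptr) {}
    T val;
    Node* next;
  };
public:
  MyQueue(): head(new Node()), tail(head) {}
  ~MyQueue() {
    while (head) {
      Node* next = head->next;
      delete head;
      head = next;
    }
  }
  MyQueue(const MyQueue&) = delete;
  MyQueue& operator=(const MyQueue&) = delete;
  bool pop(T& val) {
    std::lock_guard<std::mutex> head_lock(head_mutex);
    if (head == get_tail()) {
      return false;
    }
    val = std::move(head->val);
    Node* old_head = head;
    head = head->next;
    delete old_head;
    return true;
  }
  // Blocks until an element is available, then pops it.
  bool wait_pop(T& val) {
    std::unique_lock<std::mutex> head_lock(head_mutex);
    // The predicate runs with head_mutex held; get_tail() briefly adds tail_mutex.
    cv.wait(head_lock, [&]()->bool {
        return head != get_tail();
    });
    val = std::move(head->val);
    Node* old_head = head;
    head = head->next;
    delete old_head;
    return true;
  }
  void push(T val) {
    Node* dummy = new Node();
    {
      std::lock_guard<std::mutex> tail_lock(tail_mutex);
      tail->val = std::move(val);
      tail->next = dummy;
      tail = dummy;
    }
    // Waiters test the predicate under head_mutex, not tail_mutex. Briefly
    // locking head_mutex (after releasing tail_mutex, so push never holds
    // both) guarantees that a waiter which just saw an empty queue is already
    // blocked in cv.wait when we notify; otherwise the wakeup could be lost.
    { std::lock_guard<std::mutex> head_lock(head_mutex); }
    cv.notify_one();
  }
private:
  Node* get_tail() {
    std::lock_guard<std::mutex> tail_lock(tail_mutex);
    return tail;
  }
  Node* head;
  Node* tail;
  std::mutex head_mutex;
  std::mutex tail_mutex;
  std::condition_variable cv;
};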
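A usage sketch of the blocking queue, trimmed to wait_pop and push and copied so it compiles on its own. It assumes a common shutdown convention that is not in the original: one sentinel value (-1) per consumer tells each consumer to stop. `run_wait_pop_demo` is a hypothetical driver. Because the sentinels are pushed after all real values, FIFO order guarantees that every real value is consumed before any consumer exits.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

template <class T>
class MyQueue {
  struct Node {
    Node(): next(nullptr) {}
    T val;
    Node* next;
  };
public:
  MyQueue(): head(new Node()), tail(head) {}
  ~MyQueue() {
    while (head) {
      Node* next = head->next;
      delete head;
      head = next;
    }
  }
  MyQueue(const MyQueue&) = delete;
  MyQueue& operator=(const MyQueue&) = delete;
  bool wait_pop(T& val) {
    std::unique_lock<std::mutex> head_lock(head_mutex);
    cv.wait(head_lock, [&] { return head != get_tail(); });
    val = std::move(head->val);
    Node* old_head = head;
    head = head->next;
    delete old_head;
    return true;
  }
  void push(T val) {
    Node* dummy = new Node();
    {
      std::lock_guard<std::mutex> tail_lock(tail_mutex);
      tail->val = std::move(val);
      tail->next = dummy;
      tail = dummy;
    }
    { std::lock_guard<std::mutex> head_lock(head_mutex); }  // avoid lost wakeups
    cv.notify_one();
  }
private:
  Node* get_tail() {
    std::lock_guard<std::mutex> tail_lock(tail_mutex);
    return tail;
  }
  Node* head;
  Node* tail;
  std::mutex head_mutex;
  std::mutex tail_mutex;
  std::condition_variable cv;
};

// `consumers` threads block in wait_pop(); this thread pushes 1..n and then
// one -1 sentinel per consumer. Returns the sum of all consumed values.
long long run_wait_pop_demo(int consumers, int n) {
  assert(consumers > 0 && n >= 0);
  MyQueue<int> q;
  std::atomic<long long> sum{0};
  std::vector<std::thread> threads;
  for (int c = 0; c < consumers; ++c) {
    threads.emplace_back([&q, &sum] {
      int v = 0;
      for (;;) {
        q.wait_pop(v);
        if (v < 0) break;  // sentinel: this consumer is done
        sum += v;
      }
    });
  }
  for (int i = 1; i <= n; ++i) q.push(i);
  for (int c = 0; c < consumers; ++c) q.push(-1);
  for (auto& t : threads) t.join();
  return sum.load();
}
```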

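4. Summary

Fine-grained locking requires going down into the internals of the data structure: separate the memory regions that different operations touch so that each can have its own lock, which shrinks the critical sections and increases concurrency.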

posted @ 2022-07-21 17:46 重返科韵路