Parallel Computing Architecture and Programming | Assignment 2: Building A Task Execution Library from the Ground Up

PART_A

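Here I will focus on the implementation of TaskSystemParallelThreadPoolSleeping.

This part essentially asks us to implement a thread pool. On top of that I also implemented a task queue. Strictly speaking, this problem does not need a task queue to manage dynamic task assignment, and using one makes execution somewhat slower.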

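The blog post I referred to for a general-purpose task queue + thread pool implementation is "[C++] C++ 有什么好用的线程池？".

I actually think that thread pool implementation has a problem. When a condition variable is used for notification, if a thread has not yet entered wait and only enters wait after the notify, it may never receive the notification and wait forever. This issue is discussed in detail in the condition variable section below.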

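A blog post with an elegant implementation of TaskSystemParallelThreadPoolSleeping for Part_A

It uses three variables to elegantly implement dynamic task assignment, the waiting of the threads in the pool, and the waiting of the main thread in run: num_done_tasks, num_left_tasks, and num_total_tasks.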

Initialization

At the start of run, set num_done_tasks = 0, num_total_tasks = num_total_tasks, and num_left_tasks = num_total_tasks.

The num_total_tasks on the right-hand side is the argument passed to run.

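Waiting and waking with condition variables

After run initializes num_left_tasks = num_total_tasks, it notifies the threads in the pool, and then keeps waiting for as long as num_done_tasks != num_total_tasks.

While num_left_tasks <= 0, a thread in the pool waits. Otherwise it claims the task with taskID = num_total_tasks - num_left_tasks and decrements num_left_tasks while still holding the lock (otherwise two threads could pick the same ID), runs the task, and then increments num_done_tasks.

When num_done_tasks == num_total_tasks, the pool thread notifies run.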
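The three-counter scheme described above can be sketched roughly as follows. This is a minimal illustration, not the referenced blog's code: the class name CounterPool, the single shared mutex, and the std::function-based task signature are all assumptions made for the example (the assignment itself uses IRunnable).

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical minimal pool driven by the three counters (illustrative names).
class CounterPool {
public:
    explicit CounterPool(int num_threads) {
        for (int i = 0; i < num_threads; i++)
            workers_.emplace_back([this] { workerLoop(); });
    }
    ~CounterPool() {
        {
            std::lock_guard<std::mutex> lock(mtx_);
            stop_ = true;
        }
        work_cv_.notify_all();
        for (auto& w : workers_) w.join();
    }
    // Runs task(0) .. task(num_total_tasks - 1) and blocks until all are done.
    void run(const std::function<void(int)>& task, int num_total_tasks) {
        std::unique_lock<std::mutex> lock(mtx_);
        task_ = &task;
        num_total_tasks_ = num_total_tasks;
        num_left_tasks_ = num_total_tasks;
        num_done_tasks_ = 0;
        work_cv_.notify_all();
        done_cv_.wait(lock, [this] { return num_done_tasks_ == num_total_tasks_; });
    }

private:
    void workerLoop() {
        std::unique_lock<std::mutex> lock(mtx_);
        while (true) {
            work_cv_.wait(lock, [this] { return stop_ || num_left_tasks_ > 0; });
            if (stop_) return;
            // Claim a task ID while holding the lock.
            int task_id = num_total_tasks_ - num_left_tasks_;
            num_left_tasks_--;
            const std::function<void(int)>* task = task_;
            lock.unlock();
            (*task)(task_id);  // run the task without holding the lock
            lock.lock();
            if (++num_done_tasks_ == num_total_tasks_) done_cv_.notify_all();
        }
    }

    std::mutex mtx_;
    std::condition_variable work_cv_;  // workers sleep here when no tasks are left
    std::condition_variable done_cv_;  // run() sleeps here until all tasks are done
    const std::function<void(int)>* task_ = nullptr;
    int num_total_tasks_ = 0;
    int num_left_tasks_ = 0;
    int num_done_tasks_ = 0;
    bool stop_ = false;
    std::vector<std::thread> workers_;
};

// Usage: each task writes its own slot, so the tasks need no extra locking.
int sumOfSquares(int n, int num_threads) {
    assert(n >= 0);
    std::vector<int> out(n, 0);
    CounterPool pool(num_threads);
    pool.run([&out](int i) { out[i] = i * i; }, n);
    int sum = 0;
    for (int v : out) sum += v;
    return sum;
}
```

Because every counter is read and written under the same mutex, and both waits use a predicate, a notify that arrives before a thread starts waiting is harmless: the thread sees the updated counter and does not block.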

My implementation

Task queue

#include <mutex>
#include <queue>
#include <utility>

template <typename T>
class TaskQueue {
    private:
        std::queue<T> taskQueue;
        std::mutex mutex;
    public:
        TaskQueue() {}
        ~TaskQueue() {}
        // Enqueue a copy of t
        void enqueue(T& t) {
            std::unique_lock<std::mutex> lock(mutex);
            taskQueue.push(t);
        }
        // Dequeue into t; returns false if the queue is empty
        bool dequeue(T &t) {
            std::unique_lock<std::mutex> lock(mutex);
            if (taskQueue.empty()) return false;
            t = std::move(taskQueue.front());
            taskQueue.pop();
            return true;
        }
        // Check whether the queue is empty
        bool empty() {
            std::unique_lock<std::mutex> lock(mutex);
            return taskQueue.empty();
        }
        // Number of queued tasks
        int size() {
            std::unique_lock<std::mutex> lock(mutex);
            return static_cast<int>(taskQueue.size());
        }
};

Thread pool

class TaskSystemParallelThreadPoolSleeping: public ITaskSystem {
    private:
        class ThreadPool {
            private:
                class ThreadWorker {
                    private:
                        int threadId;
                        ThreadPool* threadPool;
                    public:
                        ThreadWorker(int threadId, ThreadPool* threadPool): threadId(threadId), threadPool(threadPool) {}
                        ~ThreadWorker() {}
                        void operator()() {
                            bool dequeue;
                            TaskParameter taskParameter;
                            while (!threadPool->stop) {
                                std::unique_lock<std::mutex> lt_lock(threadPool->left_mutex);
                                threadPool->left_lock.wait(lt_lock, [this]() { 
                                    return threadPool->stop || !threadPool->taskQueue.empty(); });
                                dequeue = threadPool->taskQueue.dequeue(taskParameter);
                                lt_lock.unlock();
                                
                                if (dequeue) taskParameter.runnable->runTask(taskParameter.taskId, taskParameter.num_total_tasks);
                                else continue;

                                std::unique_lock<std::mutex> de_lock(threadPool->done_mutex);
                                threadPool->num_done_tasks++;
                                if (threadPool->num_done_tasks == threadPool->num_total_tasks) 
                                    threadPool->done_lock.notify_all();
                                de_lock.unlock();
                            }
                        }
                };
                std::atomic<bool> stop; // shutdown flag (needs <atomic>); atomic because workers also read it outside the lock
                TaskQueue<TaskParameter> taskQueue; // task queue
                std::vector<std::thread> threads; 
            public:
                int num_total_tasks;
                int num_done_tasks;
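                std::mutex done_mutex; // lets ITaskSystem's run function sleep instead of spinning while it waits for all tasks to finish
                std::condition_variable done_lock; // run waits on this while not all tasks are done; notified when the last task finishes
                std::mutex left_mutex; // lets the worker threads in the pool sleep instead of spinning
                std::condition_variable left_lock; // workers wait on this while the task queue is empty; notified when a task is submitted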

                ThreadPool(int num_threads): stop(false), threads(std::vector<std::thread>(num_threads)),
                                             num_total_tasks(0), num_done_tasks(0) {}
                ~ThreadPool() {}
                void initThreadPool() {
                    int i, num_threads = threads.size();
                    for (i = 0; i < num_threads; i++) threads[i] = std::thread(ThreadWorker(i, this));
                }
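                void shutdownPool() {
                    {
                        // Set stop under left_mutex so that a worker cannot miss the
                        // notification between checking its predicate and blocking.
                        std::lock_guard<std::mutex> lock(left_mutex);
                        stop = true;
                    }
                    left_lock.notify_all();

                    int i, num_threads = threads.size();
                    for (i = 0; i < num_threads; i++) if (threads[i].joinable()) threads[i].join();
                }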
                void submitTask(TaskParameter& t) {
                    std::unique_lock<std::mutex> lock(left_mutex);
                    taskQueue.enqueue(t);
                    left_lock.notify_one();
                }
        };
    public:
        ThreadPool threadPool;
        TaskSystemParallelThreadPoolSleeping(int num_threads);
        ~TaskSystemParallelThreadPoolSleeping();
        const char* name();
        void run(IRunnable* runnable, int num_total_tasks);
        TaskID runAsyncWithDeps(IRunnable* runnable, int num_total_tasks,
                                const std::vector<TaskID>& deps);
        void sync();
};

Member function implementations

const char* TaskSystemParallelThreadPoolSleeping::name() {
    return "Parallel + Thread Pool + Sleep";
}

TaskSystemParallelThreadPoolSleeping::TaskSystemParallelThreadPoolSleeping(int num_threads): ITaskSystem(num_threads),
                                                                                             threadPool(num_threads) {
    threadPool.initThreadPool();
}

TaskSystemParallelThreadPoolSleeping::~TaskSystemParallelThreadPoolSleeping() {
    threadPool.shutdownPool();
}

void TaskSystemParallelThreadPoolSleeping::run(IRunnable* runnable, int num_total_tasks) {
    threadPool.num_total_tasks = num_total_tasks;
    threadPool.num_done_tasks = 0;
    int i;
    for (i = 0; i < num_total_tasks; i++) {
        TaskParameter t = {runnable, i, num_total_tasks};
        threadPool.submitTask(t);
    }

    std::unique_lock<std::mutex> lock(threadPool.done_mutex);
    threadPool.done_lock.wait(lock, [this]() { return threadPool.num_done_tasks == threadPool.num_total_tasks; });
    lock.unlock();
}

Related C++ basics

C++ condition variable basics

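The typical use of a condition variable is: thread A waits for some condition and is suspended until thread B sets that condition and notifies the condition variable, at which point thread A is woken up.

Because the condition (the shared state being waited on) is accessed by multiple threads, a lock is needed so that it is read and modified under mutual exclusion.

Before wait, the thread holds the lock; while it waits, the lock is released; once the thread is woken up, it reacquires the lock.

After the condition variable is notified, the suspended thread wakes up. The wakeup may be spurious, though, or may have some other cause such as a timeout, so the woken thread must still check whether the condition actually holds. This is why wait is placed inside a loop on the condition.

cv.wait(lock, [] { return ready; }); is equivalent to: while (!ready) { cv.wait(lock); }.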
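As a small sketch of the two points above (the explicit loop and wakeups caused by a timeout), the following example wraps a flag in a hypothetical Flag struct. The names Flag, waitLoop, and waitFor are made up for illustration; wait_for with a predicate is the standard timed overload, which returns the predicate's value when it stops waiting.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical helper bundling the flag, its mutex, and the condition variable.
struct Flag {
    std::mutex mtx;
    std::condition_variable cv;
    bool ready = false;

    void set() {
        {
            std::lock_guard<std::mutex> lock(mtx);
            ready = true;
        }
        cv.notify_all();
    }

    // Explicit loop: behaves like cv.wait(lock, [this] { return ready; }).
    void waitLoop() {
        std::unique_lock<std::mutex> lock(mtx);
        while (!ready) cv.wait(lock);
    }

    // Timed wait: returns false if the timeout expires while ready is still false.
    bool waitFor(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mtx);
        return cv.wait_for(lock, timeout, [this] { return ready; });
    }
};

// Nobody sets the flag, so the timed wait gives up and reports false.
bool timesOutWithoutSetter() {
    Flag f;
    return !f.waitFor(std::chrono::milliseconds(20));
}

// Another thread sets the flag; waitLoop returns whether set() ran before or after it.
bool wokenBySetter() {
    Flag f;
    std::thread setter([&f] { f.set(); });
    f.waitLoop();
    setter.join();
    return f.waitFor(std::chrono::milliseconds(0));  // ready is already true
}
```

The loop in waitLoop is what makes spurious wakeups harmless: a wakeup with ready still false simply goes back to waiting.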

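Example

std::unique_lock is a mutex wrapper class provided by the C++ standard library for managing the locking and unlocking of a mutex. It follows RAII (Resource Acquisition Is Initialization): by default it acquires the lock on construction and releases it automatically on destruction, which avoids deadlocks caused by exceptions, early exits in branching logic, and the like.

std::unique_lock works together with std::condition_variable for thread synchronization. Because wait has to unlock and later relock the lock object passed to it, std::condition_variable::wait takes a std::unique_lock<std::mutex>: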

#include <mutex>
#include <condition_variable>
#include <chrono>

std::mutex mtx;
std::condition_variable cv;
bool ready = false;

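void waiting_thread() {
    std::unique_lock<std::mutex> lock(mtx);
    // Wait for the notification; the mutex is released automatically while waiting
    cv.wait(lock, [](){ return ready; });
    // Once the condition holds, the mutex is locked again and shared data can be accessed safely
    // ...
}

void signaling_thread() {
    {
        std::unique_lock<std::mutex> lock(mtx);
        ready = true;
    } // the lock is released automatically when leaving this scope
    cv.notify_one(); // notify the waiting thread
}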

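The wait/notify pitfall of C++ condition variables

With the condition variables in the C++ standard library, notify_all() only wakes threads that are already waiting at that moment (that is, threads that have already called wait()).

If thread A calls notify_all() before thread B calls wait(), the notification is not saved. When thread B later calls a plain wait() without a predicate, it blocks until the next notification.

So in this case thread B is not woken up.

To avoid this, a condition variable is normally used together with one or more pieces of shared state. Example code: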

#include <chrono>
#include <iostream>
#include <thread>
#include <mutex>
#include <condition_variable>

std::mutex mtx;
std::condition_variable cv;
bool ready = false;  // shared state variable

void threadA() {
    {
        std::lock_guard<std::mutex> lock(mtx);
        ready = true;  // set the condition
    }
    cv.notify_all();   // notify waiting threads
    std::cout << "threadA: notified all threads\n";
}

void threadB() {
    std::unique_lock<std::mutex> lock(mtx);
    // Check ready first; only wait if the condition does not hold yet
    cv.wait(lock, [] { return ready; });
    std::cout << "threadB: condition met, continuing\n";
}

int main() {
    std::thread a(threadA);
    // The sleep makes threadA's notify_all() happen before threadB calls wait().
    // Thread start order could be tuned to avoid this, but relying on the shared
    // state is safer: ready is already true, so threadB does not block.
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
    std::thread b(threadB);

    a.join();
    b.join();
    return 0;
}

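The cv.wait overload and a lambda as the predicate

std::condition_variable::wait has an overload that takes two parameters:

The first parameter is a std::unique_lock<std::mutex>, used to manage the mutex internally.
The second parameter is a predicate, that is, a callable object; when it returns true, the wait ends and the thread continues.

cv.wait(lock, []() { return ready; });

Here, []() { return ready; } is a lambda expression with no parameters and no captures that returns a bool. cv.wait calls the lambda internally, once before blocking and again after every wakeup, and stops waiting as soon as the lambda returns true.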

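The general form of a lambda expression is:

[capture](parameters) -> return_type { body }

capture: the capture list, which specifies how the lambda accesses variables from its enclosing scope.
parameters: the parameter list, similar to that of an ordinary function.
return_type (optional): the return type. If omitted, the compiler deduces it.
body: the function body, which can be several statements or a single return statement.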
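To make the four parts concrete, here is a small sketch that uses a capture list, a parameter list, and an explicit return type. The function and variable names are made up for illustration.

```cpp
#include <cassert>

int lambdaFormDemo() {
    int base = 10;
    // [base] captures a copy of base; (int x) is the parameter list;
    // -> int is the explicit return type.
    auto addBase = [base](int x) -> int { return base + x; };

    int calls = 0;
    // [&calls] captures calls by reference, so the lambda modifies the original.
    auto countCall = [&calls]() { calls++; };
    countCall();
    countCall();

    base = 100;  // does not affect the copy stored inside addBase
    assert(calls == 2);
    return addBase(5) + calls;  // (10 + 5) + 2
}
```

Calling lambdaFormDemo() returns 17: the by-value capture still sees base as 10, while the by-reference capture has updated calls to 2.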

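The difference between [this] and []

[this] means the lambda captures the this pointer of the current object. This lets the lambda access every member of the current object (including private and protected members), for example the member variable ready.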

class MyClass {
    bool ready = false;
    std::mutex mtx;
    std::condition_variable cv;
public:
    void waitForReady() {
        std::unique_lock<std::mutex> lock(mtx);
        // [this] captures the current object, which allows access to this->ready
        cv.wait(lock, [this] { return ready; });
    }
};

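When you write cv.wait(lock, [] { return ready; });, the lambda captures nothing from outside. If ready is a member variable of a class and the lambda does not capture this, ready cannot be accessed directly inside the lambda, and the code fails to compile.

The difference between [&] and []

[&] means the lambda captures by reference every outer variable it uses from its enclosing scope. If ready is a local variable defined outside the lambda, this captures it so that it can be used inside the lambda.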

void func() {
    bool ready = false;
    std::mutex mtx;
    std::condition_variable cv;
    std::unique_lock<std::mutex> lock(mtx);

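    // With an empty capture list: ready is a local variable here, so this does not compile
    cv.wait(lock, [] { return ready; }); // compile error: local variable 'ready' is not captured
    
    // With capture by reference: the outer local variable ready is captured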
    cv.wait(lock, [&] { return ready; });
}

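A lambda with [] cannot access the local variable ready directly, because it captures no outer variables.

A lambda with [&] automatically captures every outer variable used inside it (here, ready), so it can correctly return the value of ready.

Example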

void TaskSystemParallelThreadPoolSleeping::run(IRunnable* runnable, int num_total_tasks) {
	...
	threadPool.done_lock.wait(lock, [this]() { return threadPool.num_done_tasks == threadPool.num_total_tasks; });
	...
}

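The [this] in the lambda expression captures the this pointer of the current object, that is, the object on which the member function TaskSystemParallelThreadPoolSleeping::run was called.

So accessing threadPool inside the lambda is the same as accessing this->threadPool, the member variable of the current TaskSystemParallelThreadPoolSleeping object.

Launching a C++ std::thread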

class ThreadPool {
 private:
    class ThreadWorker {
     private:
        int m_id;
        ThreadPool* m_pool;

     public:
        ThreadWorker(ThreadPool* pool, const int id) : m_pool(pool), m_id(id) {}

        void operator()() {
		...
        }
    };

    std::vector<std::thread> m_threads;
 public:
    void init() {
        for (int i = 0; i < m_threads.size(); ++i) {
            m_threads[i] = std::thread(ThreadWorker(this, i));
        }
    }
};

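There is one operation above that looks odd: std::thread(ThreadWorker(this, i));

void operator()() overloads the function call operator, which lets an object of a class be called like a function. In C++, not only functions (and function pointers) can be called; an object whose class overloads the function call operator operator() can also be "called like a function". Such an object is called a function object, or functor.
The std::thread constructor takes a callable object (for example a function, a lambda, or a function object) as its first argument, followed by the arguments that the callable needs.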

PART_B

Basic syntax issues

std::mutex is neither copyable nor movable

struct TaskNode {
    int bulk_task_id; // ID of this task node
    int num_total_tasks; // total number of tasks in this node
    int num_done_tasks; // number of completed tasks
    int next_task_id; // next task ID to hand out to a thread
    int num_indegree; // in-degree of the node
    IRunnable *runnable;
    std::mutex id_mutex;
    std::mutex done_mutex;
};
std::queue<TaskNode> m_taskQueue;
TaskNode node;
// ...
m_taskQueue.push(node); // compile error: TaskNode cannot be copied

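As shown above, std::mutex is neither copyable nor movable. The node I create contains two std::mutex members, so m_taskQueue.push(node) would have to copy those two members, and the compiler reports an error.

The fix is to use pointers: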

struct TaskNode {
    TaskID task_id;
    IRunnable* task_ptr;
    std::vector<TaskID> deps;
    int num_total_tasks;
    int num_allocated_tasks;
    int num_done_tasks;
    std::mutex mtx;

    TaskNode(TaskID task_id, int num_total_tasks,
             IRunnable* task_ptr, std::vector<TaskID> deps);
};
std::queue<TaskNode*> m_taskQueue;
auto task_ptr = new TaskNode(task_id, num_total_tasks, runnable, deps);
m_taskQueue.push(task_ptr); // only the pointer is copied; the node must be deleted later

From then on, every copy or move of a TaskNode has to go through its pointer.
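One possible variation on the pointer approach, not what my code does, is to store std::unique_ptr in the queue so that each node is freed automatically. The sketch below uses a simplified, hypothetical OwnedNode struct in place of TaskNode.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <queue>
#include <type_traits>
#include <utility>

// Simplified stand-in for TaskNode (hypothetical); the mutex member makes it non-copyable.
struct OwnedNode {
    int bulk_task_id;
    int num_total_tasks;
    int num_done_tasks;
    std::mutex mtx;
    OwnedNode(int id, int total)
        : bulk_task_id(id), num_total_tasks(total), num_done_tasks(0) {}
};

static_assert(!std::is_copy_constructible<OwnedNode>::value,
              "a struct with a std::mutex member cannot be copied");

int totalTasksInQueue() {
    std::queue<std::unique_ptr<OwnedNode>> nodes;
    // Only the owning pointer moves into the queue; the node itself never moves.
    nodes.push(std::unique_ptr<OwnedNode>(new OwnedNode(0, 8)));
    nodes.push(std::unique_ptr<OwnedNode>(new OwnedNode(1, 4)));

    int total = 0;
    while (!nodes.empty()) {
        std::unique_ptr<OwnedNode> node = std::move(nodes.front());
        nodes.pop();
        std::lock_guard<std::mutex> lock(node->mtx);
        total += node->num_total_tasks;
    }  // lock is released first, then node is destroyed and its memory freed
    assert(nodes.empty());
    return total;
}
```

With the two nodes above, totalTasksInQueue() returns 12. The trade-off is that a unique_ptr can only be moved, so a node can live in just one container at a time; if several structures need to refer to the same node, raw pointers or std::shared_ptr fit better.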

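Implementation issues

In my implementation I really did build a computation graph with unordered_map, and used graph traversal to check whether a node's in-degree had dropped to 0, which decides whether its dependencies are gone and it can be put on the task queue to run.

But I found that building a graph is hard under concurrency, because the graph is built while tasks are already running rather than constructed up front. Here are a few things I learned:

Ready-made C/C++ data structures such as map, set, and unordered_map must always be protected by a lock (they are not safe for concurrent modification), even for two seemingly unrelated operations such as inserting a new element into an unordered_map and operating on an element that is already in it. If such operations run concurrently without a lock, baffling errors appear.
When building a graph concurrently, be sure to handle the case where an existing node traverses its children while one of those children has not been added to the graph yet and is only added later.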
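The first point can be sketched like this: one mutex guards every access to the map, so an insertion and an update of an existing entry can never overlap. The class GuardedIndegree and its methods are hypothetical names for illustration, not my actual graph code.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <unordered_map>

// Hypothetical in-degree table where every map access takes the same mutex.
class GuardedIndegree {
public:
    void addNode(int id, int indegree) {
        std::lock_guard<std::mutex> lock(mtx_);
        indegree_[id] = indegree;  // may rehash, so it must not overlap with other accesses
    }
    // Decrements the node's in-degree; returns true when it reaches 0.
    bool release(int id) {
        std::lock_guard<std::mutex> lock(mtx_);
        auto it = indegree_.find(id);
        if (it == indegree_.end()) return false;  // node not added yet
        return --it->second == 0;
    }
    std::size_t size() {
        std::lock_guard<std::mutex> lock(mtx_);
        return indegree_.size();
    }

private:
    std::mutex mtx_;
    std::unordered_map<int, int> indegree_;
};

int concurrentInsertAndUpdate() {
    GuardedIndegree g;
    g.addNode(0, 1000);
    // One thread inserts new nodes while another updates an existing node.
    std::thread inserter([&g] { for (int i = 1; i <= 1000; i++) g.addNode(i, 1); });
    std::thread updater([&g] { for (int i = 0; i < 999; i++) g.release(0); });
    inserter.join();
    updater.join();
    bool last = g.release(0);  // the 1000th release brings node 0 to in-degree 0
    assert(last);
    return static_cast<int>(g.size());
}
```

After both threads finish, the table holds 1001 nodes and node 0 has reached in-degree 0. The "node not added yet" branch in release hints at the second point: a caller has to decide what to do when a child is not in the graph yet, for example by recording completed parents and checking them when the child is inserted.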

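So it is better to follow the idea from this blog post:

An asynchronous task system has to deal with dependencies in task scheduling: some tasks can only start after certain earlier tasks have finished. Looking at the scheduling process, a task can be in one of four states:

Not ready: still waiting for the tasks it depends on to finish

Ready: all dependencies have finished, so it can be scheduled at any time

Fully assigned: all num_total_tasks work items of the task have been handed out to threads

Finished: all num_total_tasks work items of the task have completed

This turns graph traversal into operations between simple data structures, namely moving task nodes from one to the next. Four data structures store the task nodes in the four states above, and each of the four data structures is protected by its own lock.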
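The four-state idea can be sketched roughly as below. This is a simplified illustration under stated assumptions, not the referenced blog's code: the names BulkTask and FourStateTracker are made up, a single mutex guards all four containers instead of one lock per container, and every task is assumed to have at least one work item.

```cpp
#include <cassert>
#include <deque>
#include <map>
#include <mutex>
#include <set>
#include <utility>
#include <vector>

struct BulkTask {
    int id;
    int num_total_tasks;      // assumed > 0 in this sketch
    int num_allocated_tasks;
    int num_done_tasks;
    std::vector<int> deps;
};

class FourStateTracker {
public:
    // New tasks start as "ready" if all deps are finished, otherwise "not ready".
    void add(int id, int num_total_tasks, std::vector<int> deps) {
        std::lock_guard<std::mutex> lock(mtx_);
        BulkTask t{id, num_total_tasks, 0, 0, std::move(deps)};
        if (depsFinished(t)) ready_.push_back(std::move(t));
        else waiting_.push_back(std::move(t));
    }
    // Hands out one work item; returns false if no task is ready.
    bool nextWork(int& bulk_id, int& item) {
        std::lock_guard<std::mutex> lock(mtx_);
        if (ready_.empty()) return false;
        BulkTask& t = ready_.front();
        bulk_id = t.id;
        item = t.num_allocated_tasks++;
        if (t.num_allocated_tasks == t.num_total_tasks) {  // ready -> fully assigned
            assigned_.emplace(t.id, std::move(t));
            ready_.pop_front();
        }
        return true;
    }
    // Records one completed work item; returns true when the whole task finished.
    bool finishWork(int bulk_id) {
        std::lock_guard<std::mutex> lock(mtx_);
        auto it = assigned_.find(bulk_id);
        if (it == assigned_.end()) {  // task is still handing out items
            for (BulkTask& t : ready_)
                if (t.id == bulk_id) t.num_done_tasks++;
            return false;
        }
        if (++it->second.num_done_tasks < it->second.num_total_tasks) return false;
        finished_.insert(bulk_id);  // fully assigned -> finished
        assigned_.erase(it);
        for (auto w = waiting_.begin(); w != waiting_.end();) {  // not ready -> ready
            if (depsFinished(*w)) {
                ready_.push_back(std::move(*w));
                w = waiting_.erase(w);
            } else {
                ++w;
            }
        }
        return true;
    }

private:
    bool depsFinished(const BulkTask& t) const {
        for (int d : t.deps)
            if (finished_.count(d) == 0) return false;
        return true;
    }
    std::mutex mtx_;
    std::vector<BulkTask> waiting_;     // not ready
    std::deque<BulkTask> ready_;        // ready
    std::map<int, BulkTask> assigned_;  // fully assigned
    std::set<int> finished_;            // finished
};

// Single-threaded driver; in a real pool, worker threads would call nextWork/finishWork.
std::vector<int> completionOrderDemo() {
    FourStateTracker tracker;
    tracker.add(1, 1, {0});  // depends on task 0, which has not even been added yet
    tracker.add(0, 2, {});
    tracker.add(2, 1, {1});
    std::vector<int> order;
    int bulk = 0, item = 0;
    while (tracker.nextWork(bulk, item))
        if (tracker.finishWork(bulk)) order.push_back(bulk);
    return order;
}
```

In the demo, task 1 is added before the task it depends on, yet the completion order is still 0, 1, 2: a task only checks the finished set, so it does not matter whether its dependency was already known when it was added. This is what sidesteps the "child not yet in the graph" problem described earlier.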

posted @ 2025-04-14 17:25 次林梦叶
