MIT6.S081 ---- Lab Multithreading

Lab Multithreading

Uthread: switching between threads

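In this exercise you design the context-switch mechanism for a user-level threading system and then implement it. uthread.c contains most of a user-level threads package along with a few simple test threads. The code that creates a thread and switches between threads is missing and has to be filled in.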

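Come up with a plan for creating threads and for saving and restoring registers to switch between them, then implement it.
When done, test with make grade.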

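You need to add code to thread_create() and thread_schedule() in user/uthread.c, and to thread_switch in user/uthread_switch.S.
One goal: the first time thread_schedule() runs a given thread, that thread must execute the function passed to thread_create(), on its own stack.
Another goal: thread_switch must save the registers of the thread being switched away from, restore the registers of the thread being switched to, and return to the point in the latter thread's instructions where it last left off.
You have to decide which registers to save and restore; modifying struct thread to hold them is a good plan.
You need to add a call to thread_switch in thread_schedule. You can pass thread_switch whatever arguments you need, but the intent is to switch from thread t to next_thread.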
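
The scheduling half of this can be tried out on a host machine without any assembly. The following is a minimal sketch, not the lab's code: pick_next is a hypothetical helper, and MAX_THREAD and the state names are assumptions modeled on what uthread.c appears to use. It only shows how the next RUNNABLE thread could be chosen round-robin.

```c
#define MAX_THREAD 4   /* assumed table size, for illustration only */

enum state { FREE, RUNNING, RUNNABLE };

struct thread {
  enum state state;
};

// Return the index of the next RUNNABLE thread after `current`,
// scanning round-robin and wrapping around; -1 if none is runnable.
// The scan ends on `current` itself, so a thread that yielded (and is
// RUNNABLE again) is picked only when no other thread is runnable.
int
pick_next(const struct thread *threads, int current)
{
  for (int i = 1; i <= MAX_THREAD; i++) {
    int idx = (current + i) % MAX_THREAD;
    if (threads[idx].state == RUNNABLE)
      return idx;
  }
  return -1;
}
```

In the lab, the point right after this choice (when the chosen thread differs from the current one) is where thread_schedule() would call thread_switch with whatever identifies the old and the new thread's saved registers.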

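The crux of a thread switch is saving and restoring the stack pointer and the registers. Because thread_switch is entered through an ordinary function call, the compiler has already saved any caller-saved registers it still needs on the calling thread's stack, so only ra, sp, and the callee-saved registers s0 to s11 have to be stored.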

Add a context struct declaration:

struct context {
  uint64 ra;
  uint64 sp;

  // callee-saved
  uint64 s0;
  uint64 s1;
  uint64 s2;
  uint64 s3;
  uint64 s4;
  uint64 s5;
  uint64 s6;
  uint64 s7;
  uint64 s8;
  uint64 s9;
  uint64 s10;
  uint64 s11;
};

Thread switch (a0 points to the context to save into, a1 to the context to restore from):

thread_switch:
    sd ra, 0(a0)
    sd sp, 8(a0)
    sd s0, 16(a0)
    sd s1, 24(a0)
    sd s2, 32(a0)
    sd s3, 40(a0)
    sd s4, 48(a0)
    sd s5, 56(a0)
    sd s6, 64(a0)
    sd s7, 72(a0)
    sd s8, 80(a0)
    sd s9, 88(a0)
    sd s10, 96(a0)
    sd s11, 104(a0)

    ld ra, 0(a1)
    ld sp, 8(a1)
    ld s0, 16(a1)
    ld s1, 24(a1)
    ld s2, 32(a1)
    ld s3, 40(a1)
    ld s4, 48(a1)
    ld s5, 56(a1)
    ld s6, 64(a1)
    ld s7, 72(a1)
    ld s8, 80(a1)
    ld s9, 88(a1)
    ld s10, 96(a1)
    ld s11, 104(a1)
    ret    /* return to ra */
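
The offsets 0, 8, 16, ..., 104 used in the assembly only work if they match the layout of struct context. Here is a small host-side sketch that checks this at compile time. It is an illustration under assumptions: uint64_t stands in for xv6's uint64, the twelve s registers are folded into an array for brevity, and sreg_offset is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Host-side copy of the context layout (illustrative, see assumptions above).
struct context {
  uint64_t ra;
  uint64_t sp;
  uint64_t s[12];   // s0..s11, callee-saved
};

// Each field is 8 bytes, so the n-th field sits at offset 8 * n.
_Static_assert(offsetof(struct context, ra) == 0, "ra at 0(a0)");
_Static_assert(offsetof(struct context, sp) == 8, "sp at 8(a0)");
_Static_assert(offsetof(struct context, s) == 16, "s0 at 16(a0)");
_Static_assert(sizeof(struct context) == 14 * 8, "14 registers saved");

// Offset of register s<n> (0 <= n <= 11) inside struct context.
size_t
sreg_offset(int n)
{
  return offsetof(struct context, s) + 8 * (size_t)n;
}
```

If a field is ever added to or reordered in the struct, checks like these fail at compile time instead of silently corrupting a register at run time.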

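Thread creation: set ra to the address of the instruction that should run first after the switch, and sp to the thread's stack address.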

t->context.ra = (uint64)func;
// The stack grows down, so sp starts at the top; STACK_SIZE - 1 would leave sp at an odd address.
t->context.sp = (uint64)(t->stack + STACK_SIZE);
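
The RISC-V calling convention expects sp to be 16-byte aligned when a function is entered, and the stack grows toward lower addresses. A minimal host-side sketch of the set-up, assuming a trimmed-down struct thread and a hypothetical helper init_context (neither is the lab's exact code), could round the top of the stack down like this:

```c
#include <stdint.h>

#define STACK_SIZE 8192   /* assumed stack size, for illustration */

struct context {
  uint64_t ra;
  uint64_t sp;
};

struct thread {
  char stack[STACK_SIZE];
  struct context context;
};

// Point ra at the thread's entry function and sp at the top of its
// stack, rounded down to a 16-byte boundary.
void
init_context(struct thread *t, void (*func)(void))
{
  uintptr_t top = (uintptr_t)t->stack + STACK_SIZE;  // one past the last byte
  t->context.ra = (uint64_t)(uintptr_t)func;
  t->context.sp = (uint64_t)(top & ~(uintptr_t)0xF);
}
```

Rounding down wastes at most 15 bytes of stack and keeps sp inside the thread's own stack array.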

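This user-level thread implementation cannot take advantage of multiple cores: all of its threads take turns running on a single core. In other words, it is a many-to-one model, with many user-level threads mapped onto one kernel-level thread.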

Using threads

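This exercise explores parallel programming with threads and locks using a hash table. It must be done on a real multi-core Linux or macOS machine (not xv6, not qemu).

It uses the Unix pthread library. The manual is available via man pthreads, or online: here, here, and here.

The file notxv6/ph.c contains a simple hash table that is correct when used from a single thread but incorrect when used from multiple threads. In the xv6 top-level directory, type: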

$ make ph
$ ./ph 1

The argument to ph is the number of threads that execute put and get operations on the hash table. The output of ph 1 looks similar to this:

100000 puts, 3.991 seconds, 25056 puts/second
0: 0 keys missing
100000 gets, 3.981 seconds, 25118 gets/second

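Your numbers may differ from this sample by a factor of two or more, depending on how fast the machine is, how many cores it has, and whether it is busy with other work.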

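ph runs two benchmarks. First, it adds a large number of keys to the hash table by calling put() and prints the achieved rate in puts per second. Then it fetches the keys from the hash table with get(). It prints the number of keys that should have been in the table as a result of the puts but are missing (zero in this case), and the number of gets per second it achieved.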

To have multiple threads operate on the hash table at the same time, try ph 2:

$ ./ph 2
100000 puts, 1.885 seconds, 53044 puts/second
1: 16579 keys missing
0: 16579 keys missing
200000 gets, 4.322 seconds, 46274 gets/second

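ph 2 means two threads add entries to the hash table concurrently. In theory the put rate can reach twice that of ph 1, and the sample output shows about that (53044 versus 25056 puts/second), which is a good parallel speedup.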

However, the two lines reporting 16579 keys missing show that many keys that should be in the hash table are not there: put() was supposed to add them, but something went wrong. Look at put() and insert() in notxv6/ph.c.

Why are keys lost with two threads but not with one? Identify a sequence of events with two threads that can lead to a key being lost, and submit it in answers-thread.txt.
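
One such sequence can be replayed by hand in a single thread. The sketch below is an illustration under assumptions, not ph.c itself: it splits a head insertion into two hypothetical steps (prepare and publish) and interleaves them the way two racing threads might, with both reading the bucket head before either writes it. Keys 1 and 6 are chosen on the assumption that they hash to the same bucket (for example with 5 buckets).

```c
#include <stdlib.h>

struct entry {
  int key;
  int value;
  struct entry *next;
};

// Step 1 of an insert: allocate the entry and link it in front of the
// head that the caller read earlier.
struct entry *
prepare(int key, int value, struct entry *seen_head)
{
  struct entry *e = malloc(sizeof(*e));
  e->key = key;
  e->value = value;
  e->next = seen_head;
  return e;
}

// Step 2 of an insert: publish the entry as the new head of the bucket.
void
publish(struct entry **bucket, struct entry *e)
{
  *bucket = e;
}

// Count the entries reachable from the bucket head.
int
count(struct entry *head)
{
  int n = 0;
  for (struct entry *e = head; e != NULL; e = e->next)
    n++;
  return n;
}

// Replay the bad interleaving: both "threads" read the same head
// before either one publishes. Returns the resulting entry count.
int
lost_insert_demo(void)
{
  struct entry *bucket = NULL;
  struct entry *a = prepare(1, 10, bucket);  // thread A reads head (NULL)
  struct entry *b = prepare(6, 60, bucket);  // thread B reads the same head
  publish(&bucket, a);                       // A: bucket -> a
  publish(&bucket, b);                       // B: bucket -> b, a is unreachable
  int n = count(bucket);
  free(a);
  free(b);
  return n;
}
```

Two inserts went in, but only one entry is reachable afterwards: the entry for key 1 is lost. With a single thread the two steps of one insert can never be separated by another insert, which is why ph 1 reports no missing keys.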

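To avoid this sequence of events, insert lock and unlock statements in put() and get() in notxv6/ph.c so that the number of keys missing is always 0 with two threads.
The relevant pthread calls are: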
pthread_mutex_t lock;            // declare a lock
pthread_mutex_init(&lock, NULL); // initialize the lock
pthread_mutex_lock(&lock);       // acquire lock
pthread_mutex_unlock(&lock);     // release lock
When done, test with make grade.

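Concurrent reads and writes that touch disjoint memory do not need a lock to protect them from each other; exploit this to improve the parallel speedup.
Hint: use one lock per hash bucket.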

Add pthread_mutex_lock and pthread_mutex_unlock calls in put() and get() around the places where the invariant can be violated, so that the invariant is protected.
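
A minimal sketch of the one-lock-per-bucket idea follows. It is a self-contained illustration rather than the actual ph.c: the table layout, NBUCKET value, table_init, and the function bodies are assumptions, and keys are assumed to be non-negative so that key % NBUCKET is a valid index.

```c
#include <pthread.h>
#include <stdlib.h>

#define NBUCKET 5   /* assumed bucket count, for illustration */

struct entry {
  int key;
  int value;
  struct entry *next;
};

static struct entry *table[NBUCKET];
static pthread_mutex_t locks[NBUCKET];   // one lock per bucket

// Call once before any thread uses the table.
void
table_init(void)
{
  for (int i = 0; i < NBUCKET; i++) {
    table[i] = NULL;
    pthread_mutex_init(&locks[i], NULL);
  }
}

// Insert or update a key while holding only that key's bucket lock.
void
put(int key, int value)
{
  int i = key % NBUCKET;
  pthread_mutex_lock(&locks[i]);
  struct entry *e;
  for (e = table[i]; e != NULL; e = e->next) {
    if (e->key == key)
      break;
  }
  if (e != NULL) {
    e->value = value;              // key already present: update in place
  } else {
    e = malloc(sizeof(*e));
    e->key = key;
    e->value = value;
    e->next = table[i];
    table[i] = e;                  // link the new entry at the head
  }
  pthread_mutex_unlock(&locks[i]);
}

// Look up a key under its bucket lock; returns NULL if it is absent.
struct entry *
get(int key)
{
  int i = key % NBUCKET;
  pthread_mutex_lock(&locks[i]);
  struct entry *e;
  for (e = table[i]; e != NULL; e = e->next) {
    if (e->key == key)
      break;
  }
  pthread_mutex_unlock(&locks[i]);
  return e;
}
```

Threads working on different buckets never contend for the same lock, so they can proceed in parallel, while two puts to the same bucket are serialized.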

Barrier

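In this exercise you implement a barrier: a point in the program at which every participating thread must wait until all the others have reached that point too. It uses pthread condition variables, a sequence-coordination technique similar to xv6's sleep and wakeup.

This exercise should also be done on a real computer (not xv6, not qemu).

The file notxv6/barrier.c contains an incomplete barrier.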

$ make barrier
$ ./barrier 2
barrier: notxv6/barrier.c:42: thread: Assertion `i == t' failed.

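The 2 specifies the number of threads that synchronize on the barrier (nthread in barrier.c). Each thread runs a loop. In each iteration it calls barrier() and then sleeps for a random amount of time. The assert triggers when one thread leaves the barrier before the other thread has reached it. The desired behavior is that every thread blocks in barrier() until all nthread threads have called barrier().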

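The goal is to achieve the desired barrier behavior. In addition to the lock primitives from the previous exercise, you need the following new pthread primitives (details here and here):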
pthread_cond_wait(&cond, &mutex); // go to sleep on cond, releasing lock mutex, acquiring upon wake up
pthread_cond_broadcast(&cond); // wake up every thread sleeping on cond

pthread_cond_wait releases the mutex when called and re-acquires it before returning.
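
Because pthread_cond_wait may also return without a matching broadcast (a spurious wakeup), the usual pattern is to re-check the awaited condition in a loop. A minimal sketch, with hypothetical names (ready, wait_until_ready, mark_ready, marker_thread) that do not come from the lab:

```c
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static int ready = 0;   // the condition being waited for

// Block until `ready` becomes nonzero, then return its value.
int
wait_until_ready(void)
{
  pthread_mutex_lock(&mutex);
  while (!ready)                        // re-check after every wakeup
    pthread_cond_wait(&cond, &mutex);   // unlocks while asleep, relocks before returning
  int seen = ready;
  pthread_mutex_unlock(&mutex);
  return seen;
}

// Set `ready` and wake every waiter.
void
mark_ready(void)
{
  pthread_mutex_lock(&mutex);
  ready = 1;
  pthread_cond_broadcast(&cond);
  pthread_mutex_unlock(&mutex);
}

// Thread entry point that simply calls mark_ready().
void *
marker_thread(void *arg)
{
  (void)arg;
  mark_ready();
  return NULL;
}
```

The waiter holds the mutex whenever it inspects ready, so the update in mark_ready cannot slip in between the check and the call to pthread_cond_wait.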

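barrier_init() is already provided. You need to implement barrier() so that the panic (the assertion failure shown above) no longer occurs. struct barrier is already defined for your use.

Two issues complicate the task:

You have to deal with a succession of barrier calls, each of which is called a round. bstate.round records the current round; increment bstate.round each time all threads have reached the barrier.
You have to handle the case in which one thread races around the loop before the others have exited the barrier. In particular, bstate.nthread is reused from one round to the next. Make sure that a thread that leaves the barrier and races around the loop does not increase bstate.nthread while a previous round is still using it. (The concern here is much the same as the one xv6's proc->lock addresses.)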

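static void
barrier()
{
  pthread_mutex_lock(&bstate.barrier_mutex);
  int round = bstate.round;   // the round this thread is waiting on
  bstate.nthread++;

  if (bstate.nthread < nthread) {
    // Sleep on barrier_cond (barrier_mutex is released while asleep).
    // Loop on the round number so a spurious wakeup cannot let a thread
    // through before the round has actually finished.
    while (round == bstate.round)
      pthread_cond_wait(&bstate.barrier_cond, &bstate.barrier_mutex);
  } else {
    // Last thread to arrive: reset the count, start the next round,
    // and wake up the other threads.
    bstate.nthread = 0;
    bstate.round++;
    pthread_cond_broadcast(&bstate.barrier_cond);
  }

  pthread_mutex_unlock(&bstate.barrier_mutex);
}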
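
To try a barrier outside the lab's own test program, a small harness along the following lines can help. It is a sketch under assumptions: bs, barrier_wait, worker, run_rounds, and MAX_THREADS are hypothetical names, and the check mirrors the idea of the lab's assertion (a thread entering its i-th iteration should observe round i) rather than reproducing barrier.c.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 16

static struct {
  pthread_mutex_t mutex;
  pthread_cond_t cond;
  int nthread;   // threads that have reached the barrier this round
  int round;     // completed rounds
} bs;

static int total;        // number of participating threads
static int rounds;       // rounds each thread runs
static int violations;   // times a thread saw an unexpected round number

static void
barrier_wait(void)
{
  pthread_mutex_lock(&bs.mutex);
  int my_round = bs.round;
  if (++bs.nthread == total) {
    bs.nthread = 0;                  // last arrival resets the count
    bs.round++;
    pthread_cond_broadcast(&bs.cond);
  } else {
    while (bs.round == my_round)     // also guards against spurious wakeups
      pthread_cond_wait(&bs.cond, &bs.mutex);
  }
  pthread_mutex_unlock(&bs.mutex);
}

static void *
worker(void *arg)
{
  (void)arg;
  for (int i = 0; i < rounds; i++) {
    pthread_mutex_lock(&bs.mutex);
    if (bs.round != i)               // nobody may be a round ahead or behind
      violations++;
    pthread_mutex_unlock(&bs.mutex);
    barrier_wait();
  }
  return NULL;
}

// Run `n` threads through `r` barrier rounds; return the violation
// count, or -1 if `n` is out of range.
int
run_rounds(int n, int r)
{
  pthread_t tids[MAX_THREADS];
  if (n < 1 || n > MAX_THREADS)
    return -1;
  pthread_mutex_init(&bs.mutex, NULL);
  pthread_cond_init(&bs.cond, NULL);
  bs.nthread = 0;
  bs.round = 0;
  total = n;
  rounds = r;
  violations = 0;
  for (int i = 0; i < n; i++)
    pthread_create(&tids[i], NULL, worker, NULL);
  for (int i = 0; i < n; i++)
    pthread_join(tids[i], NULL);
  pthread_cond_destroy(&bs.cond);
  pthread_mutex_destroy(&bs.mutex);
  return violations;
}
```

A call such as run_rounds(4, 1000) should report zero violations for a correct barrier; replacing the while loop with a plain if is an easy way to see why the round check matters on systems where spurious wakeups occur.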

Code

Code: Lab thread

posted @ 2022-02-19 21:37 seaupnice
