CMU15-445 PROJECT#1 - buffer pool

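A while ago I worked through the fall 2021 offering of CMU 15-445, the DBMS course. With spring recruiting coming up, I am going back over the four projects as a review.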

The buffer pool component in the figure above is the main subject of this project.

One way to build a buffer pool is memory mapping (mmap): the DBMS can use it to store the contents of a file into the address space of a program.

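This project, however, is about implementing a buffer pool manager, which manages the space of the buffer pool: fetching disk pages into the pool, deleting cached pages, and so on.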

1. LRU replacer (LRU page replacer)

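LRU stands for Least Recently Used. It is a page replacement algorithm that comes from virtual memory paging and decides which page to evict based on how pages have been used since they were brought into memory. Since future accesses cannot be predicted, the "recent past" is used as an approximation of the "near future", so LRU evicts the page that has gone unused for the longest time.
The interfaces to implement are: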

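Victim: as the name suggests, picks a victim, i.e. uses the LRU policy to choose the page to evict; returns false if there is nothing to evict;
Pin: the pinned page is in use by the CPU and must not be chosen by the LRU policy;
Unpin: the page is no longer in use by the CPU and may be chosen by the LRU policy;
Size: returns the number of pages currently tracked by the replacer, i.e. the pages that can be evicted.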

Implementation details:

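A linked list records the frames that are candidates for eviction and is accessed in FIFO order. A frame that gets pinned is removed from the list, and when it is unpinned again it is re-inserted at the tail, so the frame at the head is always the Least Recently Used one.
Each time a frame is pinned, it has to be located in the list. Scanning the whole list would be too slow, so a hash map from frame id to list iterator is kept:
std::unordered_map<frame_id_t, std::list<frame_id_t>::iterator> iterator_table_;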
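
The list-plus-map design described above can be sketched roughly as follows. This is a minimal illustration and not the course's reference solution: the member names lru_list_, capacity_ and latch_ are my own assumptions, and frame_id_t is aliased to int here only so the snippet compiles on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <mutex>
#include <unordered_map>

using frame_id_t = int;  // assumption: stands in for the project's frame id type

class LRUReplacer {
 public:
  explicit LRUReplacer(std::size_t capacity) : capacity_(capacity) {}

  // Evict the frame at the head of the list (least recently unpinned).
  // Returns false if no frame is evictable.
  bool Victim(frame_id_t *frame_id) {
    assert(frame_id != nullptr);
    std::lock_guard<std::mutex> guard(latch_);
    if (lru_list_.empty()) return false;
    *frame_id = lru_list_.front();
    lru_list_.pop_front();
    iterator_table_.erase(*frame_id);
    return true;
  }

  // The frame is in use: remove it from the candidates in O(1) via the map.
  void Pin(frame_id_t frame_id) {
    std::lock_guard<std::mutex> guard(latch_);
    auto it = iterator_table_.find(frame_id);
    if (it == iterator_table_.end()) return;  // not tracked, nothing to do
    lru_list_.erase(it->second);
    iterator_table_.erase(it);
  }

  // The frame is no longer in use: append it at the tail (most recent end).
  void Unpin(frame_id_t frame_id) {
    std::lock_guard<std::mutex> guard(latch_);
    if (iterator_table_.count(frame_id) != 0) return;  // already a candidate
    if (lru_list_.size() >= capacity_) return;          // replacer is full
    lru_list_.push_back(frame_id);
    iterator_table_[frame_id] = std::prev(lru_list_.end());
  }

  // Number of frames that can currently be evicted.
  std::size_t Size() {
    std::lock_guard<std::mutex> guard(latch_);
    return lru_list_.size();
  }

 private:
  std::size_t capacity_;
  std::list<frame_id_t> lru_list_;  // head = least recently used
  std::unordered_map<frame_id_t, std::list<frame_id_t>::iterator> iterator_table_;
  std::mutex latch_;
};
```

std::list iterators stay valid when other elements are inserted or erased, which is what makes it safe to store them in the map.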

2. BUFFER POOL MANAGER INSTANCE

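This part is the core of the project. The data structures involved are:
1. std::unordered_map<page_id_t, frame_id_t> page_table_: maps page ids to frames. It differs from the table used in the LRU replacer: the replacer only tracks pages that are in the buffer pool and not in use (some cached pages may be in use by the CPU), i.e. pages that can be evicted, whereas this page table records every page in the buffer pool;
2. std::list<frame_id_t> free_list_: the free list, which records all frames that hold no disk data yet;
3. Page *pages_: array of buffer pool pages;
4. Replacer *replacer_: the replacer implemented in the previous section;
5. DiskManager *disk_manager_: the disk manager, used to read the corresponding pages from disk;
6. std::mutex latch_: a mutex.
The main interfaces to implement are: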

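FetchPgImp(page_id): the entry point for callers that want the page with id page_id from the bpm. If the page is already in the buffer pool, a pointer to it is returned directly. Otherwise a frame has to be found first: if free_list_ has a free frame, that frame is used; if not, the LRU replacer picks a victim, and if the victim page is dirty its data is written back to disk. The requested page is then read from disk into that frame.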

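UnpinPgImp(page_id, is_dirty): the page has one fewer user (some caller is done with it). Every page in the buffer pool has a pin_count that records how many callers are currently using it; once pin_count reaches 0, the page may be evicted.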
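
The fetch and unpin flow described above can be sketched with a toy pool. Everything here is a simplifying assumption for illustration: ToyBufferPool is a hypothetical class, the "disk" is an in-memory map, the replacer is a plain list of evictable frames with the least recently used one at the head, and locking is left out.

```cpp
#include <cassert>
#include <list>
#include <string>
#include <unordered_map>
#include <vector>

using page_id_t = int;   // assumption: stand-ins for the project's id types
using frame_id_t = int;

struct Page {
  page_id_t page_id = -1;  // -1 marks an empty frame in this sketch
  int pin_count = 0;
  bool is_dirty = false;
  std::string data;
};

class ToyBufferPool {
 public:
  explicit ToyBufferPool(int pool_size) : pages_(pool_size) {
    for (frame_id_t f = 0; f < pool_size; ++f) free_list_.push_back(f);
  }

  Page *FetchPage(page_id_t page_id) {
    // 1. Already cached: pin it and return it directly.
    auto it = page_table_.find(page_id);
    if (it != page_table_.end()) {
      Page *page = &pages_[it->second];
      if (page->pin_count++ == 0) evictable_.remove(it->second);
      return page;
    }
    // 2. Pick a frame: the free list first, then a victim from the replacer.
    frame_id_t frame;
    if (!free_list_.empty()) {
      frame = free_list_.front();
      free_list_.pop_front();
    } else if (!evictable_.empty()) {
      frame = evictable_.front();
      evictable_.pop_front();
    } else {
      return nullptr;  // every frame is pinned
    }
    // 3. Write the old page back if it is dirty, then drop its mapping.
    Page *page = &pages_[frame];
    assert(page->pin_count == 0);  // a reused frame must be unpinned
    if (page->page_id != -1) {
      if (page->is_dirty) disk_[page->page_id] = page->data;
      page_table_.erase(page->page_id);
    }
    // 4. Load the requested page into the frame and pin it.
    page->page_id = page_id;
    page->data = disk_[page_id];
    page->pin_count = 1;
    page->is_dirty = false;
    page_table_[page_id] = frame;
    return page;
  }

  bool UnpinPage(page_id_t page_id, bool is_dirty) {
    auto it = page_table_.find(page_id);
    if (it == page_table_.end()) return false;
    Page &page = pages_[it->second];
    if (page.pin_count <= 0) return false;
    page.is_dirty = page.is_dirty || is_dirty;  // never clear an earlier dirty mark
    if (--page.pin_count == 0) evictable_.push_back(it->second);
    return true;
  }

  std::unordered_map<page_id_t, std::string> disk_;  // fake disk, public for the demo

 private:
  std::vector<Page> pages_;
  std::unordered_map<page_id_t, frame_id_t> page_table_;
  std::list<frame_id_t> free_list_;
  std::list<frame_id_t> evictable_;  // stands in for the LRU replacer
};
```

With a pool of one frame, fetching a second page fails while the first is still pinned, and succeeds (writing the dirty page back) once the first has been unpinned.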

FlushPgImp(page_id): writes the given page in the buffer pool back to disk and resets its dirty flag.

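NewPgImp(page_id): creates a brand-new page and obtains a previously unused page id for it. If the buffer pool has a free frame, the page is placed there; otherwise a page is evicted and its frame becomes the new page's location in the buffer pool.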

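DeletePgImp(page_id): removes a page from the buffer pool (along with its data on disk). If pin_count is not 0, i.e. the page is still in use by another caller, it cannot be deleted.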
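
The pin_count check in the delete path might look roughly like the sketch below. ToyPool, Frame and kInvalidPageId are hypothetical names used only for illustration; treating a page that is not cached as a successful delete is an assumption on my part, and deallocating the page on disk is left out.

```cpp
#include <cassert>
#include <list>
#include <unordered_map>
#include <vector>

using page_id_t = int;   // assumption: stand-ins for the project's id types
using frame_id_t = int;
constexpr page_id_t kInvalidPageId = -1;  // hypothetical sentinel for an empty frame

struct Frame {
  page_id_t page_id = kInvalidPageId;
  int pin_count = 0;
  bool is_dirty = false;
};

struct ToyPool {
  std::vector<Frame> frames;
  std::unordered_map<page_id_t, frame_id_t> page_table;
  std::list<frame_id_t> free_list;
  std::list<frame_id_t> evictable;  // stands in for the LRU replacer

  // Returns false only when the page is cached and still pinned.
  bool DeletePage(page_id_t page_id) {
    auto it = page_table.find(page_id);
    if (it == page_table.end()) return true;  // assumption: not cached counts as success
    const frame_id_t frame = it->second;
    assert(frames[frame].page_id == page_id);  // page table and frame must agree
    if (frames[frame].pin_count != 0) return false;  // still in use, refuse
    evictable.remove(frame);     // the frame must no longer be picked as a victim
    page_table.erase(it);
    frames[frame] = Frame{};     // reset the metadata
    free_list.push_back(frame);  // the frame is free again
    return true;
  }
};
```

Removing the frame from the replacer matters: otherwise the same frame would sit in both the free list and the eviction candidates.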

FlushAllPagesImpl(): writes all data in the buffer pool back to disk.

3. PARALLEL BUFFER POOL MANAGER

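There is actually not much to this part: it mainly wraps the bpm implemented in the previous step so that several bpm instances can coexist in the system. The one thing to watch is how page ids are allocated when a new page is created.
For bpms[i], every page id it allocates must satisfy page_id % n == i (where n is the number of bpm instances in the system).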
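
One way to satisfy this constraint is to start each instance's counter at its own index and advance it by n on every allocation. The sketch below illustrates that idea; Instance and InstanceIndexFor are hypothetical names, not the project's API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

using page_id_t = std::int32_t;  // assumption: stands in for the project's page id type

// Hypothetical stand-in for one buffer pool manager instance.
class Instance {
 public:
  Instance(std::uint32_t num_instances, std::uint32_t instance_index)
      : num_instances_(num_instances),
        instance_index_(instance_index),
        next_page_id_(static_cast<page_id_t>(instance_index)) {}

  // Hands out instance_index, instance_index + n, instance_index + 2n, ...
  page_id_t AllocatePage() {
    const page_id_t id = next_page_id_;
    next_page_id_ += static_cast<page_id_t>(num_instances_);
    // Every id handed out must map back to this instance.
    assert(static_cast<std::uint32_t>(id) % num_instances_ == instance_index_);
    return id;
  }

 private:
  std::uint32_t num_instances_;
  std::uint32_t instance_index_;
  page_id_t next_page_id_;
};

// The wrapper can route any page id to the instance responsible for it.
inline std::size_t InstanceIndexFor(page_id_t page_id, std::size_t num_instances) {
  return static_cast<std::size_t>(page_id) % num_instances;
}
```

With n = 3, instance 1 would hand out 1, 4, 7 and so on, and InstanceIndexFor(7, 3) routes page 7 back to instance 1.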

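In addition, the operations inside each bpm must not interfere with one another and need to be atomic, so every operation takes a mutex to guarantee atomicity.
For convenience, I use std::lock_guard<std::mutex> l_g(mtx_); here (with std::mutex mtx_; as the mutex).
I had not looked much into C++ concurrent programming or the C++11 concurrency facilities before, so I took this chance to learn a bit.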

To go with the various mutex types, the C++ Standard defines a triplet of class templates for objects that hold a lock. These are:
std::lock_guard<>,
std::unique_lock<> and
std::shared_lock<>.
For basic operations, they all acquire the lock in the constructor, and release it in the destructor, though they can be used in more complex ways if desired.
std::lock_guard<> is the simplest type, and just holds a lock across a critical section in a single block:

#include <mutex>

std::mutex m;
void f(){
    // guard locks m here and unlocks it when f returns
    std::lock_guard<std::mutex> guard(m);
    // do stuff
}

std::unique_lock<> is similar, except it can be returned from a function without releasing the lock, and can have the lock released before the destructor:

#include <mutex>

std::mutex m;
std::unique_lock<std::mutex> f(){
    std::unique_lock<std::mutex> guard(m);
    // do stuff
    return guard;  // moved out implicitly; m stays locked
}
void g(){
    std::unique_lock<std::mutex> guard(f());
    // do more stuff
    guard.unlock();  // release before the destructor runs
}

std::shared_lock<> is almost identical to std::unique_lock<> except that it acquires a shared lock on the mutex. If you are using a std::shared_timed_mutex then you can use std::lock_guard<std::shared_timed_mutex> or std::unique_lock<std::shared_timed_mutex> for the exclusive lock, and std::shared_lock<std::shared_timed_mutex> for the shared lock.

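#include <mutex>
#include <shared_mutex>

std::shared_timed_mutex m;
void reader(){
    // shared lock: several readers may hold it at once
    std::shared_lock<std::shared_timed_mutex> guard(m);
    // do read-only stuff
}
void writer(){
    // exclusive lock: excludes readers and other writers
    std::lock_guard<std::shared_timed_mutex> guard(m);
    // update shared data
}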

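Well, since we have come this far, I might as well review (or rather, newly learn) smart pointers too. (That will go in the next post.)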

posted @ 2022-03-01 21:59 fwx
