Notes on 《STL源码剖析》 (The Annotated STL Sources): the space allocator (allocator)

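Preface

From the standpoint of how STL is implemented, the space allocator is the first component that needs introducing: every object STL operates on is stored in a container, and a container must obtain the space for its data from an allocator.

An allocator does not have to hand out memory. You could perfectly well implement one that obtains its storage directly from a hardware device.

What follows describes the allocator that SGI STL provides, and what it allocates is memory. (The content below comes from 《STL源码剖析》.)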

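Overview

Because this post grew long and was written on and off over several days, I first lay out its structure here.

First comes the standard allocator interface. Apart from some basic typedefs, the most important members are allocate and deallocate, which deal with memory, and construct and destroy, which deal with object construction (the two concerns are kept separate). A simple allocator is then implemented. It does no memory management and simply forwards to ::operator new and ::operator delete.
- allocate and deallocate obtain and release usable memory.
- construct builds an object with placement new; destroy calls the destructor ~T() of the corresponding type.
Next come SGI's first-level and second-level allocators. Whether __USE_MALLOC is defined decides if only the first-level allocator is used or both are.
- The memory pool holds space that has not yet been handed to any free list; each free list maintains a linked list of blocks that are ready for use.
- construct uses placement new; destroy relies on the traits mechanism to find out whether the destructor is trivial before deciding what to do.
- allocate calls refill when the matching free list is empty. By default refill asks for 20 blocks: one is returned to the caller and 19 stay in the free list. refill gets them from chunk_alloc, which distinguishes three cases.
- deallocate first checks whether the block is larger than 128 bytes; if so it calls the first-level allocator, otherwise it returns the block to the free list.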

The standard allocator interface

According to the STL specification, an allocator must provide the following interface.

Typedefs

allocator::value_type
allocator::pointer
allocator::const_pointer
allocator::reference
allocator::const_reference
allocator::size_type
allocator::difference_type
allocator::rebind // class rebind<U> has a single member, other, which is a typedef for allocator<U>

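Constructors and destructor. Because the allocator has no data members there is nothing to initialize, but these functions must still be defined.

allocator::allocator()
allocator::allocator(const allocator&)
template <class U> allocator::allocator(const allocator<U>&)
allocator::~allocator()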

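Allocation and address-related functions

// Allocate space large enough to hold n objects of type T.
// The second argument is a hint that may be used to improve locality.
pointer allocator::allocate(size_type n, const void* = 0)

// Return space previously obtained from allocate.
void allocator::deallocate(pointer p, size_type n)

size_type allocator::max_size() const

pointer allocator::address(reference x) const
const_pointer allocator::address(const_reference x) const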

Construction and destruction functions

void allocator::construct(pointer p, const T& x)
void allocator::destroy(pointer p)

Designing a simple allocator yourself

#ifndef __VIGGO__
#define __VIGGO__
#include <new>        // for placement new, std::set_new_handler
#include <cstddef>    // for std::ptrdiff_t, std::size_t
#include <cstdlib>    // for std::exit()
#include <climits>    // for UINT_MAX
#include <iostream>   // for std::cerr

namespace VG {

    // Allocate raw, uninitialized storage for n objects of type T.
    template <class T>
    inline T* _allocate(std::ptrdiff_t n, T*) {
        // With no new-handler installed, ::operator new reports failure
        // right away instead of calling a handler and retrying.
        std::set_new_handler(0);
        T* tmp = (T*)(::operator new((std::size_t)(n * sizeof(T))));
        // Defensive check: a conforming ::operator new throws
        // std::bad_alloc rather than returning a null pointer.
        if (tmp == 0) {
            std::cerr << "alloc memory error!" << std::endl;
            std::exit(1);
        }
        return tmp;
    }

    // Release storage obtained from _allocate.
    template <class T>
    inline void _deallocate(T* p) {
        ::operator delete(p);
    }

    // Construct a T1 at p from value, using placement new.
    template <class T1, class T2>
    inline void _construct(T1* p, const T2& value) {
        new(p) T1(value);
    }

    // Run the destructor without releasing the storage.
    template <class T>
    inline void _destroy(T* p) {
        p->~T();
    }

    template <class T>
    class allocator {
    public:
        typedef T                value_type;
        typedef T*               pointer;
        typedef const T*         const_pointer;
        typedef T&               reference;
        typedef const T&         const_reference;
        typedef std::size_t      size_type;
        typedef std::ptrdiff_t   difference_type;

        template <class U>
        struct rebind {
            typedef allocator<U> other;
        };

        // No data members, but the constructors must still exist.
        allocator() {}
        template <class U> allocator(const allocator<U>&) {}

        pointer address(reference x) const {return (pointer)&x;}
        const_pointer address(const_reference x) const {
            return (const_pointer)&x;
        }

        // The hint is ignored by this simple allocator.
        pointer allocate(size_type n, const void *hint=0) {
            return _allocate((difference_type)n, (pointer)0);
        }

        void deallocate(pointer p, size_type n) {
            _deallocate(p);
        }

        size_type max_size() const {return size_type(UINT_MAX / sizeof(T));}

        void construct(pointer p, const T& x) {
            _construct(p, x);
        }

        void destroy(pointer p) {
            _destroy(p);
        }
    };
}
#endif

Tested with vector<int, VG::allocator<int> >, this allocator handles simple memory allocation, but a real allocator is more complex than this.
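The kind of test described above can be sketched as follows. To keep the block self-contained it does not reuse VG::allocator; MinimalAllocator, sum_first and demo are hypothetical names, and the sketch assumes C++11 or later, where std::allocator_traits supplies the members that are not written out.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

namespace sketch_vector {

// Hypothetical reduced allocator: only what C++11 containers require.
template <class T>
struct MinimalAllocator {
    typedef T value_type;

    MinimalAllocator() {}
    template <class U> MinimalAllocator(const MinimalAllocator<U>&) {}

    T* allocate(std::size_t n) {
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) { ::operator delete(p); }
};

// Stateless allocators always compare equal.
template <class T, class U>
bool operator==(const MinimalAllocator<T>&, const MinimalAllocator<U>&) { return true; }
template <class T, class U>
bool operator!=(const MinimalAllocator<T>&, const MinimalAllocator<U>&) { return false; }

// Fill a vector that uses the custom allocator and sum its elements.
inline int sum_first(int count) {
    assert(count >= 0);
    std::vector<int, MinimalAllocator<int> > v;
    for (int i = 1; i <= count; ++i) v.push_back(i);
    int total = 0;
    for (std::size_t i = 0; i < v.size(); ++i) total += v[i];
    return total;
}

inline void demo() {
    assert(sum_first(10) == 55);
}

} // namespace sketch_vector
```

Every reallocation triggered by push_back goes through allocate and deallocate of the custom allocator.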

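SGI's special allocator

The standard allocator is only a thin wrapper around the basic memory allocation and release operations (::operator new and ::operator delete). It adds no efficiency improvement of any kind.

Let us look at how C++ allocates and releases memory:

A new expression works in two phases: (1) it calls ::operator new to allocate memory; (2) it calls the object's constructor to build the object's contents.

A delete expression also works in two phases: (1) it calls the object's destructor; (2) it calls ::operator delete to release the memory.

For a clean division of labor, the STL allocator separates these two phases. Memory allocation is handled by alloc::allocate() and memory release by alloc::deallocate(); object construction is handled by ::construct() and object destruction by ::destroy().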
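The two phases can be written out by hand. The sketch below is only an illustration; two_phase_new, two_phase_delete and demo are hypothetical helper names that do not appear in the book or in SGI STL.

```cpp
#include <cassert>
#include <new>      // ::operator new, ::operator delete, placement new
#include <string>

namespace sketch_new {

// Does by hand what `new T(arg)` does.
template <class T, class Arg>
T* two_phase_new(const Arg& arg) {
    void* raw = ::operator new(sizeof(T));   // phase 1: allocate raw memory
    try {
        return new (raw) T(arg);             // phase 2: construct in place
    } catch (...) {
        ::operator delete(raw);              // construction failed: release
        throw;
    }
}

// Does by hand what `delete p` does for a non-null p.
template <class T>
void two_phase_delete(T* p) {
    assert(p != 0);
    p->~T();                                 // phase 1: run the destructor
    ::operator delete(p);                    // phase 2: release the memory
}

inline void demo() {
    std::string* s = two_phase_new<std::string>("hello");
    assert(*s == "hello");
    two_phase_delete(s);
}

} // namespace sketch_new
```

An allocator exposes the same four steps as separate operations, so a container can allocate a large buffer once and construct objects in it later.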

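Basic construction and destruction tools: construct() and destroy()

construct() takes a pointer p and an initial value, and its job is to set that value in the space p points to. The C++ placement new operator does exactly this.

destroy() has two versions. The first takes a pointer and simply calls the destructor of the object it points to. The second takes first and last and destroys every object in the half-open range [first, last). We do not know how large that range is. If it is very large and each object's destructor does nothing of consequence (a so-called trivial destructor), calling those destructors one by one is a waste. So we first check whether the destructor of the type the iterators point to is trivial: if it is, nothing is done; otherwise the destructors are called one by one.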

[Figure: implementation of construct()]

[Figure: implementation of destroy()]
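Since the figures are not available here, the behaviour described above can be sketched as follows. This is not the SGI source: it swaps SGI's __type_traits for the C++11 trait std::is_trivially_destructible, and Noisy and demo are hypothetical names used only for the illustration.

```cpp
#include <cassert>
#include <iterator>     // std::iterator_traits
#include <new>          // placement new
#include <type_traits>  // std::is_trivially_destructible

namespace sketch_construct {

// Set the initial value in the space p points to.
template <class T1, class T2>
inline void construct(T1* p, const T2& value) {
    assert(p != 0);
    new (static_cast<void*>(p)) T1(value);
}

// Version 1: destroy a single object.
template <class T>
inline void destroy(T* p) { p->~T(); }

// Non-trivial destructor: call it for every element.
template <class ForwardIt>
inline void destroy_aux(ForwardIt first, ForwardIt last, std::false_type) {
    for (; first != last; ++first) destroy(&*first);
}

// Trivial destructor: nothing to do.
template <class ForwardIt>
inline void destroy_aux(ForwardIt, ForwardIt, std::true_type) {}

// Version 2: destroy [first, last), dispatching on triviality.
template <class ForwardIt>
inline void destroy(ForwardIt first, ForwardIt last) {
    typedef typename std::iterator_traits<ForwardIt>::value_type value_type;
    destroy_aux(first, last, std::is_trivially_destructible<value_type>());
}

// Hypothetical type whose destructor has a visible effect.
struct Noisy {
    int* counter;
    explicit Noisy(int* c) : counter(c) {}
    ~Noisy() { ++*counter; }
};

inline void demo() {
    int destroyed = 0;
    alignas(Noisy) unsigned char buffer[3 * sizeof(Noisy)];
    Noisy* objects = reinterpret_cast<Noisy*>(buffer);
    Noisy prototype(&destroyed);
    for (int i = 0; i < 3; ++i) construct(objects + i, prototype);
    destroy(objects, objects + 3);
    assert(destroyed == 3);     // one destructor call per element
}

} // namespace sketch_construct
```

For a range of int the same call compiles down to the empty overload, so no loop runs at all.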

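This is where the remarkable __type_traits<T> comes in. The traits introduced earlier extracted associated types, such as a return type, and served as the basis for overloading. Here, tags are declared through a specialization for every built-in type.

We now need two tags, one meaning true and one meaning false:

Example: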
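The example that belonged here is not available, so what follows is a minimal sketch of the idea under stated assumptions. The names true_tag, false_tag, type_traits, needs_destructor_calls and demo are hypothetical stand-ins; SGI's own names are __true_type, __false_type and __type_traits, and its traits class carries more members than the single one shown.

```cpp
#include <cassert>
#include <string>

namespace sketch_traits {

// Two empty types used purely as overload selectors.
struct true_tag {};
struct false_tag {};

// Primary template: the conservative answer for unknown types.
template <class T>
struct type_traits {
    typedef false_tag has_trivial_destructor;
};

// Specializations for a few built-in types.
template <> struct type_traits<char>   { typedef true_tag has_trivial_destructor; };
template <> struct type_traits<int>    { typedef true_tag has_trivial_destructor; };
template <> struct type_traits<double> { typedef true_tag has_trivial_destructor; };

// Partial specialization: raw pointers have trivial destructors too.
template <class T>
struct type_traits<T*> { typedef true_tag has_trivial_destructor; };

// Overload resolution on the tag picks the answer at compile time.
inline bool needs_calls_impl(true_tag)  { return false; }
inline bool needs_calls_impl(false_tag) { return true; }

template <class T>
inline bool needs_destructor_calls() {
    typedef typename type_traits<T>::has_trivial_destructor trivial;
    return needs_calls_impl(trivial());
}

inline void demo() {
    assert(!needs_destructor_calls<int>());
    assert(!needs_destructor_calls<const char*>());
    assert(needs_destructor_calls<std::string>());
}

} // namespace sketch_traits
```

Tags are types rather than bool values because only types can take part in overload resolution, which is what lets the compiler drop the unused branch entirely.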

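Allocating and releasing space: std::alloc

SGI's design philosophy: 1. request space from the system heap; 2. take multithreading into account (skipped for now); 3. have a fallback for when memory runs out; 4. deal with the memory fragmentation that too many small blocks can cause.

SGI designed a two-level allocator. The first-level allocator uses malloc() and free() directly, while the second-level allocator picks a strategy depending on the request: when the requested block is larger than 128 bytes, it hands the request to the first-level allocator.

Whether the design exposes only the first-level allocator or the second-level allocator as well depends on whether __USE_MALLOC is defined: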

# ifdef __USE_MALLOC
...
typedef __malloc_alloc_template<0> malloc_alloc;
typedef malloc_alloc alloc; // make alloc the first-level allocator
#else
...
// make alloc the second-level allocator
typedef __default_alloc_template<__NODE_ALLOCATOR_THREADS, 0> alloc;
#endif

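Here __malloc_alloc_template is the first-level allocator and __default_alloc_template is the second-level allocator. alloc takes no template type parameter.

Whether alloc is defined as the first-level or the second-level allocator, SGI wraps it in one more interface, shown below, so that the allocator conforms to the STL specification: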

// Converts element counts into byte counts and forwards to Alloc.
template <class T, class Alloc>
class simple_alloc {
public:
    static T *allocate(size_t n)
        {return 0==n? 0 : (T*)Alloc::allocate(n * sizeof(T));}
    static T *allocate(void)
        {return (T*)Alloc::allocate(sizeof(T));}
    static void deallocate(T *p, size_t n)
        {if (0 != n) Alloc::deallocate(p, n*sizeof(T));}
    static void deallocate(T *p)
        {Alloc::deallocate(p, sizeof(T));}
};

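[Figures: the relationship between the first-level and second-level allocators, the interface wrapper, and how they are used in practice]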
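How a container would use such a wrapper can be sketched as follows. byte_alloc and typed_alloc are hypothetical stand-ins for alloc and simple_alloc, written out so the block is self-contained; last_request exists only to make the count-to-bytes conversion observable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

namespace sketch_simple_alloc {

// Stand-in for alloc: works in bytes and knows nothing about types.
struct byte_alloc {
    static std::size_t& last_request() {
        static std::size_t bytes = 0;
        return bytes;
    }
    static void* allocate(std::size_t bytes) {
        last_request() = bytes;
        return std::malloc(bytes);
    }
    static void deallocate(void* p, std::size_t) { std::free(p); }
};

// Stand-in for simple_alloc: works in numbers of T.
template <class T, class Alloc>
struct typed_alloc {
    static T* allocate(std::size_t n) {
        return n == 0 ? 0 : static_cast<T*>(Alloc::allocate(n * sizeof(T)));
    }
    static void deallocate(T* p, std::size_t n) {
        if (n != 0) Alloc::deallocate(p, n * sizeof(T));
    }
};

inline void demo() {
    // A container typically names its allocator once with a typedef.
    typedef typed_alloc<double, byte_alloc> data_allocator;

    double* p = data_allocator::allocate(4);
    assert(p != 0);
    assert(byte_alloc::last_request() == 4 * sizeof(double));
    data_allocator::deallocate(p, 4);

    double* none = data_allocator::allocate(0);
    assert(none == 0);          // a request for zero elements yields null
}

} // namespace sketch_simple_alloc
```

All members are static, so no allocator object has to be stored in the container.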

First-level allocator: __malloc_alloc_template

#include <stddef.h>  // for size_t
#include <stdlib.h>  // for malloc, realloc, free, exit
#if 0
#    include <new>
#    define __THROW_BAD_ALLOC throw std::bad_alloc()
#elif !defined(__THROW_BAD_ALLOC)
#    include <iostream>
#    define __THROW_BAD_ALLOC std::cerr << "out of memory" << std::endl; exit(1)
#endif

// malloc-based allocator. Usually slower than the default alloc described later.
// Generally thread-safe, and makes fairly efficient use of space.
// The first-level allocator follows.
// Note: there is no template type parameter; the non-type parameter inst is never used.
template <int inst>
class __malloc_alloc_template {
private:
    // The two functions below handle out-of-memory conditions;
    // the last member points to the handler installed by the client.
    static void *oom_malloc(size_t);
    static void *oom_realloc(void*, size_t);
    static void (* __malloc_alloc_oom_handler)();
public:
    static void * allocate(size_t n) {
        void *result = malloc(n); // the first-level allocator uses malloc directly
        // if the request cannot be satisfied, fall back to oom_malloc
        if (0 == result) result = oom_malloc(n);
        return result;
    }

    static void deallocate(void *p, size_t /* n */) {
        free(p); // the first-level allocator uses free() directly
    }

    static void * reallocate(void *p, size_t /* old_sz */, size_t new_sz) {
        void *result = realloc(p, new_sz);
        if (0 == result) result = oom_realloc(p, new_sz);
        return result;
    }

    // The following emulates C++'s set_new_handler(). In other words, you can
    // use it to install your own out-of-memory handler, which tries to free memory.
    // Memory is not obtained through ::operator new here, so set_new_handler
    // itself cannot be used.
    static void (* set_malloc_handler(void (*f)())) () {
        void (*old)() = __malloc_alloc_oom_handler;
        __malloc_alloc_oom_handler = f;
        return old;
    }
};

// Initial value 0; the client sets it later.
template <int inst>
void (* __malloc_alloc_template<inst>::__malloc_alloc_oom_handler)() = 0;

template <int inst>
void * __malloc_alloc_template<inst>::oom_malloc(size_t n) {
    void (* my_malloc_handler)();
    void *result;

    for (;;) { // keep trying: free memory, allocate, free, allocate ...
        my_malloc_handler = __malloc_alloc_oom_handler;
        if (0 == my_malloc_handler) {__THROW_BAD_ALLOC;} // no handler installed
        (* my_malloc_handler)(); // call the handler, which tries to free memory
        result = malloc(n);      // try to allocate again
        if (result) return result;
    }
}

template <int inst>
void * __malloc_alloc_template<inst>::oom_realloc(void *p, size_t n) {
    void (* my_malloc_handler)();
    void *result;

    for (;;) { // same retry loop as oom_malloc, using realloc
        my_malloc_handler = __malloc_alloc_oom_handler;
        if (0 == my_malloc_handler) {__THROW_BAD_ALLOC;}
        (*my_malloc_handler)();
        result = realloc(p, n);
        if (result) return result;
    }
}
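The retry loop is easier to follow when the failure can be triggered on purpose. The sketch below is not the SGI code: it takes the raw allocation function as a parameter so that a failing one can be injected, and it throws std::bad_alloc where the listing above prints a message and exits. All names in it (set_oom_handler, allocate_with_retry, flaky_alloc, release_cache, demo) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>      // std::bad_alloc

namespace sketch_oom {

typedef void (*oom_handler)();
typedef void* (*raw_alloc_fn)(std::size_t);

inline oom_handler& current_handler() {
    static oom_handler handler = 0;    // initially no handler
    return handler;
}

// Install a new handler and return the previous one.
inline oom_handler set_oom_handler(oom_handler f) {
    oom_handler old = current_handler();
    current_handler() = f;
    return old;
}

// Allocate; on failure call the handler and retry until it works.
inline void* allocate_with_retry(raw_alloc_fn raw_alloc, std::size_t n) {
    void* result = raw_alloc(n);
    while (result == 0) {
        oom_handler handler = current_handler();
        if (handler == 0) throw std::bad_alloc();  // nobody can help
        handler();                 // give the client a chance to free memory
        result = raw_alloc(n);     // try again
    }
    return result;
}

// Simulated environment: allocation fails until memory is "released".
inline bool& memory_available() { static bool ok = false; return ok; }
inline void* flaky_alloc(std::size_t n) {
    return memory_available() ? std::malloc(n) : 0;
}
inline void release_cache() { memory_available() = true; }

inline void demo() {
    memory_available() = false;
    set_oom_handler(0);
    bool threw = false;
    try {
        allocate_with_retry(flaky_alloc, 16);
    } catch (const std::bad_alloc&) {
        threw = true;
    }
    assert(threw);                 // no handler installed: give up at once

    set_oom_handler(release_cache);
    void* p = allocate_with_retry(flaky_alloc, 16);
    assert(p != 0);                // the handler made the retry succeed
    std::free(p);
    set_oom_handler(0);
}

} // namespace sketch_oom
```

A handler that frees nothing makes the loop spin forever, so designing and installing the handler is the client's responsibility.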

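Second-level allocator: __default_alloc_template

Space allocation function: allocate()

static void * allocate(size_t n);

1. If n is greater than 128 bytes, hand the request to the first-level allocator.

2. Locate the free list that corresponds to n. If its head node is unavailable (= 0), call refill() to replenish the list; otherwise move the list head to the next node and return the node that was at the head.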
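Which free list corresponds to n is plain arithmetic, sketched below with hypothetical lowercase names (round_up, freelist_index, kAlign, kMaxBytes, demo) and the constants 8 and 128 that the second-level allocator uses.

```cpp
#include <cassert>
#include <cstddef>

namespace sketch_index {

const std::size_t kAlign = 8;       // block sizes are multiples of 8
const std::size_t kMaxBytes = 128;  // larger requests go to the first level

// Round a request up to the next multiple of 8.
inline std::size_t round_up(std::size_t bytes) {
    return (bytes + kAlign - 1) & ~(kAlign - 1);
}

// Index of the free list serving this size: 1..8 -> 0, 9..16 -> 1, ...
inline std::size_t freelist_index(std::size_t bytes) {
    assert(bytes > 0 && bytes <= kMaxBytes);
    return (bytes + kAlign - 1) / kAlign - 1;
}

inline void demo() {
    assert(round_up(1) == 8);
    assert(round_up(8) == 8);
    assert(round_up(9) == 16);
    assert(round_up(30) == 32);

    assert(freelist_index(1) == 0);
    assert(freelist_index(8) == 0);
    assert(freelist_index(9) == 1);
    assert(freelist_index(30) == 3);     // served from the 32-byte list
    assert(freelist_index(128) == 15);   // the last of the 16 lists
}

} // namespace sketch_index
```

A 30-byte request therefore costs 32 bytes, and the 2 wasted bytes are the price of having only 16 lists.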

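Refilling the free lists: refill()

void * refill(size_t n); // obtains 20 nodes by default

One block of size n is handed to the client, and the rest, 19 if the full 20 were obtained, are put under the management of the corresponding free_list.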
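The linking step can be sketched on its own. In the block below carve is a hypothetical helper that does what refill does once chunk_alloc has returned: block 0 goes to the caller and the remaining blocks are threaded into a list. Like refill, it assumes the list was empty and overwrites its head.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

namespace sketch_refill {

// A free block stores the link inside itself.
union node {
    node* next;
    char client_data[1];
};

// Split `chunk` (count * block_size bytes) into blocks.
// Returns block 0; blocks 1..count-1 become the new list.
inline void* carve(char* chunk, std::size_t block_size, int count, node*& list_head) {
    assert(chunk != 0 && count >= 1 && block_size >= sizeof(node));
    if (count == 1) return chunk;          // nothing left over for the list

    node* current = reinterpret_cast<node*>(chunk + block_size);
    list_head = current;
    for (int i = 1; i < count - 1; ++i) {
        node* next = reinterpret_cast<node*>(
            reinterpret_cast<char*>(current) + block_size);
        current->next = next;              // link block i to block i + 1
        current = next;
    }
    current->next = 0;                     // the last block ends the list
    return chunk;
}

inline void demo() {
    const std::size_t block = 16;
    const int count = 4;
    char* chunk = static_cast<char*>(std::malloc(block * count));
    assert(chunk != 0);

    node* head = 0;
    void* mine = carve(chunk, block, count, head);
    assert(mine == chunk);                                   // block 0 is ours
    assert(reinterpret_cast<char*>(head) == chunk + block);  // list starts at block 1

    int linked = 0;
    for (node* p = head; p != 0; p = p->next) ++linked;
    assert(linked == count - 1);
    std::free(chunk);
}

} // namespace sketch_refill
```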

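The memory pool: chunk_alloc()

char * chunk_alloc(size_t size, int & nobjs); // nobjs is a reference and is adjusted to the number of blocks actually obtained

A request for memory falls into one of three cases:

1. The space remaining in the pool fully satisfies the request.
2. The space remaining in the pool cannot fully satisfy the request, but is enough to supply one or more blocks.
3. The space remaining in the pool cannot supply even a single block.

The first thing to do is check how much space remains:

size_t bytes_left = end_free - start_free;
size_t total_bytes = size * nobjs;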

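In the first case there is enough space, so it suffices to advance start_free, the pointer to the start of the unused part of the pool, and return the block.

In the second case, hand out as much as possible. nobjs is reduced from the default 20 to the number of blocks the pool can actually supply: nobjs = bytes_left / size.

The third case is somewhat more involved.

1. The space in [start_free, end_free) cannot supply even one block of size bytes, so whatever is left is first handed to the appropriate free list. Otherwise it would be lost, because start_free is about to be overwritten.
2. Request from the heap twice the needed amount plus 1/16 of heap_size, the total obtained from the heap so far (rounded up to a multiple of 8).
- If the heap allocation fails, search the free lists whose blocks are at least size bytes for an unused block and use it as the pool.
- If that fails as well, call the first-level allocator and see whether its out-of-memory (oom) mechanism can work a miracle.
3. Finally update heap_size and end_free, then call chunk_alloc recursively; by now the pool can supply at least one block.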
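The decision among the three cases, and the size of the request in the third one, can be sketched as pure arithmetic. plan_request and the types around it are hypothetical; the sketch only computes what chunk_alloc would do and does not touch any memory.

```cpp
#include <cassert>
#include <cstddef>

namespace sketch_chunk {

enum pool_case { kFullySatisfied, kPartiallySatisfied, kPoolExhausted };

struct plan {
    pool_case which;
    int blocks;                // blocks handed out from the pool right now
    std::size_t bytes_to_get;  // bytes to request from the heap (case 3 only)
};

inline std::size_t round_up(std::size_t bytes) {
    return (bytes + 7) & ~static_cast<std::size_t>(7);
}

inline plan plan_request(std::size_t bytes_left, std::size_t size,
                         int nobjs, std::size_t heap_size) {
    assert(size > 0 && nobjs > 0);
    plan p;
    p.blocks = nobjs;
    p.bytes_to_get = 0;
    const std::size_t total_bytes = size * nobjs;

    if (bytes_left >= total_bytes) {
        p.which = kFullySatisfied;             // case 1: all nobjs blocks
    } else if (bytes_left >= size) {
        p.which = kPartiallySatisfied;         // case 2: as many as fit
        p.blocks = static_cast<int>(bytes_left / size);
    } else {
        p.which = kPoolExhausted;              // case 3: go to the heap
        p.blocks = 0;
        p.bytes_to_get = 2 * total_bytes + round_up(heap_size >> 4);
    }
    return p;
}

inline void demo() {
    plan a = plan_request(640, 32, 20, 0);
    assert(a.which == kFullySatisfied && a.blocks == 20);

    plan b = plan_request(100, 32, 20, 0);
    assert(b.which == kPartiallySatisfied && b.blocks == 3);

    plan c = plan_request(8, 32, 20, 0);
    assert(c.which == kPoolExhausted && c.bytes_to_get == 1280);

    // After 1280 bytes were taken from the heap: 1280 / 16 = 80 extra bytes.
    plan d = plan_request(8, 32, 20, 1280);
    assert(d.bytes_to_get == 1360);
}

} // namespace sketch_chunk
```

Because the extra term grows with heap_size, each trip to the heap asks for a little more than the previous one.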

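Space release function: deallocate()

A block larger than 128 bytes is handed to the first-level allocator; otherwise the block is put back at the head of the matching free list.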
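Returning a block and handing one out are the push and pop of an intrusive singly linked list. In the sketch below free_list, push, pop and demo are hypothetical names; the node union mirrors the obj union of the second-level allocator.

```cpp
#include <cassert>
#include <cstdlib>

namespace sketch_freelist {

// While a block is free, its first bytes hold the link to the next block.
union node {
    node* next;
    char client_data[1];
};

struct free_list {
    node* head;
    free_list() : head(0) {}

    // deallocate: put the block back at the head of the list.
    void push(void* block) {
        node* q = static_cast<node*>(block);
        q->next = head;
        head = q;
    }

    // allocate: take the head block, or return null if the list is empty.
    void* pop() {
        node* result = head;
        if (result != 0) head = result->next;
        return result;
    }
};

inline void demo() {
    void* a = std::malloc(16);
    void* b = std::malloc(16);
    assert(a != 0 && b != 0);

    free_list list;
    list.push(a);
    list.push(b);

    void* first = list.pop();
    void* second = list.pop();
    void* third = list.pop();
    assert(first == b);    // last in, first out
    assert(second == a);
    assert(third == 0);    // empty list: this is where refill would step in

    std::free(a);
    std::free(b);
}

} // namespace sketch_freelist
```

The link lives inside the free block itself, so the list needs no memory of its own.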

Complete code

// Relies on malloc_alloc, the first-level allocator defined earlier.
enum {__ALIGN = 8};                        // small blocks are multiples of 8 bytes
enum {__MAX_BYTES = 128};                  // upper bound for small blocks
enum {_NFREELISTS = __MAX_BYTES/__ALIGN};  // number of free lists

// The second-level allocator follows.
// Note: there is no template type parameter, and the second parameter is never used.
// The first parameter is for multithreaded environments.
template <bool threads, int inst>
class __default_alloc_template {
private:
    // Round bytes up to a multiple of 8.
    static size_t ROUND_UP(size_t bytes) {
        return ((bytes) + __ALIGN-1) & ~(__ALIGN-1);
    }

    union obj { // node structure of the free lists
        union obj *free_list_link;
        char client_data[1];
    };

    static obj *volatile free_list[_NFREELISTS];
    // Pick the free list for a block of this size; indices run from 0 to 15.
    static size_t FREELIST_INDEX(size_t bytes) {
        return ((bytes) + __ALIGN-1) / __ALIGN - 1;
    }

    // Returns an object of size n, and may add other blocks of size n to the free list.
    static void *refill(size_t n);
    // Allocates one large chunk that can hold nobjs blocks of "size" bytes.
    // If allocating nobjs blocks is inconvenient, nobjs may be reduced.
    static char *chunk_alloc(size_t size, int &nobjs);

    // Chunk allocation state
    static char *start_free;    // start of the memory pool; changes only in chunk_alloc
    static char *end_free;      // end of the memory pool; changes only in chunk_alloc
    static size_t heap_size;

public:
    static void *allocate(size_t n);
    static void deallocate(void *p, size_t n);
    static void * reallocate(void *p, size_t old_sz, size_t new_sz);
};

template <bool threads, int inst>
char * __default_alloc_template<threads, inst>::start_free = 0;

template <bool threads, int inst>
char * __default_alloc_template<threads, inst>::end_free = 0;

template <bool threads, int inst>
size_t __default_alloc_template<threads, inst>::heap_size = 0;

template <bool threads, int inst>
typename __default_alloc_template<threads, inst>::obj *volatile
    __default_alloc_template<threads, inst>::free_list[_NFREELISTS] =
{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};

// n must be > 0
template<bool threads, int inst>
void * __default_alloc_template<threads, inst>::allocate(size_t n) {
    obj * volatile * my_free_list; // points into the free_list array, whose elements are obj*
    obj * result;

    // Larger than 128 bytes: call the first-level allocator.
    if (n > (size_t) __MAX_BYTES) {
        return malloc_alloc::allocate(n);
    }

    // Find the appropriate one among the 16 free lists.
    my_free_list = free_list + FREELIST_INDEX(n);
    result = *my_free_list;
    if (result == 0) {
        // No usable block in this free list; refill it.
        void *r = refill(ROUND_UP(n));
        return r;
    }

    // Adjust the free list: the head moves to the next block.
    *my_free_list = result -> free_list_link;
    return result;
}

template <bool threads, int inst>
void __default_alloc_template<threads, inst>::deallocate(void *p, size_t n) {
    obj *q = (obj*)p;
    obj * volatile * my_free_list;

    // Larger than 128 bytes: call the first-level allocator.
    if (n > (size_t) __MAX_BYTES) {
        malloc_alloc::deallocate(p, n);
        return ;
    }

    // Find the matching free list and put the block back at its head.
    my_free_list = free_list + FREELIST_INDEX(n);
    q -> free_list_link = *my_free_list;
    *my_free_list = q;
}

template <bool threads, int inst>
void * __default_alloc_template<threads, inst>::refill(size_t n) {
    int nobjs = 20;
    // Call chunk_alloc() and try to obtain nobjs blocks as new free list nodes.
    // Note that nobjs is passed by reference.
    char * chunk = chunk_alloc(n, nobjs);
    obj * volatile * my_free_link;
    obj * result;
    obj * current_obj, * next_obj;
    int i;

    // If only one block was obtained, it goes to the caller and the free list gets no new node.
    if (1 == nobjs) return chunk;
    // Otherwise prepare to adjust the free list and take in the new nodes.
    my_free_link = free_list + FREELIST_INDEX(n);

    // Build the free list inside the chunk.
    result = (obj *)chunk; // this block is returned to the client
    // Make the free list point to the newly allocated space (taken from the memory pool).
    *my_free_link = next_obj = (obj*) (chunk + n);
    // Link the nodes of the free list together.
    for (i=1; ; ++i) { // start at 1, because block 0 is returned to the client
        current_obj = next_obj;
        next_obj = (obj *)((char *)next_obj + n);
        if (nobjs - 1 == i) {
            current_obj -> free_list_link = 0;
            break;
        } else {
            current_obj -> free_list_link = next_obj;
        }
    }
    return result;
}


// Assumes size has already been rounded up to a multiple of 8.
// Note that nobjs is passed by reference.
template <bool threads, int inst>
char *
__default_alloc_template<threads, inst>::chunk_alloc(size_t size, int& nobjs) {
    char * result;
    size_t total_bytes = size * nobjs;
    size_t bytes_left = end_free - start_free; // space remaining in the pool

    if (bytes_left >= total_bytes) {
        // The space remaining in the pool fully satisfies the request.
        result = start_free;
        start_free += total_bytes;
        return result;
    } else if (bytes_left >= size) {
        // The remaining space cannot fully satisfy the request,
        // but is enough to supply one or more blocks.
        nobjs = bytes_left/size;
        total_bytes = size * nobjs;
        result = start_free;
        start_free += total_bytes;
        return result;
    } else {
        // The remaining space cannot supply even one block.
        size_t bytes_to_get = 2 * total_bytes + ROUND_UP(heap_size >> 4);
        // Try to make the leftover fragment in the pool useful.
        if (bytes_left > 0) {
            // The pool still holds a fragment; give it to the appropriate free list.
            // First find that free list.
            obj * volatile * my_free_list = free_list + FREELIST_INDEX(bytes_left);
            // Adjust the free list and thread the leftover space into it.
            ((obj *)start_free) -> free_list_link = *my_free_list;
            *my_free_list = (obj *)start_free;
        }

        // Allocate heap space to replenish the memory pool.
        start_free = (char *)malloc(bytes_to_get);
        if (0 == start_free) {
            // Not enough heap space: malloc failed.
            int i;
            obj * volatile * my_free_list, *p;
            // Try to make do with what we already have. That does no harm. We do not
            // try to allocate smaller blocks, because on multiprocess machines that
            // tends to lead to disaster.
            // Search for an appropriate free list, meaning one that
            // has an unused block and whose blocks are large enough.
            for (i=size; i <= __MAX_BYTES; i+=__ALIGN) {
                my_free_list = free_list + FREELIST_INDEX(i);
                p = *my_free_list;
                if (0 != p) { // this free list has an unused block
                    // Adjust the free list to release the unused block.
                    *my_free_list = p -> free_list_link;
                    start_free = (char *)p;
                    end_free = start_free + i;
                    // Call ourselves recursively to fix up nobjs.
                    return chunk_alloc(size, nobjs);
                    // Note: any leftover fragment ends up in an appropriate free list.
                }
            }
            end_free = 0; // no memory anywhere: call the first-level allocator and see whether the oom mechanism can help
            start_free = (char *)malloc_alloc::allocate(bytes_to_get);
            // This either throws an exception or the shortage gets resolved.
        }
        heap_size += bytes_to_get;
        end_free = start_free + bytes_to_get;
        // Call ourselves recursively to fix up nobjs.
        return chunk_alloc(size, nobjs);
    }
}

posted on 2017-04-09 18:19 by 郑兴鹏