开博第一篇,学习下glibc的malloc实现,先记录下源码。
一、glibc的malloc移植了ptmalloc,ptmalloc封装了dlmalloc,关系如下:
dlmalloc(通用但并发不安全)--> ptmalloc(多线程优化)-->glibc移植(但是代码改动大)。
所以从本篇先从ptmalloc入手,主要分析dlmalloc的内存分配、管理的思想,之后分析ptmalloc的优化方法,最后剖析glibc中malloc源码,分析malloc的移植实现。
ptmalloc源码网址:http://www.malloc.de/en/, 其中malloc.c即dlmalloc实现,ptmalloc.c为dlmalloc的多线程封装.目前ptmalloc最新的版本为2006年5月31号的版本ptmalloc3,但是目前用的最多的glibc还是用的ptmalloc2,因为多线程处理中,ptmalloc2效率略高于ptmalloc3。本文章基于ptmalloc3剖析。
二、ptmalloc3
1、ptmalloc源码目录结构中malloc.c即dlmalloc实现,首先我们先分析dlmalloc实现,通过阅读malloc.c文件头注释可以帮我们更快速的理解它。摘录了主要的如下:
* Vital statistics:
Supported pointer/size_t representation: 4 or 8 bytes
size_t MUST be an unsigned type of the same width as pointers. (If you are using an ancient system that declares size_t as a signed type, or need it to be a different width than pointers, you can use a previous release of this malloc (e.g. 2.7.2) supporting these.)
支持指针和size_t形式:4或8字节
size_t必须是无符号类型,且位宽和指针相同。(如果你使用有符号size_t的老系统,或者size_t和指针位宽不一致,可以使用malloc的历史版本版本例如2.7.2支持)。
Alignment: 8 bytes (default)
This suffices for nearly all current machines and C compilers. However, you can define MALLOC_ALIGNMENT to be wider than this if necessary (up to 128bytes), at the expense of using more space.
对齐:默认8字节
它几乎支持当前所有的机器和c编译器。除此之外,如果需要,你可以通过定义MALLOC_ALIGNMENT宏,更改为更大的对齐位宽(最大128字节),代价是会占用更大的内存空间。
Minimum overhead per allocated chunk: 4 or 8 bytes (if 4byte sizes)
8 or 16 bytes (if 8byte sizes)
Each malloced chunk has a hidden word of overhead holding size and status information, and additional cross-check word if FOOTERS is defined.
每个chunk块的最小前缀(overhead): 4或8字节(当size_t为4字节)
8或16字节(当size_t为8字节)
每个申请的chunk有一个隐藏的前缀字,用来报错块大小和状态信息,如果定义了FOOTERS,还会额外多占用一个交叉检查字。
Minimum allocated size: 4-byte ptrs: 16 bytes (including overhead)
8-byte ptrs: 32 bytes (including overhead)
Even a request for zero bytes (i.e., malloc(0)) returns apointer to something of the minimum allocatable size.The maximum overhead wastage (i.e., number of extra bytes allocated than were requested in malloc) is less than or equal to the minimum size, except for requests >= mmap_threshold that are serviced via mmap(), where the worst case wastage is about 32 bytes plus the remainder from a system page (the minima mmap unit); typically 4096 or 8192 bytes.
最小申请内存大小(包括前缀):32bit地址:16字节
64bit地址:32字节
即使申请0字节(malloc(0)),实际也会返回一个最小内存指针。最大的前缀浪费(在malloc中分配的超出请求的字节数)情况是申请少于或等于最小值。除此之外,当申请内存大与mmap_threshold,将会通过mmap()函数分配内存,这样最大的浪费情况是32字节(mini)加剩余系统页(最小的mmap单元,典型大小为4096字节或8192字节)。
Security: static-safe; optionally more or less
The "security" of malloc refers to the ability of malicious code to accentuate the effects of errors (for example, freeing space that is not currently malloc'ed or overwriting past the ends of chunks) in code that calls malloc. This malloc guarantees not to modify any memory locations below the base of heap, i.e., static variables, even in the presence of usage errors. The routines additionally detect most improper frees and reallocs. All this holds as long as the static bookkeeping
for malloc itself is not corrupted by some other means. This is only one aspect of security -- these checks do not, and cannot, detect all possible programming errors.
If FOOTERS is defined nonzero, then each allocated chunk carries an additional check word to verify that it was malloced from its space. These check words are the same within each execution of a program using malloc, but differ across executions, so externally crafted fake chunks cannot be freed. This improves security by rejecting frees/reallocs that could corrupt heap memory, in addition to the checks preventing writes to statics that are always on. This may further improve security at the expense of time and space overhead. (Note that FOOTERS may also be worth using with MSPACES.)
By default detected errors cause the program to abort (calling "abort()"). You can override this to instead proceed past errors by defining PROCEED_ON_ERROR. In this case, a bad free has no effect, and a malloc that encounters a bad address caused by user overwrites will ignore the bad address by dropping pointers and indices to all known memory. This may be appropriate for programs that should continue if at all possible in the face of programming errors, although they may
run out of memory because dropped memory is never reclaimed.
If you don't like either of these options, you can define CORRUPTION_ERROR_ACTION and USAGE_ERROR_ACTION to do anything else. And if if you are sure that your program using malloc has no errors or vulnerabilities, you can define INSECURE to 1, which might (or might not) provide a small performance improvement.
安全性:静态安全;可自行选择
Malloc的“安全性”是指代码中调用malloc的恶意代码造成的错误影响能力。(例如,释放的内存不是当前申请的或者写溢出了当前的chunk尾部)。这个malloc保证不会修改堆底部以下的任何内存位置,例如静态变量,即使存在使用错误。例程另外还检测大多数不正确的释放和重分配。只要malloc本身的静态标记没有被其他程序破坏,所有这些都是成立的。这只是安全性的一个方面——这些检查不会也不能检测所有可能的编程错误。
如果FOOTERS被定义为非0,每个分配的chunk块携带一个附加的检查字,以验证它是否被Malloc。这些检查词在使用malloc的程序的每次执行中都是相同的,但是在不同的执行中是不同的,因此不能释放外部构造的chunks。这可以通过拒绝可能损坏堆内存的释放/重新分配来提高安全性,此外还可以防止写入始终处于打开状态的静态数据。这可能会以时间和空间开销为代价进一步提高安全性。(注意,FOOTERS可能也值得与MSPACES一起使用。)
检测到默认错误的默认动作是调用abort(调用“abort()”)。你可以定义PROCEED_ON_ERROR来覆盖它继续处理错误。在这种情况下,错误的free没有作用,而malloc遇到由用户覆盖引起的错误地址时,会忽略错误地址,通过将指针和索引丢到所有已知内存中。这可能适用于那些在可能出现编程错误的情况下应该继续运行的程序,尽管它们可能会耗尽内存,因为丢弃的内存永远不会被回收。
如果你不喜欢这些选项,你可以定义CORRUPTION_ERROR_ACTION和USAGE_ERROR_ACTION去做另外的任何事。如果你确信你的程序使用malloc没有错误或漏洞,你可以将INSECURE置1,这样可能会(也可能不会)有一个小的性能提高。
Thread-safety: NOT thread-safe unless USE_LOCKS defined
When USE_LOCKS is defined, each public call to malloc, free, etc is surrounded with either a pthread mutex or a win32 spinlock (depending on WIN32). This is not especially fast, and can be a major bottleneck. It is designed only to provide minimal protection in concurrent environments, and to provide a basis for extensions. If you are using malloc in a concurrent program, consider instead using nedmalloc(http://www.nedprod.com/programs/portable/nedmalloc/) or ptmalloc (Seehttp://www.malloc.de), which are derived from versions of this malloc.
线程安全:不是线程安全的,除非定义USE_LOCKS
当USE_LOCKS被定义是,每个对malloc、free等的公共调用都被pthread互斥锁或win32自旋锁包住(取决于win32)。这不是特别快的,会成为一个主要的瓶颈。在并发环境中,它设计只提供最少的保护,为扩展的提供基础版本。如果你在并发环境中使用malloc,考虑使用nedmalloc或者ptmalloc,它们是从这个malloc衍生出来的版本。
System requirements: Any combination of MORECORE and/or MMAP/MUNMAP
This malloc can use unix sbrk or any emulation (invoked using the CALL_MORECORE macro) and/or mmap/munmap or any emulation (invoked using CALL_MMAP/CALL_MUNMAP) to get and release system memory. On most unix systems, it tends to work best if both MORECORE and MMAP are enabled. On Win32, it uses emulations based on VirtualAlloc. It also uses common C library functions like memset. Compliance: I believe it is compliant with the Single Unix Specification(See http://www.unix.org). Also SVID/XPG, ANSI C, and probably others as well.
* Overview of algorithms
This is not the fastest, most space-conserving, most portable, or most tunable malloc ever written. However it is among the fastest while also being among the most space-conserving, portable and tunable. Consistent balance across these factors results in a good general-purpose allocator for malloc-intensive programs.
In most ways, this malloc is a best-fit allocator. Generally, it chooses the best-fitting existing chunk for a request, with ties broken in approximately least-recently-used order. (This strategy normally maintains low fragmentation.) However, for requests less than 256bytes, it deviates from best-fit when there is not an
exactly fitting available chunk by preferring to use space adjacent to that used for the previous small request, as well as by breaking ties in approximately most-recently-used order. (These enhance locality of series of small allocations.) And for very large requests (>= 256Kb by default), it relies on system memory mapping
facilities, if supported. (This helps avoid carrying around and possibly fragmenting memory used only for large chunks.)
All operations (except malloc_stats and mallinfo) have execution times that are bounded by a constant factor of the number of bits in a size_t, not counting any clearing in calloc or copying in realloc,or actions surrounding MORECORE and MMAP that have times proportional to the number of non-contiguous regions returned by system allocation routines, which is often just 1. In real-time applications, you can optionally suppress segment traversals using NO_SEGMENT_TRAVERSAL, which assures bounded execution even when system allocators return non-contiguous spaces, at the typical expense of carrying around more memory and increased fragmentation. The implementation is not very modular and seriously overuses macros. Perhaps someday all C compilers will do as good a job inlining modular code as can now be done by brute-force expansion, but now, enough of them seem not to.Some compilers issue a lot of warnings about code that is dead/unreachable only on some platforms, and also about intentional uses of negation on unsigned types. All known cases of each can be ignored.For a longer but out of date high-level description, see http://gee.cs.oswego.edu/dl/html/malloc.html
算法简介:
这不是最快的、最节省空间的、最可移植的或最可调的malloc。然而,它是速度最快的,同时也是最节省空间、可移植和可调的。在这些因素之间保持一致的平衡可以为malloc密集型程序提供一个良好的通用分配程序。在大多数情况下,这个malloc是最适合的分配器。通常,它会为请求选择最合适的现有chunk,并以最近最少使用的顺序断开连接。(这种策略通常保持低碎片化。)但是,对于小于256bytes的请求,当没有完全合适的可用块时,它就偏离了最佳匹配,因为它倾向于使用与前一个小请求相邻的空间,并且打破了最近使用的顺序。(这增强了一系列小型内存申请的局部性。)对于非常大的请求(缺省情况下>= 256Kb),如果支持,它依赖于系统内存映射工具。(这有助于避免携带仅用于大块的内存,并可能使内存碎片化。)
//略
该实现不是很模块化,并且严重滥用了宏。也许有一天,所有的C编译器都能像现在通过蛮力扩展所能做的那样,很好地内联模块化代码。一些编译器会发出大量警告,指出代码只有在某些平台上是死的/不可到达的,还会故意在无符号类型上使用求反。每种情况的所有已知情况都可以忽略。
浙公网安备 33010602011771号