Wild Pointers and Dangling Pointers

1. What is a wild pointer?

A pointer in C that has not been initialized is known as a wild pointer.

That is, a wild pointer is a pointer that has never been initialized, so the address it holds is indeterminate. For example,

o foo1.c

int main(int argc, char *argv[])
{
    int *p;
    return (*p & 0x7f); /* XXX: p is a wild pointer */
}

Compiling with "gcc -Wall" produces the following warning:

$ gcc -Wall -g -m32 -o foo foo.c
foo.c: In function ‘main’:
foo.c:4:10: warning: ‘p’ is used uninitialized in this function [-Wuninitialized]
  return (*p & 0x7f); /* XXX: p is a wild pointer */
          ^

2. What is a dangling pointer?

If a pointer still references the original memory after it has been freed, it is called a dangling pointer.

A dangling pointer is a pointer whose target memory has already been freed while the pointer itself still holds the old address.

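If two pointers (p1 and p2) point to the same block of memory, then after free(p1) both p1 and p2 are dangling pointers. If p1 is then set to NULL, p2 is still a dangling pointer. Admittedly, dereferencing *p1 is an illegal memory access (on most systems it crashes right away), but dereferencing *p2 produces unpredictable results, which is much harder to guard against. For example: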

o foo2.c

#include <stdlib.h>
int main(int argc, char *argv[])
{
    int *p1 = (int *)malloc(sizeof (int));
    int *p2 = p1;        /* p2 and p1 are pointing to the same memory */
    free(p1);            /* p1 is       a dangling pointer, so is p2  */
    p1 = NULL;           /* p1 is not   a dangling pointer any more   */
    return (*p2 & 0x7f); /* p2 is still a dangling pointer            */
}

3. The dangers of using wild pointers and dangling pointers

Wild pointers and dangling pointers alike point to an invalid memory region (invalid here means "unsafe and uncontrolled"). Accessing such an invalid region results in "Undefined Behavior".

"Undefined Behavior" is defined as follows (source: A Guide to Undefined Behavior in C and C++, Part 1):

Anything at all can happen; the Standard imposes no requirements. The program 
may fail to compile, or it may execute incorrectly (either crashing or silently 
generating incorrect results), or it may fortuitously do exactly what the 
programmer intended.

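In other words, anything can happen: the program may fail to compile, it may run incorrectly (crashing, e.g. with a segmentation fault, or silently producing wrong results), or it may by luck produce exactly the result the programmer wanted.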

4. How to avoid using wild pointers and dangling pointers?

How to avoid wild pointers? Easy! Make it a habit to initialize every pointer when it is defined, and in any case before it is first used.

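Avoiding dangling pointers, however, is harder. Solaris introduced ADI (Application Data Integrity) to stop accesses to memory that has already been freed. For example, recent SPARC platforms support KADI: as soon as a freed region of kernel memory is accessed, the operating system panics. Here is a brief description of ADI.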

ADI (Application Data Integrity) is a software layer built on
MCD (Memory Corruption Detection), a SPARC hardware feature that provides
statistical protection against memory corruption errors such as buffer
overflows, use-after-frees, and use-after-reallocs.

KADI allows kernel memory to use ADI.

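This effectively catches the situation in the foo2.c example where p2 is still used after it has become a dangling pointer. So the question is: without a high-end technology such as ADI/KADI, how can the use of dangling pointers be avoided?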

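There is a way: if the problem cannot be avoided directly, avoid it indirectly, with what is known as a smart pointer. In essence, a smart pointer uses reference counting to postpone freeing the memory a pointer refers to.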

o For smart pointers, see Wikipedia.

Smart pointers eliminate dangling pointers by postponing destruction until 
an object is no longer in use.

o For reference counting, see Wikipedia as well.

Reference counting is a technique of storing the number of references, 
pointers, or handles to a resource such as an object, block of memory, 
disk space or other resource.

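Note: reference counting is widely used not only in garbage-collection-based memory management but also in operating system kernels. For example, an inode carries a reference count (its link count), which is what makes hard links work.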

$ stat foo | egrep Inode
Device: 23000000002h/2405181685762d     Inode: 210756182   Links: 1

$ ln foo foo9 && stat foo  | egrep Inode
Device: 23000000002h/2405181685762d     Inode: 210756182   Links: 2

$ rm -f foo   && stat foo9 | egrep Inode
Device: 23000000002h/2405181685762d     Inode: 210756182   Links: 1

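C++11 has good support for smart pointers. To make it easier to see that a smart pointer essentially postpones freeing memory, here is a simple implementation in C (using a proxy plus a refcnt (reference count)).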

o foo3.c

#include <stdio.h>
#include <stdlib.h>

typedef struct proxy_s {
    int *object;    /* the shared object */
    int refcnt;     /* number of live references to object */
} proxy_t;

/* Hand out a reference: allocate on first use, otherwise bump refcnt. */
static int *create(proxy_t **proxy)
{
    if (*proxy == NULL) {
        proxy_t *p = (proxy_t *)malloc(sizeof (proxy_t));
        if (p == NULL)
            return NULL;
        p->object = (int *)malloc(sizeof (int));
        if (p->object == NULL) {
            free(p);
            return NULL;
        }
        p->refcnt = 1;
        *proxy = p;
    } else {
        ((*proxy)->refcnt)++;
    }

    return (*proxy)->object;
}

/* Drop a reference; free the object only when the last one goes away. */
static void destroy(proxy_t **proxy)
{
    ((*proxy)->refcnt)--;

    if ((*proxy)->refcnt == 0) {
        free((*proxy)->object);
        free(*proxy);
        *proxy = NULL;    /* do not leave the proxy itself dangling */
    }
}

static void dump(proxy_t *proxy, int i)
{
    printf("%02d\tobject: %p (0x%02x) refcnt: %d\n", i,
        (void *)proxy->object, *(proxy->object), proxy->refcnt);
}

int main(int argc, char *argv[])
{
    proxy_t *proxy = NULL;
#define NEW()        create(&proxy)
#define DELETE(p)    do { destroy(&proxy); p = NULL; } while (0)
#define DUMP(i)      dump(proxy, i)
    int *p1 = NEW();
    int *p2 = NEW();
    if (p1 == NULL || p2 == NULL)
        return (1);
    *p1  = 0xab; DUMP(1);
    DELETE(p1);  DUMP(2);
    *p2 += 0x21; DUMP(3);
    DELETE(p2);  /* last reference: object and proxy are freed, so no DUMP here */
    return (0);
}

o Compile and run

$ gcc -Wall -m32 -g -o foo3 foo3.c
$ ./foo3
01    object: 0x8d0d018 (0xab) refcnt: 2
02    object: 0x8d0d018 (0xab) refcnt: 1
03    object: 0x8d0d018 (0xcc) refcnt: 1

After DELETE(p1) the object is still alive (refcnt is 1), so *p2 remains valid; the memory is released only by the final DELETE(p2). Dumping the proxy after that point would itself be a use of freed memory.

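In one sentence: once you understand how wild pointers and dangling pointers arise and the harm they cause, you can program with a clear target in mind and write safe, controllable, high-quality code.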

Appendix: on wild pointers and dangling pointers, here is an explanation quoted from Yahoo Answers that may help with understanding.

A dangling pointer is a pointer that used to point to a valid address but now 
no longer does. This is usually due to that memory location being freed up and 
no longer available. There is nothing wrong with having a dangling pointer 
unless you try to access the memory location pointed at by that pointer. 
It is always best practice not to have or leave dangling pointers.

A wild pointer is a pointer that has not been correctly initialized and 
therefore points to some random piece of memory. 
It is a serious error to have wild pointers.
posted @ 2017-02-28 10:39  veli