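Segmentation Faults

  A while ago, during an interview at Sangfor (深信服), the interviewer asked how to debug a segmentation fault, and I honestly did not know how to answer. I run into segmentation faults now and then, but I have always just gone back and fixed the code after the program reported one, without ever digging deeper.

What is a segmentation fault?

  Following Wikipedia, a fairly complete definition of a segmentation fault is: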

In computing, a segmentation fault (often shortened to segfault) or access violation is a fault raised by hardware with memory protection, notifying an operating system (OS) about a memory access violation; on x86 computers this is a form of general protection fault. In short, a segmentation fault occurs when a program attempts to access a memory location that it is not allowed to access, or attempts to access a memory location in a way that is not allowed (e.g., attempts to write to a read-only location, or to overwrite part of the operating system). Systems based on processors like the Motorola 68000 tend to refer to these events as Address or Bus errors.

On Unix-like operating systems, a process that accesses invalid memory receives the SIGSEGV signal. On Microsoft Windows, a process that accesses invalid memory receives the STATUS_ACCESS_VIOLATION exception.

  Wikipedia also summarizes some typical causes of segmentation faults:

The following are some typical causes of a segmentation fault:
  1. Dereferencing null pointers – this is special-cased by memory management hardware
  2. Attempting to access a nonexistent memory address (outside process's address space)
  3. Attempting to access memory the program does not have rights to (such as kernel structures in process context)
  4. Attempting to write read-only memory (such as code segment)

These in turn are often caused by programming errors that result in invalid memory access:
  1. Dereferencing or assigning to an uninitialized pointer (wild pointer, which points to a random memory address)
  2. Dereferencing or assigning to a freed pointer (dangling pointer, which points to memory that has been freed/deallocated/deleted)
  3. A buffer overflow
  4. A stack overflow
  5. Attempting to execute a program that does not compile correctly. (Some compilers will output an executable file despite the presence of compile-time errors.)

How do you debug a segmentation fault?

  This part is based mainly on the blog post "你的java/c/c++程序崩溃了?揭秘段错误(Segmentation fault)(3)".

The faulty code

  The example code is as follows:

 1 // stack.c
 2 #include <stdio.h>
 3 #include <string.h>
 4 #include <stdlib.h>
 5 
 6 
 7 int main(int argc, char **argv) {
 8     char *p = NULL;   // p holds address 0x0
 9     *p = 0x0;         // write through the null pointer: this line faults
10 }

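  Running the program gives the following result:

  [Figure: terminal output of running the program]

Finding the problem

Step 1: inspect the signal with strace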

strace -i -x -o segfault.txt ./segfault.o

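  This produces the following information:
  [Figure: strace output]

  From this we learn:

1. Signal: SIGSEGV
2. Error code: SEGV_MAPERR
3. Faulting memory address: 0x0
4. The fault occurred at logical address 0x400507 (the instruction pointer that strace -i prints).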

  We can guess:

The program accesses a null pointer, trying to write to 0x0, which triggers the segmentation fault.

  For strace usage, see the blog post "Linux strace 命令".

Step 2: inspect the fault site with dmesg

dmesg

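  This gives:
  [Figure: dmesg output]

  From this we learn:

1. Error type: segfault, i.e. a segmentation fault.
2. ip at the time of the fault: 0x400507
3. Error number: 6, i.e. binary 110

Step 3: collect what we know

  The error number and the ip are the key items here. The error number can be decoded with the following definitions: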

    /*
     * Page fault error code bits:
     *
     *   bit 0 ==    0: no page found   1: protection fault
     *   bit 1 ==    0: read access     1: write access
     *   bit 2 ==    0: kernel-mode access  1: user-mode access
     *   bit 3 ==               1: use of reserved bit detected
     *   bit 4 ==               1: fault was an instruction fetch
     */
    /*enum x86_pf_error_code {

        PF_PROT     =       1 << 0,
        PF_WRITE    =       1 << 1,
        PF_USER     =       1 << 2,
        PF_RSVD     =       1 << 3,
        PF_INSTR    =       1 << 4,
    };*/

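  Decoding against these definitions:

Error number 6 = binary 110 = (PF_USER | PF_WRITE | 0).
That is: a user-mode access, a write page fault, and no page corresponding to the given address.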

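  This matches our initial guess.

  To summarize what we know so far:

1. Error type: segfault, i.e. a segmentation fault.

2. ip at the time of the fault: 0x400507

3. Error number: 6, i.e. binary 110

4. Error code: SEGV_MAPERR, i.e. the address is not mapped to an object.

5. Cause: a write to 0x0 triggered the segmentation fault, because 0x0 has no corresponding page (no mapping).

Step 4: locate the faulty code from the conclusions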

gdb ./segfault.o

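  Using ip = 0x400507 from the conclusions, we immediately get:

  [Figure: gdb output]

  Clearly, this confirms our conclusion:

We tried to write the value 0x0 to address 0x0, which caused a segmentation fault for writing to an unmapped address.

  [Figure: gdb output]

  And we have found the faulty code: line 9 of stack.c.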

Debugging a core dump

  Besides the methods above, we can also pinpoint the faulty code by debugging a core dump.

  For details on core dumps, see the blog post "Linux Core Dump".

References

  你的java/c/c++程序崩溃了?揭秘段错误(Segmentation fault)(1)

  你的java/c/c++程序崩溃了?揭秘段错误(Segmentation fault)(2)

  你的java/c/c++程序崩溃了?揭秘段错误(Segmentation fault)(3)

  Linux环境下段错误的产生原因及调试方法小结

 

posted @ 2015-10-05 11:15  峰子_仰望阳光