Understanding /proc/{pid}/maps and virtual memory

Posted on 2019-12-19 15:23 by bw_0927

/proc/self/maps

A very commonly used system file.

Each line has six columns, for example:

76093000-76096000 r-xp 00000000 b3:19 941 /system/lib/libmemalloc.so

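Virtual memory area (VMA) address range: 76093000-76096000
- In Linux, each segment of a process's virtual address space is called a virtual memory area (VMA).
- A VMA corresponds to a segment in the ELF file.
- ELF files have two related concepts: sections and segments.

From the linking point of view, an ELF file is organized into sections, and that is indeed how it is stored. From the loading point of view, the same file is divided into segments, which avoids the internal fragmentation that loading section by section would cause.

A segment is essentially several sections with the same attributes (read, write, execute) merged together. The program headers describe the segments; the section table describes the sections.

VMA permissions: r-xp

r = read, w = write, x = execute, s = shared, p = private

Offset: 00000000

The offset of the VMA's segment within the mapped file.

Major and minor device numbers: b3:19
Inode number of the mapped file: 941
Path of the mapped file: /system/lib/libmemalloc.so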
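
As a rough illustration of the six columns described above, the sample line can be split into named fields. This is only a sketch: the function name parse_maps_line and the dictionary keys are made up for this example, and it assumes the line follows the layout shown above (hexadecimal addresses, offset and device numbers, decimal inode).

```python
def parse_maps_line(line):
    """Split one /proc/<pid>/maps line into its six columns (hypothetical helper)."""
    # The pathname column may be missing (anonymous mapping) and may contain
    # spaces, so split at most 5 times and treat the remainder as the path.
    parts = line.split(None, 5)
    addr_range, perms, offset, dev, inode = parts[:5]
    path = parts[5].strip() if len(parts) > 5 else ""
    start, end = (int(x, 16) for x in addr_range.split("-"))
    major, minor = dev.split(":")
    return {
        "start": start,
        "end": end,
        "perms": perms,
        "offset": int(offset, 16),
        "dev": (int(major, 16), int(minor, 16)),
        "inode": int(inode),
        "path": path,
    }


sample = "76093000-76096000 r-xp 00000000 b3:19 941 /system/lib/libmemalloc.so"
info = parse_maps_line(sample)
print(hex(info["start"]), hex(info["end"]), info["perms"], info["path"])
```

Splitting at most five times keeps a path that contains spaces in one piece, which a plain split on every space would break.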

https://blog.csdn.net/lijzheng/article/details/23618365

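In the kernel, each region of a process's address space is represented by a vm_area_struct structure, and all of a process's regions are kept in the task->mm->mmap linked list (kernels since Linux 6.1 keep them in a maple tree instead).

Each field of vm_area_struct maps to the maps file as shown in the table below: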

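| vm_area_struct field (per process, in the kernel) | Item in /proc/pid/maps | Meaning |
|---|---|---|
| vm_start | First column, the value before the "-", e.g. 00377000 | Start address of this virtual address range |
| vm_end | First column, the value after the "-", e.g. 00390000 | End address of this virtual address range |
| vm_flags | Second column, e.g. r-xp | Attributes of this address range, one character per attribute: r = readable, w = writable, x = executable. p and s share one position and are mutually exclusive: p = private, s = shared. A missing permission is shown as '-' |
| vm_pgoff | Third column, e.g. 00000000 | For a file-backed mapping, the offset within the file at which this region starts. vm_pgoff counts in pages; the maps file prints the value as a byte offset. For an anonymous mapping it is 0 or vm_start/PAGE_SIZE |
| vm_file->f_dentry->d_inode->i_sb->s_dev | Fourth column, e.g. fd:00 | Device number of the mapped file. An anonymous mapping has no file on disk and therefore no device, so this is always 00:00. For a file-backed mapping it is the device that holds the file |
| vm_file->f_dentry->d_inode->i_ino | Fifth column, e.g. 9176473 | Inode number of the mapped file. An anonymous mapping has no file on disk and therefore no inode, so this is always 0. For a file-backed mapping it is the inode of the file |
| (no field listed) | Sixth column, e.g. /lib/ld-2.5.so | For a file-backed mapping, the name of the mapped file. For an anonymous mapping, the role the region plays in the process: [stack] means it is used as the stack, [heap] means the heap. In other cases nothing is shown |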
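
The permission column can be decoded mechanically, one character per attribute. The snippet below is a minimal sketch of that rule; decode_perms is a hypothetical helper name, and it assumes the four-character form (such as r-xp) used in the examples in this article.

```python
def decode_perms(perms):
    """Turn a four-character permission string such as 'r-xp' into booleans."""
    if len(perms) != 4:
        raise ValueError("expected 4 characters, got {!r}".format(perms))
    return {
        "read": perms[0] == "r",
        "write": perms[1] == "w",
        "execute": perms[2] == "x",
        # The last character is either 's' (shared) or 'p' (private).
        "shared": perms[3] == "s",
    }


for p in ("r-xp", "rw-p", "---p"):
    print(p, decode_perms(p))
```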

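Let's look at an example of a proc maps file.

r-xp: readable and executable but not writable, which indicates the application's code segment.

rw-p: readable and writable but not executable, which indicates the data segment of the program (pthread2 in the examples below).

The heap segment, [heap]: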

08c64000-08c85000 rw-p 08c64000 00:00 0 [heap]

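Some maps files have no such entry. This mainly depends on whether the program uses malloc: if the main thread calls malloc, the entry appears; otherwise it does not. Calling malloc in a child thread creates additional heap mappings, but they are not labelled [heap].

The stack segment, [stack]:

The stack is the process's temporary data area. The kernel maps anonymous memory into the virtual address space for it, and it grows from high addresses toward low addresses.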

bfd50000-bfd65000 rw-p bffea000 00:00 0 [stack]

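A single-threaded application has only one [stack] segment. In a multithreaded application, [stack] is the main thread's stack; the stacks of child threads are allocated automatically by the pthread library.

Example 1: print the address of a local variable in a single-threaded application. The output is: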

./pthread2

tid addr 0xbfc73600

The corresponding maps file:

08048000-08049000 r-xp 00000000 fd:00 3145811 /home/lijz/code/pthread2

08049000-0804a000 rw-p 00000000 fd:00 3145811 /home/lijz/code/pthread2

b7f7e000-b7f80000 rw-p b7f7e000 00:00 0

b7f8a000-b7f8b000 rw-p b7f8a000 00:00 0

bfc5f000-bfc74000 rw-p bffea000 00:00 0 [stack]

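The local variable's address, 0xbfc73600, falls inside the [stack] range.

Example 2: print the address of a local variable in an application that has one child thread. The output is:

tid addr 0xbfd64740 --------- address of a local variable printed in the main thread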

child thread run

stackaddr 0xb7fc93c4 -------- address of a local variable printed in the child thread

guardsize 4096 --------- size of the stack guard page

The corresponding maps file:

08048000-08049000 r-xp 00000000 fd:00 3145811 /home/lijz/code/pthread2

08049000-0804a000 rw-p 00000000 fd:00 3145811 /home/lijz/code/pthread2

08c64000-08c85000 rw-p 08c64000 00:00 0 [heap]

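b75c9000-b75ca000 ---p b75c9000 00:00 0 --------- default stack overflow guard area set up by pthread_create

b75ca000-b7fcc000 rw-p b75ca000 00:00 0 ------------ stack of the child thread created by pthread_create

b7fd6000-b7fd7000 rw-p b7fd6000 00:00 0 ------------------ 4 KB, probably also an anonymous mapping created with mmap

bfd50000-bfd65000 rw-p bffea000 00:00 0 [stack] --------- stack of the main thread

As the output shows, the main thread's local variable at 0xbfd64740 falls inside the [stack] range,

while the child thread's local variable at 0xb7fc93c4 falls inside the range b75ca000-b7fcc000 rw-p b75ca000. Local variables are also allocated starting from the high end of that range, which indicates that this VMA is the child thread's stack.

In addition, pthread sets up a 4 KB guard page for the stack by default, here the range b75c9000-b75ca000 ---p b75c9000. This range cannot be read, written, or executed, so an access that runs off the end of the stack faults instead of silently corrupting neighbouring memory.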
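
The lookup done by eye in Example 2 (which VMA contains a given address) can be sketched in a few lines. The listing is copied from the maps file of Example 2 without the annotations; find_region is a made-up helper name, and the sketch assumes each line starts with a start-end range in hexadecimal.

```python
EXAMPLE2_MAPS = """\
08048000-08049000 r-xp 00000000 fd:00 3145811 /home/lijz/code/pthread2
08049000-0804a000 rw-p 00000000 fd:00 3145811 /home/lijz/code/pthread2
08c64000-08c85000 rw-p 08c64000 00:00 0 [heap]
b75c9000-b75ca000 ---p b75c9000 00:00 0
b75ca000-b7fcc000 rw-p b75ca000 00:00 0
b7fd6000-b7fd7000 rw-p b7fd6000 00:00 0
bfd50000-bfd65000 rw-p bffea000 00:00 0 [stack]
"""


def find_region(maps_text, address):
    """Return the maps line whose range contains address, or None."""
    for line in maps_text.splitlines():
        if not line.strip():
            continue
        addr_range = line.split()[0]
        start, end = (int(x, 16) for x in addr_range.split("-"))
        # The end address is exclusive: the region covers [start, end).
        if start <= address < end:
            return line
    return None


print(find_region(EXAMPLE2_MAPS, 0xbfd64740))  # main thread local variable
print(find_region(EXAMPLE2_MAPS, 0xb7fc93c4))  # child thread local variable
```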

Example 3: starting from Example 2, create one more thread. The pthread2 program now prints:

./pthread2

tid addr 0xbfc81610 ---------- address of a local variable in the main thread

child thread run

stackaddr = 0xb7f183c0 ------- address of a local variable in child thread 1

guardsize 4096

child thread2 run

stackaddr = 0xb75173c4 ---------- address of a local variable in child thread 2

guardsize 4096

The corresponding maps file:

08048000-08049000 r-xp 00000000 fd:00 3145811 /home/lijz/code/pthread2

08049000-0804a000 rw-p 00000000 fd:00 3145811 /home/lijz/code/pthread2

092d6000-092f7000 rw-p 092d6000 00:00 0 [heap]

76b16000-b6b17000 rw-p 76b16000 00:00 0 ---------- malloc mmap region

b6b17000-b6b18000 ---p b6b17000 00:00 0

b6b18000-b7518000 rw-p b6b18000 00:00 0 --------- pthread thread2 stack space

b7518000-b7519000 ---p b7518000 00:00 0

b7519000-b7f1b000 rw-p b7519000 00:00 0 ---------- pthread thread1 stack space

b7f25000-b7f26000 rw-p b7f25000 00:00 0

bfc6e000-bfc83000 rw-p bffea000 00:00 0 [stack]---main thread stack space

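Judging from the maps file, each additional child thread adds two entries: one for the thread's stack and one for its guard page.

By default pthread reserves stack space for each child thread. In these listings each child stack region spans about 10 MB (for example b6b18000-b7518000); with glibc the default is taken from the stack size resource limit in effect when the program starts. The guard page is 4 KB, which follows from the page size.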
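
The sizes of the regions in Example 3 can be checked by subtracting each start address from its end address. The sketch below does this for rows copied from the listing above; region_size is a hypothetical helper name.

```python
def region_size(addr_range):
    """Size in bytes of a 'start-end' range taken from a maps line."""
    start, end = (int(x, 16) for x in addr_range.split("-"))
    return end - start


for label, addr_range in [
    ("thread2 guard page", "b6b17000-b6b18000"),
    ("thread2 stack", "b6b18000-b7518000"),
    ("thread1 guard page", "b7518000-b7519000"),
    ("thread1 stack", "b7519000-b7f1b000"),
]:
    size = region_size(addr_range)
    print("{:<20} {:>10} bytes = {:.3f} MiB".format(label, size, size / 2**20))
```

For these listings the two guard regions come out at 4096 bytes each and the two stack regions at roughly 10 MiB each.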

In short, the proc maps file shows a process's memory mappings together with the permissions and other attributes of each region.

http://blog.coderhuo.tech/2017/10/12/Virtual_Memory_C_strings_proc/

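3. Virtual memory

In computing, virtual memory is a memory management technique implemented with a combination of hardware and software. It maps the memory addresses used by a program (virtual addresses) onto the computer's physical memory (physical addresses), so that each program sees an address space that is contiguous (or a collection of contiguous ranges).

The operating system manages the virtual address spaces and their mapping to physical memory. Address translation hardware in the CPU, usually called the memory management unit (MMU), automatically translates virtual addresses into physical addresses. The operating system can offer more virtual memory than there is physical memory; that part is implemented by software in the operating system.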
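
To make the translation step a little more concrete, here is a toy model of it. It is not how a real MMU or kernel is implemented: the 4 KiB page size is an assumption, the page table is a plain dictionary, and its entries are invented for illustration.

```python
PAGE_SIZE = 4096  # assumed page size (4 KiB)

# Toy page table: virtual page number -> physical frame number.
# The entries are invented for illustration only.
page_table = {0x10ff: 0x2a3, 0x1100: 0x07b}


def translate(vaddr):
    """Translate a virtual address using the toy page table above."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    if vpn not in page_table:
        # A real MMU would raise a page fault for the kernel to handle.
        raise LookupError("page fault at {:#x}".format(vaddr))
    # The offset inside the page is the same in both address spaces.
    return page_table[vpn] * PAGE_SIZE + offset


print(hex(translate(0x10ff010)))
```

The point of the sketch is only that the address is split into a page number, which is looked up, and an offset within the page, which is carried over unchanged.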

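The main benefits of virtual memory include:

- Applications are freed from managing memory themselves and only need to care about their own logic
- The virtual memory of different applications is isolated, which improves security
- Combined with paging, an application can in theory use more memory than is physically available

For more on virtual memory, see the Wikipedia article "Virtual memory".

In Part 3 of this series, "Drawing the virtual memory diagram step by step", we will explore virtual memory in more detail, looking at what it contains and where each of those things lives.

Before reading on, you need to know the following:

- Each process has its own independent virtual memory
- The size of the virtual memory depends on the system architecture
- Different operating systems handle virtual memory differently; for most modern operating systems, the layout is as follows:

In the high addresses of virtual memory we find (only some of the contents are listed here):

- Command-line arguments and environment variables
- The stack, which grows "downward". At first glance this seems counterintuitive, but it is how the stack is implemented in virtual memory.

In the low addresses of virtual memory we find:

- The executable (in reality it is far more complicated than this, but this is enough for the rest of the article)
- The heap, which grows "upward"

The heap is the part of virtual memory that holds dynamically allocated memory (for example, memory allocated with malloc).

Always keep in mind that virtual memory and physical memory are different things.

How large is a process's virtual memory space?

The size of a process's virtual address space depends on the system architecture. This example was run on a 64-bit machine, so in theory each process has 2^64 bytes of virtual memory, with the highest address at 0xffffffffffffffff (about 1.8446744e+19) and the lowest at 0x0.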
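
The figures for a 64-bit address space can be reproduced with a couple of lines of arithmetic. This is just the calculation behind the numbers above; the variable names are arbitrary.

```python
ADDRESS_BITS = 64

size_bytes = 2 ** ADDRESS_BITS       # number of distinct byte addresses
highest_address = size_bytes - 1     # addresses start at 0

print("size:", size_bytes, "bytes =", size_bytes // 2**60, "EiB")
print("highest address:", hex(highest_address))
```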

/proc/[pid]/mem
    This file can be used to access the pages of a process's memory
    through open(2), read(2), and lseek(2).

/proc/[pid]/maps
    A file containing the currently mapped memory regions and their access permissions.
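
A small sketch of what "access through open, read and lseek" looks like in practice: the process below places a known string in its own memory and reads it back through /proc/self/mem. read_at is a made-up helper name; the demo assumes a Linux system where a process may read its own mem file, and it simply reports the error otherwise.

```python
import ctypes


def read_at(mem_path, address, length):
    """Read length bytes starting at address from a mem-like file."""
    # buffering=0 avoids read-ahead past the end of a mapped region.
    with open(mem_path, "rb", buffering=0) as mem:
        mem.seek(address)
        return mem.read(length)


# Put a known string in this process's memory, then read it back
# through /proc/self/mem using its virtual address as the file offset.
buf = ctypes.create_string_buffer(b"Holberton")
try:
    print(read_at("/proc/self/mem", ctypes.addressof(buf), 9))
except OSError as err:
    print("could not read /proc/self/mem:", err)
```

The file offset passed to seek is the virtual address itself, which is why the address ranges from the maps file can be used directly.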

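5. Replacing a string in a running process

Next we will search the heap of a process for a particular string and replace it with another string (no longer than the original). We now have all the theory we need.

The program below is the one we are going to hack. Left alone, it prints the string Holberton in a loop.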

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/**              
 * main - uses strdup to create a new string, loops forever-ever
 *                
 * Return: EXIT_FAILURE if malloc failed. Other never returns
 */
int main(void)
{
     char *s;
     unsigned long int i;

     s = strdup("Holberton");
     if (s == NULL)
     {
          fprintf(stderr, "Can't allocate mem with malloc\n");
          return (EXIT_FAILURE);
     }
     i = 0;
     while (s)
     {
          printf("[%lu] %s (%p)\n", i, s, (void *)s);
          sleep(1);
          i++;
     }
     return (EXIT_SUCCESS);
}

Compile and run the program; it prints the string Holberton in a loop until the process is killed.

julien@holberton:~/holberton/w/hackthevm0$ gcc -Wall -Wextra -pedantic -Werror loop.c -o loop
julien@holberton:~/holberton/w/hackthevm0$ ./loop 
[0] Holberton (0xfbd010)
[1] Holberton (0xfbd010)
[2] Holberton (0xfbd010)
[3] Holberton (0xfbd010)
[4] Holberton (0xfbd010)
[5] Holberton (0xfbd010)
[6] Holberton (0xfbd010)
[7] Holberton (0xfbd010)
...

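If you like, pause here and try writing a script or program that finds the string in the process's heap.

/proc/pid/maps

As we saw earlier, /proc/pid/maps is a text file that we can read directly. Its contents look like this: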

julien@ubuntu:/proc/4618$ cat maps
00400000-00401000 r-xp 00000000 08:01 1070052                            /home/julien/holberton/w/funwthevm/loop
00600000-00601000 r--p 00000000 08:01 1070052                            /home/julien/holberton/w/funwthevm/loop
00601000-00602000 rw-p 00001000 08:01 1070052                            /home/julien/holberton/w/funwthevm/loop
010ff000-01120000 rw-p 00000000 00:00 0                                  [heap]
7f144c052000-7f144c20c000 r-xp 00000000 08:01 136253                     /lib/x86_64-linux-gnu/libc-2.19.so
7f144c20c000-7f144c40c000 ---p 001ba000 08:01 136253                     /lib/x86_64-linux-gnu/libc-2.19.so
7f144c40c000-7f144c410000 r--p 001ba000 08:01 136253                     /lib/x86_64-linux-gnu/libc-2.19.so
7f144c410000-7f144c412000 rw-p 001be000 08:01 136253                     /lib/x86_64-linux-gnu/libc-2.19.so
7f144c412000-7f144c417000 rw-p 00000000 00:00 0 
7f144c417000-7f144c43a000 r-xp 00000000 08:01 136229                     /lib/x86_64-linux-gnu/ld-2.19.so
7f144c61e000-7f144c621000 rw-p 00000000 00:00 0 
7f144c636000-7f144c639000 rw-p 00000000 00:00 0 
7f144c639000-7f144c63a000 r--p 00022000 08:01 136229                     /lib/x86_64-linux-gnu/ld-2.19.so
7f144c63a000-7f144c63b000 rw-p 00023000 08:01 136229                     /lib/x86_64-linux-gnu/ld-2.19.so
7f144c63b000-7f144c63c000 rw-p 00000000 00:00 0 
7ffc94272000-7ffc94293000 rw-p 00000000 00:00 0                          [stack]
7ffc9435e000-7ffc94360000 r--p 00000000 00:00 0                          [vvar]
7ffc94360000-7ffc94362000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]

Recalling the earlier discussion, we can see that the stack ([stack]) sits at high addresses and the heap ([heap]) at low addresses.
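
The same observation can be checked programmatically. The sketch below uses two lines copied from the listing above; named_region is a hypothetical helper that returns the address range of the first mapping with a given label in the last column.

```python
MAPS_EXCERPT = """\
010ff000-01120000 rw-p 00000000 00:00 0                                  [heap]
7ffc94272000-7ffc94293000 rw-p 00000000 00:00 0                          [stack]
"""


def named_region(maps_text, name):
    """Return (start, end) of the first mapping whose last column is name."""
    for line in maps_text.splitlines():
        fields = line.split()
        if fields and fields[-1] == name:
            start, end = (int(x, 16) for x in fields[0].split("-"))
            return start, end
    return None


heap = named_region(MAPS_EXCERPT, "[heap]")
stack = named_region(MAPS_EXCERPT, "[stack]")
print("heap ends at", hex(heap[1]), "stack starts at", hex(stack[0]))
print("stack above heap:", stack[0] > heap[1])
```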

[heap]

The maps file gives us everything we need to search for the string:

010ff000-01120000 rw-p 00000000 00:00 0                                  [heap]

The heap of this process:

- Starts at 0x010ff000 in virtual memory
- Ends at 0x01120000
- Is readable and writable (rw)

Now look back at the output of the running loop program:

...
[1024] Holberton (0x10ff010)
...

0x010ff000 < 0x10ff010 < 0x01120000, which confirms that our string is on the heap. More precisely, the string sits at offset 0x10 from the start of the heap.
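
The comparison and the offset are plain integer arithmetic on the hexadecimal values. A minimal check, with the three values copied from the text above:

```python
heap_start = 0x010ff000
heap_end = 0x01120000
string_addr = 0x10ff010  # address printed by the loop program

# The end address of a maps range is exclusive.
inside_heap = heap_start <= string_addr < heap_end
offset = string_addr - heap_start

print("inside heap:", inside_heap)
print("offset from heap start:", hex(offset))
```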

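If we open /proc/4618/mem and seek to 0x10ff010, we can overwrite the string Holberton in the running loop program.

Next we will write a program or script to do exactly that.
You can also pause here and try writing it yourself in whatever language you know best.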

Replacing the string in virtual memory

Below is the Python 3 script (read_write_heap.py) that we use to replace the string:

#!/usr/bin/env python3
'''             
Locates and replaces the first occurrence of a string in the heap
of a process    

Usage: ./read_write_heap.py PID search_string replace_by_string
Where:           
- PID is the pid of the target process
- search_string is the ASCII string you are looking to overwrite
- replace_by_string is the ASCII string you want to replace
  search_string with
'''

import sys

def print_usage_and_exit():
    print('Usage: {} pid search write'.format(sys.argv[0]))
    sys.exit(1)

# check usage  
if len(sys.argv) != 4:
    print_usage_and_exit()

# get the pid from args
pid = int(sys.argv[1])
if pid <= 0:
    print_usage_and_exit()
search_string = str(sys.argv[2])
if search_string  == "":
    print_usage_and_exit()
write_string = str(sys.argv[3])
if write_string == "":
    print_usage_and_exit()

# open the maps and mem files of the process
maps_filename = "/proc/{}/maps".format(pid)
print("[*] maps: {}".format(maps_filename))
mem_filename = "/proc/{}/mem".format(pid)
print("[*] mem: {}".format(mem_filename))

# try opening the maps file
try:
    maps_file = open('/proc/{}/maps'.format(pid), 'r')
except IOError as e:
    print("[ERROR] Can not open file {}:".format(maps_filename))
    print("        I/O error({}): {}".format(e.errno, e.strerror))
    sys.exit(1)

for line in maps_file:
    sline = line.split(' ')
    # check if we found the heap
    if sline[-1][:-1] != "[heap]":
        continue
    print("[*] Found [heap]:")

    # parse line
    addr = sline[0]
    perm = sline[1]
    offset = sline[2]
    device = sline[3]
    inode = sline[4]
    pathname = sline[-1][:-1]
    print("\tpathname = {}".format(pathname))
    print("\taddresses = {}".format(addr))
    print("\tpermisions = {}".format(perm))
    print("\toffset = {}".format(offset))
    print("\tinode = {}".format(inode))

    # check if there is read and write permission
    if perm[0] != 'r' or perm[1] != 'w':
        print("[*] {} does not have read/write permission".format(pathname))
        maps_file.close()
        sys.exit(0)

    # get start and end of the heap in the virtual memory
    addr = addr.split("-")
    if len(addr) != 2: # never trust anyone, not even your OS :)
        print("[*] Wrong addr format")
        maps_file.close()
        sys.exit(1)
    addr_start = int(addr[0], 16)
    addr_end = int(addr[1], 16)
    print("\tAddr start [{:x}] | end [{:x}]".format(addr_start, addr_end))

    # open and read mem
    try:
        mem_file = open(mem_filename, 'rb+')
    except IOError as e:
        print("[ERROR] Can not open file {}:".format(mem_filename))
        print("        I/O error({}): {}".format(e.errno, e.strerror))
        maps_file.close()
        sys.exit(1)

    # read heap  
    mem_file.seek(addr_start)
    heap = mem_file.read(addr_end - addr_start)

    # find string
    try:
        i = heap.index(bytes(search_string, "ASCII"))
    except Exception:
        print("Can't find '{}'".format(search_string))
        maps_file.close()
        mem_file.close()
        sys.exit(0)
    print("[*] Found '{}' at {:x}".format(search_string, i))

    # write the new string
    print("[*] Writing '{}' at {:x}".format(write_string, addr_start + i))
    mem_file.seek(addr_start + i)
    mem_file.write(bytes(write_string, "ASCII"))

    # close files
    maps_file.close()
    mem_file.close()

    # there is only one heap in our example
    break

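Note: the script above has to be run with root privileges; otherwise it cannot read or write /proc/pid/mem, even if you own the process. That is the behaviour on the system used here; the exact rule depends on the kernel's ptrace restrictions.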
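
The script writes the replacement bytes as they are, so the "no longer than the original" condition is left to the user, and a shorter replacement leaves the tail of the old string in place. One possible way to handle both cases is sketched below. build_patch is a hypothetical helper, not part of the original script, and padding with NUL bytes assumes the target is a C string as in the loop program.

```python
def build_patch(search_string, write_string):
    """Return the bytes to write over search_string (hypothetical helper)."""
    old = search_string.encode("ascii")
    new = write_string.encode("ascii")
    if len(new) > len(old):
        # Writing more bytes than the original would overwrite whatever
        # follows the string on the heap.
        raise ValueError("replacement is longer than the original string")
    # Pad with NUL bytes so a shorter replacement ends where it should
    # instead of leaving the tail of the old string behind.
    return new + b"\x00" * (len(old) - len(new))


print(build_patch("Holberton", "Fun w vm!"))
print(build_patch("Holberton", "Hi"))
```

With a helper like this, the value passed to mem_file.write would be the padded bytes rather than the raw replacement string.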

Run the script:

julien@holberton:~/holberton/w/hackthevm0$ sudo ./read_write_heap.py 4618 Holberton "Fun w vm!"
[*] maps: /proc/4618/maps
[*] mem: /proc/4618/mem
[*] Found [heap]:
    pathname = [heap]
    addresses = 010ff000-01120000
    permisions = rw-p
    offset = 00000000
    inode = 0
    Addr start [10ff000] | end [1120000]
[*] Found 'Holberton' at 10
[*] Writing 'Fun w vm!' at 10ff010
julien@holberton:~/holberton/w/hackthevm0$ 

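The addresses printed by the script match what we found by hand:

- The heap of the process spans 0x010ff000 to 0x01120000 in virtual memory
- The string we were looking for is at 0x10ff010, which is 0x10 past the start of the heap

Back in the loop program, the output should now show the string "Fun w vm!":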

...
[2676] Holberton (0x10ff010)
[2677] Holberton (0x10ff010)
[2678] Holberton (0x10ff010)
[2679] Holberton (0x10ff010)
[2680] Holberton (0x10ff010)
[2681] Holberton (0x10ff010)
[2682] Fun w vm! (0x10ff010)
[2683] Fun w vm! (0x10ff010)
[2684] Fun w vm! (0x10ff010)
[2685] Fun w vm! (0x10ff010)

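6. Coming up next

In the next article we will do something similar, except that we will access and modify the memory of a Python 3 script. That is harder to do, so we will need to understand some of Python 3's internals. If you don't believe it, try it: the script above, read_write_heap.py, cannot modify an ASCII string in a Python 3 process.

7. Further reading

Part 1: Hack The Virtual Memory: C strings & /proc
Part 2: Hack The Virtual Memory: Python bytes
Part 3: Hack The Virtual Memory: Drawing the virtual memory diagram step by step
Part 4: Hack The Virtual Memory: malloc, heap & the program break
Part 5: Hack The Virtual Memory: The Stack, registers and assembly code

8. Original article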

Hack The Virtual Memory: C strings & /proc
