[Book Excerpts] Code Reading: Methods and Practice (Part 2)

3 Advanced C Data Types

A common mistake that people make when trying to design something completely foolproof is to underestimate the ingenuity of complete fools. - Douglas Adams

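3.1 Pointers

Pointers are used to:

construct linked data structures
reference dynamically allocated data structures
implement call by reference
access and iterate through data elements
pass arrays as arguments
refer to functions
serve as an alias for another value
represent character strings
directly access system memory

3.1.1 Linked Data Structures

At the lowest level, a pointer is just a memory address, which makes it particularly well suited to representing the links between the elements of a data structure.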
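As a minimal sketch of such links, the following singly linked list stores in each element the address of the next one. The names struct node, list_push, list_length, and list_free are hypothetical and chosen only for this illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical node type: each element stores the address of the next one. */
struct node {
    int value;
    struct node *next;   /* NULL marks the end of the list */
};

/* Prepend a value to the list; returns 0 on success, -1 if allocation fails. */
int list_push(struct node **head, int value)
{
    assert(head != NULL);
    struct node *n = malloc(sizeof(*n));
    if (n == NULL)
        return -1;
    n->value = value;
    n->next = *head;     /* the new node links to the old first element */
    *head = n;
    return 0;
}

/* Follow the links and count the elements. */
size_t list_length(const struct node *head)
{
    size_t count = 0;
    for (; head != NULL; head = head->next)
        count++;
    return count;
}

/* Release every node; the next link is saved before the node is freed. */
void list_free(struct node *head)
{
    while (head != NULL) {
        struct node *next = head->next;
        free(head);
        head = next;
    }
}
```

A caller would start with an empty list (struct node *head = NULL;) and pass &head to list_push so that the function can update the caller's pointer.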

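3.1.2 Dynamic Allocation of Data Structures

Vector-type data structures are often allocated dynamically so that their size matches the number of elements required at runtime. A pointer stores the starting address of such a structure.

A pointer stores the address of the memory allocated for a struct: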

static struct diskentry * finddisk(const char *name)
{
     struct diskentry * d;
     // ...
     d = emalloc(sizeof(*d));
     d->d_name = estrdup(name);
     d->d_name[len] = '\0';
     // ...
     return d;    
}

The same can be done with a macro, here named new:

#define new(type) (type*) calloc(sizeof(type), 1)
// ...
node = new(struct codeword_entry);

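3.1.3 Call by Reference

Arguments passed by reference are used either to return a function's results (with the function's return value used only to indicate errors) or to avoid the overhead of copying the argument.

The caller obtains the reference with the address-of operator (&).

Arrays are always passed by reference. For the other basic C types, copying the value at the call is, on most architectures, more efficient than passing it by reference.

In modern C and C++ programs, a pointer argument qualified with const indicates that the argument is not used to return a result; it is passed by reference only to avoid the overhead of copying.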
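Both uses of call by reference can be sketched in one function. The struct sample type and the sample_mean function are hypothetical names invented for this example: the const pointer avoids copying a large structure, while the non-const pointer carries the result back to the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record, large enough that copying it would be wasteful. */
struct sample {
    double readings[64];
    size_t count;
};

/*
 * s is passed by reference only to avoid copying (hence const);
 * mean is passed by reference so the function can return its result.
 * The return value only reports success (0) or failure (-1).
 */
int sample_mean(const struct sample *s, double *mean)
{
    assert(s != NULL && mean != NULL);
    if (s->count == 0 || s->count > 64)
        return -1;
    double sum = 0.0;
    for (size_t i = 0; i < s->count; i++)
        sum += s->readings[i];
    *mean = sum / (double)s->count;
    return 0;
}
```

A call looks like sample_mean(&data, &result), where the address-of operator creates both references.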

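3.1.4 Accessing Data Elements

A pointer to an array element's address can be used to access the element at a specific index. Arithmetic on array element pointers has the same semantics as arithmetic on the respective array indices.

Using a pointer to access an array-based stack: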

stackp = de_stack;             // Initialize stack
// ...
*stackp++ = finchar;          // Push finchar into stack
// ...
do{
     if(count-- == 0)
          return (num);
     *bp++ = *--stackp;             // Pop from stack into *bp
}while(stackp > de_stack);     // Check if stack is empty

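3.1.5 Array Arguments and Results

In C and C++ programs, pointers are involved whenever an array is passed to a function or returned as a result. In C code, when an array name is used as a function argument, what is actually passed is the address of the array's first element. Any modification the function makes to the array's data therefore affects the elements of the caller's array. (All other types, such as char, int, float, and even structures and unions, are passed by value.)

A C function can only return a pointer to an array element, not an entire array.

Such an array is declared static to ensure that it is not a local variable allocated on the function's stack.

Converting an Internet address to dotted-decimal notation: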

char * inet_ntoa(struct in_addr ad)
{
     unsigned long int s_ad;
     int a, b, c, d;
     static char addr[20];
     s_ad = ad.s_addr;
     d = s_ad % 256;
     s_ad /= 256;
     c = s_ad % 256;
     s_ad /= 256;
     b = s_ad % 256;
     s_ad /= 256;
     a = s_ad % 256;
     sprintf(addr, "%d.%d.%d.%d", a, b, c, d);
     return addr;
}

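Functions that use global or static local variables are in most cases not reentrant.

The function naddr_ntoa acts as a wrapper for inet_ntoa, storing the result of inet_ntoa in a circular list of 4 different temporary buffers.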

char * naddr_ntoa(naddr a)
{
#define NUM_BUFS 4
     static int bufno;
     static struct
     {
          char str[16]; /* xxx.xxx.xxx.xxx\0 */
     }bufs[NUM_BUFS];
     char * s;
     struct in_addr addr;
     addr.s_addr = a;
     s = strcpy(bufs[bufno].str, inet_ntoa(addr));
     bufno = (bufno + 1)%NUM_BUFS;
     return s;
}

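3.1.6 Function Pointers

Functions can be parameterized by passing a function as an argument to another function.

The C language does not allow a function itself to be passed as an argument to another function; it does, however, allow a pointer to a function to be passed.

Function pointers can be used to parameterize control within a body of code.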
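A minimal sketch of this kind of parameterization follows. The loop in count_if is fixed, while the test applied inside it is supplied by the caller through a function pointer. All names here (predicate_fn, is_even, is_negative, count_if) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Type of the callback: decides whether an element should be counted. */
typedef int (*predicate_fn)(int);

int is_even(int x)     { return x % 2 == 0; }
int is_negative(int x) { return x < 0; }

/* The iteration is fixed; the condition is passed in as a function pointer. */
size_t count_if(const int *items, size_t n, predicate_fn pred)
{
    assert(items != NULL && pred != NULL);
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (pred(items[i]))
            count++;
    return count;
}
```

Passing is_even or is_negative as the third argument changes what the same loop computes without changing count_if itself.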

3.1.7 Pointers as Aliases

Anywhere the variable output was originally used, *outl can be used instead:

struct output output = {NULL, 0, NULL, OUTBUFSIZ, 1, 0};
// ...
struct output *outl = &output;

1 Efficiency concerns

Assigning a pointer is more efficient than copying a larger object:

static struct termios cbreakt, rawt, *curt;
curt = useraw ? &rawt : &cbreakt;

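2 Referring to statically initialized data

A single variable can point to different static data. (The most common case is a character pointer that points to different strings.)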

char *s;
s = *(opt->bval) ? "True" : "False";

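3 Implementing variable reference semantics in a global context

A global pointer variable can be used elsewhere in the program to refer to the data to be accessed and modified; it can also be used to modify global data.

3.1.8 Pointers and Strings

In the C language, string literals are represented by a character array terminated with the character '\0'. Consequently, a string is represented by a pointer to the first character of a null-terminated sequence.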

size_t strlen(const char *str)
{
     register const char *s;
     for( s=str; *s; ++s )
          ;
     return (s-str);
}

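Character pointers must be distinguished from character arrays:

// Character pointer:
// sizeof(pw_file) is the size of a pointer (4 on a 32-bit machine); pw_file can be changed to point elsewhere, but modifying the memory it points to results in undefined behavior
static char * pw_file = "/etc/passwd";
// Character array: applying sizeof to line yields 11; line always refers to the same storage area, and its elements can be modified freely
static char line[] = "/dev/XtyXX";
line[5] = 'p';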

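3.1.9 Direct Memory Access

Low-level code uses pointers to access hardware-specific memory areas.

Once a pointer variable has been initialized to point to such an area, it can conveniently be used to communicate with the given device.

3.2 Structures

Structures are used to:

group together data elements typically used as a whole
return multiple data elements from a function
construct linked data structures
map the organization of data on hardware devices, network links, and storage media
implement abstract data types
program in an object-oriented fashion

3.2.1 Grouping Together Data Elements

Structures group together related elements that are typically used as a whole.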

struct point{
     int col, line;
};

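3.2.2 Returning Multiple Data Elements from a Function

Multiple data elements can be returned either through arguments passed to the function by reference, or by grouping them into a structure that is then returned.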
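The second approach might look like the sketch below, in which a hypothetical find_range function returns both the minimum and the maximum of an array in a single struct range value. Neither name comes from the book.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical aggregate used to return two results at once. */
struct range {
    int min;
    int max;
};

/* Returns the smallest and largest element of a non-empty array. */
struct range find_range(const int *items, size_t n)
{
    assert(items != NULL && n > 0);
    struct range r = { items[0], items[0] };
    for (size_t i = 1; i < n; i++) {
        if (items[i] < r.min)
            r.min = items[i];
        if (items[i] > r.max)
            r.max = items[i];
    }
    return r;   /* the whole structure is copied back to the caller */
}
```

Returning a small structure by value keeps the call site simple; for larger aggregates, the by-reference approach avoids the copy.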

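3.2.3 Mapping the Organization of Data

When data moves over a network, is transferred to or from secondary storage, or when a program interacts directly with hardware, structures are often used to represent how the data is organized on that other medium.

A command block on the Intel EtherExpress network card: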

struct fxp_cb_nop{
     void * fill[2];
     volatile u_int16_t cb_status;
     volatile u_int16_t cb_commands;
     volatile u_int32_t link_addr;
};

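The volatile qualifier indicates that the underlying memory fields are used by entities outside the program; it therefore prevents the compiler from applying optimizations to these fields, such as removing redundant references.

Bit fields are declared to specify a precise range of bits that holds a particular value on the given device.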

struct fxp_cb_config
{
     // ...
     // The byte count (byte_count) occupies 6 bits
     volatile u_int8_t byte_count:6, :2;
     // The receiver and transmitter FIFO limits occupy 4 bits and 3 bits on the hardware device, respectively
     volatile u_int8_t rx_fifo_limit:4, tx_fifo_limit:3, :1;
};

Code that handles network packets uses C structures to describe their constituent elements:

// The classic definition of the TCP header
struct tcphdr
{
     u_int16_t th_sport;     // source port
     u_int16_t th_dport;     // destination port
     tcp_seq th_seq;          // sequence number
     tcp_seq th_ack;          // acknowledgment number
     // ...     
};

Structures are also used to map how data is stored on peripheral media such as disks and tapes.

// The BIOS parameter block of an MS-DOS disk partition
struct bpb33
{
     u_int16_t bpbBytesPerSec;     // bytes per sector
     u_int8_t bpbSecPerClust;       // sectors per cluster
     u_int16_t bpbResSectors;      // number of reserved sectors
     u_int8_t bpbFATs;                // number of FATs
     u_int16_t bpbRootDirEnts;     // number of root directory entries
     u_int16_t bpbSectors;           // total number of sectors
     u_int8_t bpbMedia;               // media descriptor
     u_int16_t bpbFATsecs;          // number of sectors per FAT
     u_int16_t bpbSecPerTrack;    // sectors per track
     u_int16_t bpbHeads;             // number of heads
     u_int16_t bpbHiddenSecs;     // number of hidden sectors
};

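The layout of the fields within a structure (padding, alignment, and the ordering of bit fields) depends on the architecture and the compiler. The representation of the individual elements also depends on the architecture and the operating system. Using structures to map external data is inherently nonportable.

3.2.4 Programming in an Object-Oriented Fashion

C programs often group data elements and function pointers into a structure to simulate the fields and methods of a class, creating class-like entities.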

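3.3 Unions

A C union groups together items that share the same storage area. At any given time, only one of the items sharing the area can be accessed.

Unions are used to:

use storage efficiently
implement polymorphism
access data through different internal representations

3.3.1 Efficient Use of Storage
3.3.2 Implementing Polymorphism

The same object (typically represented by a C structure) is used to represent different types. The data for those different types is stored in separate union members. Polymorphic data stored in a union saves space compared with other memory arrangements; in this case, however, the union is used so that one object can store different types of data, not to show off our frugality.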
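One common way to arrange this is a tagged union: a structure holding a type tag next to the union, as in the sketch below. The names struct value, enum value_kind, and value_format are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* The tag records which union member is currently valid. */
enum value_kind { VALUE_INT, VALUE_REAL, VALUE_TEXT };

struct value {
    enum value_kind kind;
    union {
        long i;
        double d;
        const char *s;
    } u;                 /* the three members share the same storage */
};

/* Formats the value according to its tag; returns what snprintf returns. */
int value_format(const struct value *v, char *buf, size_t size)
{
    assert(v != NULL && buf != NULL);
    switch (v->kind) {
    case VALUE_INT:
        return snprintf(buf, size, "%ld", v->u.i);
    case VALUE_REAL:
        return snprintf(buf, size, "%.2f", v->u.d);
    case VALUE_TEXT:
        return snprintf(buf, size, "%s", v->u.s);
    }
    return -1;           /* unknown tag */
}
```

Only the member selected by kind may be read; the tag is what makes the shared storage safe to interpret.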

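3.3.3 Accessing Different Internal Representations

Data is stored into one field of a union and then read through another field, in order to convert between different internal representations. This use is inherently nonportable.

In the C language, any bit pattern that an integral type (including characters) can represent is legal and valid, so accessing the internal representation of other C types (pointers, floating-point numbers, other integral types) as an integer is guaranteed to be legal. The reverse operation, however (generating another type from an integer representation), does not always yield a correct value.

When accessing an architecture-specific data element stored in some other format, a union can be used either to interpret a data type based on its representation or to create a data type from its representation.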
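The sketch below reads the bits of a float through an integer member of the same union. It assumes that float and uint32_t have the same size and, for the bit pattern mentioned afterwards, that the platform uses IEEE 754 single precision; the names float_bits, float_to_bits, and bits_to_float are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Both members occupy the same storage. */
union float_bits {
    float f;
    uint32_t u;
};

/* Store a float, then read the same bytes back as an integer. */
uint32_t float_to_bits(float f)
{
    union float_bits fb;
    assert(sizeof(fb.f) == sizeof(fb.u));   /* assumption: 32-bit float */
    fb.f = f;
    return fb.u;
}

/* The reverse direction: not every bit pattern is a meaningful number. */
float bits_to_float(uint32_t u)
{
    union float_bits fb;
    assert(sizeof(fb.f) == sizeof(fb.u));
    fb.u = u;
    return fb.f;
}
```

On an IEEE 754 platform, float_to_bits(1.0f) yields 0x3F800000; on other floating-point formats the pattern differs, which is exactly why this technique is not portable.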

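3.4 Dynamic Memory Allocation

Data structures whose size is unknown when the program is written, or that grow while the program runs, are stored in memory allocated dynamically during execution. Programs use pointers to refer to dynamically allocated memory.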

int update_msg(uchar *msg, int *msglen, int Vlist[], int c)
{
    // ...
    // Pointer to an integer
    int *RRlen;
    // ...
    // c - number of elements
    // sizeof(int) - size of each element
    RRlen = (int *)malloc((unsigned)c * sizeof(int));
    // Handle memory exhaustion
    if (!RRlen)
        panic(errno, "malloc(RRlen)");
    // ...
    // Iterate over all elements
    for (i = 0; i < c; i++)
    {
        // ...
        // Use RRlen as an array
        RRlen[i] = dn_skipname(cp, msg + *msglen);
        // ...
    }
    // ...
    // Free allocated memory
    free((char *)RRlen);
    return (n);
}

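Applying sizeof explicitly to an array variable returns the size of the array, whereas applying the same operator to a pointer returns only the storage needed to hold the pointer itself.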
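The difference can be sketched as follows; array_bytes, pointer_bytes, and the ARRAY_LEN macro are hypothetical names. Once an array has been passed to a function, only a pointer is visible there, so the element count must be passed separately.

```c
#include <assert.h>
#include <stddef.h>

/* Element count of a true array; gives a wrong answer if applied to a pointer. */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* sizeof applied to an array variable: the size of the whole array. */
size_t array_bytes(void)
{
    int table[16];
    return sizeof(table);      /* 16 * sizeof(int) */
}

/* sizeof applied to a pointer: only the size of the pointer itself. */
size_t pointer_bytes(const int *values)
{
    assert(values != NULL);
    return sizeof(values);     /* sizeof(const int *), whatever it points to */
}
```

The same holds for memory obtained from malloc: sizeof applied to the returned pointer says nothing about the size of the allocated block.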

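A call to realloc adjusts the space pointed to by its first argument to the new size given by its second argument. The function returns a pointer to the adjusted memory block, whose address may differ from that of the original block. The contents of the original block are copied to the new location. If the block has moved, any pointer variable that pointed into the original block now points to undefined data.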

void remember_rup_data(char *host, struct statstime *st)
{
    // Index larger than allocated size?
    if (rup_data_idx >= rup_data_max)
    {
        // New size
        rup_data_max += 16;
        // Adjust allocation
        rup_data = realloc(rup_data, rup_data_max * sizeof(struct rup_data));
        if (rup_data == NULL)
        {
            err(1, "realloc");
        }
    }
    // Store data
    rup_data[rup_data_idx].host = strdup(host);
    rup_data[rup_data_idx].statstime = *st;
    // New index
    rup_data_idx++;
}
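The excerpt above assigns the result of realloc directly to rup_data, which is acceptable there because err terminates the program on failure. In code that must keep running, a variation that first stores the result in a temporary pointer avoids losing the original block. The following sketch uses hypothetical names (struct int_vec, int_vec_append) and keeps the growth step of 16 elements.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical growable array of int. */
struct int_vec {
    int *items;
    size_t len;    /* elements in use */
    size_t cap;    /* elements allocated */
};

/* Appends a value, growing the block when needed; returns 0 or -1. */
int int_vec_append(struct int_vec *v, int value)
{
    assert(v != NULL);
    if (v->len >= v->cap) {
        size_t new_cap = v->cap + 16;
        /* Keep the old pointer until realloc is known to have succeeded. */
        int *tmp = realloc(v->items, new_cap * sizeof(*tmp));
        if (tmp == NULL)
            return -1;         /* v->items is still valid and unchanged */
        v->items = tmp;        /* the block may have moved */
        v->cap = new_cap;
    }
    v->items[v->len++] = value;
    return 0;
}
```

Starting from a zero-initialized struct int_vec works because realloc with a null pointer behaves like malloc. Callers should also index through v->items each time rather than cache element addresses across appends.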

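3.4.1 Managing Free Memory

A common misconception among novice C/C++ programmers is that every pointer must be initialized to point to a memory block allocated with malloc.

Garbage collectors and conservative garbage collectors

A garbage collector automatically reclaims storage that is no longer in use. In a reference-counting scheme, every memory block has an associated reference count. Each time a new reference to the block is created, the count is incremented; each time a reference is destroyed, the count is decremented; when the count reaches 0, the block is no longer in use and can be freed.
A conservative garbage collector scans all of the process's memory, looking for addresses that match currently allocated memory blocks. Every block that is not encountered during the scan is freed.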
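The reference-counting scheme described above can be sketched by hand in a few lines. This is only an illustration with hypothetical names (struct buffer, buffer_new, buffer_retain, buffer_release); in this manual form the program must call the retain and release functions itself.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical block with an associated reference count. */
struct buffer {
    unsigned refcount;
    size_t size;
    unsigned char *data;
};

/* Creates a zero-filled block; the creator holds the first reference. */
struct buffer *buffer_new(size_t size)
{
    struct buffer *b = malloc(sizeof(*b));
    if (b == NULL)
        return NULL;
    b->data = calloc(size ? size : 1, 1);
    if (b->data == NULL) {
        free(b);
        return NULL;
    }
    b->refcount = 1;
    b->size = size;
    return b;
}

/* A new reference is created: increment the count. */
struct buffer *buffer_retain(struct buffer *b)
{
    assert(b != NULL);
    b->refcount++;
    return b;
}

/* A reference is destroyed: decrement, and free at zero. Returns 1 if freed. */
int buffer_release(struct buffer *b)
{
    assert(b != NULL && b->refcount > 0);
    if (--b->refcount == 0) {
        free(b->data);
        free(b);
        return 1;
    }
    return 0;
}
```

Every holder of a pointer calls buffer_release exactly once; whichever call brings the count to zero frees the storage.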

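The alloca function allocates a memory block using the same interface as malloc, but it allocates the block on the program's stack (the memory area used to store function return addresses and local variables) rather than on the program's heap (the program's general-purpose memory area). A block returned by alloca is reclaimed automatically when the function that allocated it returns; there is no need to call free on it.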

3.4.2 含有动态分配数组的结构

typedef struct
{
    char * user;
    char * group;
    char * flags;
    char data[1];
} NAMES;
if( ( np = malloc(sizeof(NAMES) + ulen + glen + flen + 3)) == NULL )
{
    error(1, "%s", "");
}
np->user = &np->data[0];
(void)strcpy(np->user, user);

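The data array, declared as a structure member, serves as a placeholder for the actual data. When memory is allocated for the structure, its size is adjusted upward according to the number of elements the data array will hold. From then on, the array elements can be used as if the array actually contained space for them.

The example above allocates one byte more than it actually needs. The size of the memory block is the sum of the structure's size and the size of the associated data: the lengths of the three strings (ulen, glen, flen) plus their three null terminators (3). However, the calculation does not take into account that the structure's size already includes one placeholder byte.

Micromanaging the memory layout of structures is a dangerous activity that is always accompanied by errors.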
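Since C99, the same idea can be expressed with a flexible array member, which removes the placeholder byte from the size calculation. The sketch below is an alternative to the excerpt above, not the code the book discusses; struct name_rec and name_rec_new are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* The trailing array has no declared size and does not count toward sizeof. */
struct name_rec {
    size_t len;
    char text[];      /* C99 flexible array member */
};

/* Allocates the header and the string storage in a single block. */
struct name_rec *name_rec_new(const char *s)
{
    assert(s != NULL);
    size_t len = strlen(s);
    /* header + characters + null terminator, with no extra placeholder byte */
    struct name_rec *r = malloc(sizeof(*r) + len + 1);
    if (r == NULL)
        return NULL;
    r->len = len;
    memcpy(r->text, s, len + 1);
    return r;
}
```

A single free(r) releases both the header and the string, since they live in the same allocation.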

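3.5 typedef Declarations

A typedef declaration adds a new name, or alias, for an existing type.

C programs use typedef declarations to promote abstraction and enhance the code's readability, to guard against portability problems, and to emulate the class declaration behavior of C++ and Java.

A typedef can be regarded as a storage class specifier similar to extern or static, and the declaration can be read as if it were a variable definition.

A series of implementation-dependent declarations creates portable names for known hardware quantities: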

typedef __signed char int8_t;
typedef unsigned char u_int8_t;
typedef short int16_t;
typedef unsigned short u_int16_t;
typedef int int32_t;
typedef unsigned int u_int32_t;
typedef long int64_t;
typedef unsigned long u_int64_t;

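One of the previously declared names is then used to create an abstract name for a quantity whose hardware representation is known: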

typedef u_int32_t in_addr_t;
typedef u_int16_t in_port_t;

typedef is also used to emulate the way type declarations in C++ and Java introduce a new type:

typedef struct path path;
struct path
{
    // ...
};

Maxims

By recognizing the function served by a particular language construct, you can better understand the code that uses it.
Recognize and classify the reason behind each use of a pointer.
Pointers are used in C programs to construct linked data structures, to dynamically allocate data structures, to implement call by reference, to access and iterate through data elements, when passing arrays as arguments, for referring to functions, as an alias for another value, to represent character strings, and for direct access to system memory.
Function arguments passed by reference are used for returning function results or for avoiding the overhead of copying the argument.
A pointer to an array element address can be used to access the element at the specific position index.
Arithmetic on array element pointers has the same semantics as arithmetic on the respective array indices.
Functions using global or static local variables are in most cases not reentrant.
Character pointers differ from character arrays.
Recognize and classify the reason behind each use of a structure or union.
Structures are used in C programs to group together data elements typically used as a whole, to return multiple data elements from a function, to construct linked data structures, to map the organization of data on hardware devices, network links, and storage media, to implement abstract data types, and to program in an object-oriented fashion.
Unions are used in C programs to optimize the use of storage, to implement polymorphism, and for accessing different internal representations of data.
A pointer initialized to point to storage for N elements can be dereferenced as if it were an array of N elements.
Dynamically allocated memory blocks are freed explicitly or when the program terminates or through use of a garbage collector; memory blocks allocated on the stack are freed when the function they were allocated in exits.
C programs use typedef declarations to promote abstraction and enhance the code's readability, to guard against portability problems, and to emulate the class declaration behavior of C++ and Java.
You can read typedef declarations as if they were variable definitions: the name of the variable being defined is the type's name; the variable's type is the type corresponding to that name.

Posted on 2012-12-28 08:55 by zhaorui
