Wu.Country@侠缘


Brief translation: Understanding Linux Network Internals, 2.2. net_device Structure

Translation of: Understanding Linux Network Internals
Table of contents: http://www.cnblogs.com/WuCountry/archive/2008/11/15/1333960.html

Brief translation

2.2. net_device Structure
The net_device data structure stores all information specifically regarding a network device. There is one such structure for each device, both real ones (such as Ethernet NICs) and virtual ones (such as bonding[] or VLAN[]). In this section, I will use the words interface and device interchangeably, even though the difference between them is important in other contexts.
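The net_device data structure holds all the information specific to a network device. Every network device has one such structure, whether it is a real device or a virtual one.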

Brief notes on bonding and VLANs follow; more information can be found online:
[] Bonding, also called EtherChannel (Cisco terminology) and trunking (Sun terminology), allows a set of interfaces to be grouped together and be treated as a single interface. This feature is useful when a system needs to support point-to-point connections at a high bandwidth. A nearly linear speedup can be achieved, with the virtual interface having a throughput nearly equal to the sum of the throughputs of the individual interfaces.

[] VLAN stands for Virtual LAN. The use of VLANs is a convenient way to isolate traffic using the same L2 switch in different broadcast domains by means of an additional tag, called the VLAN tag, that is added to the Ethernet frames. You can find an introduction to VLANs and their use with Linux at http://www.linuxjournal.com/article/7268.

The net_device structures for all devices are put into a global list to which the global variable dev_base points. The data structure is defined in include/linux/netdevice.h. The registration of network devices is described in Chapter 8. In that chapter, you can find details on how and when most of the net_device fields are initialized.
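The global variable dev_base points to a global list that holds the net_device structures of all devices. The structure is defined in include/linux/netdevice.h.
Chapter 8 covers the initialization and registration of network devices.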

Like sk_buff, this structure is quite big and includes many feature-specific parameters, along with parameters from many different layers. For this reason, the overall organization of the structure will probably see some changes soon for optimization reasons.
Like sk_buff, this is a very large structure with many feature-specific parameters that are used at different network layers. Its organization may change for optimization reasons.

Network devices can be classified into types such as Ethernet cards and Token Ring cards. While certain fields of the net_device structure are set to the same value for all devices of the same type, some fields must be set differently by each model of device. Thus, for almost every type, Linux provides a general function that initializes the parameters whose values stay the same across all models. Each device driver invokes this function in addition to setting those fields that have unique values for its model. Drivers can also overwrite fields that were already initialized by the kernel (for instance, to improve performance). You can find more details in Chapter 8.
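Network devices can be grouped by type, for example Ethernet cards and Token Ring cards. Within one type, some net_device fields have the same value for every device, while others differ from model to model. For most types, Linux therefore provides a general function that initializes the parameters shared by all models; the driver calls it and then sets the fields that are specific to its own model. A driver can also overwrite fields that the kernel has already initialized.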

The fields of the net_device structure can be classified into the following categories:

Configuration

Statistics

Device status

List management

Traffic management

Feature specific

Generic

Function pointers (or VFT)

2.2.1. Identifiers
The net_device structure includes three identifiers, which should not be confused with one another:


int ifindex

A unique ID, assigned to each device when it is registered with a call to dev_new_index.

int iflink

This field is mainly used by (virtual) tunnel devices and identifies the real device that will be used to reach the other end of the tunnel.


unsigned short dev_id

Currently used by IPv6 with the zSeries OSA NICs. The field is used to differentiate between virtual instances of the same device that can be shared between different OSes concurrently. See comments in net/ipv6/addrconf.c.
Currently used by IPv6 on zSeries OSA NICs.

2.2.2. Configuration
Some of the configuration fields are given a default value by the kernel that depends on the class of network device, and some fields are left to the driver to fill. The driver can change defaults, as mentioned earlier, and some fields can even be changed at runtime by commands such as ifconfig and ip. In fact, several parameters (base_addr, if_port, dma, and irq) are commonly set by the user when the module for the device is loaded. On the other hand, these parameters are not used by virtual devices.
Some configuration values are defaults chosen by the kernel according to the device type, and others are filled in by the driver. The driver can override the defaults, and some values can be changed at runtime with commands such as ifconfig and ip. Virtual devices, however, do not use the parameters that the user sets at module load time.
(Translator's question: why? Why would the parameters the user sets when the device is loaded not apply to virtual devices?)

char name[IFNAMSIZ]

Name of the device (e.g., eth0).


unsigned long mem_start

unsigned long mem_end

These fields describe the shared memory used by the device to communicate with the kernel. They are initialized and accessed only within the device driver; higher layers do not need to care about them.
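These two fields identify the memory region shared between the device and the kernel. They are initialized and accessed only by the device driver; higher layers need not care about them.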

unsigned long base_addr

The beginning of the I/O memory mapped to the device's own memory.
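The starting address of the I/O memory mapped to the device's own memory. (Translator's note: for basic Linux concepts like this one, see another book, Understanding the Linux Kernel.)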

unsigned int irq

The interrupt number used by the device to talk to the kernel. It can be shared among multiple devices. Drivers use the request_irq function to allocate this variable and free_irq to release it.
The interrupt number the device uses to talk to the kernel. It can be shared with other devices. Drivers call request_irq to obtain the interrupt and free_irq to release it.

unsigned char if_port

The type of port being used for this interface. See the next section, "Interface types and ports."


unsigned char dma

The DMA channel used by the device (if any). To obtain and release a DMA channel from the kernel, the file kernel/dma.c defines the functions request_dma and free_dma. To enable or disable a DMA channel after obtaining it, the functions enable_dma and disable_dma are provided in various include/asm-architecture files (e.g., include/asm-i386). The routines are used by ISA devices; Peripheral Component Interconnect (PCI) devices do not need them because they use others instead.
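The DMA (Direct Memory Access) channel used by the device, if any. kernel/dma.c defines request_dma and free_dma for obtaining and releasing a DMA channel from the kernel; enable_dma and disable_dma, which enable or disable a channel once it is held, are provided in the include/asm-architecture files (for example, include/asm-i386). These routines are used by ISA devices; PCI devices use other mechanisms instead.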

DMA is not available for all devices because some buses don't use it.


unsigned short flags

unsigned short gflags

unsigned short priv_flags

Some bits in the flags field represent capabilities of the network device (such as IFF_MULTICAST) and others represent changing status (such as IFF_UP or IFF_RUNNING). You can find the complete list of these flags in include/linux/if.h. The device driver usually sets the capabilities at initialization time, and the status flags are managed by the kernel in response to external events. The settings of the flags can be viewed through the familiar ifconfig command:
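Some bits of the flags field describe capabilities of the device (such as IFF_MULTICAST) and others describe changing status (such as IFF_UP or IFF_RUNNING). The complete list is in include/linux/if.h. The driver usually sets the capability bits at initialization time (translator's note: capabilities here means optional functions of the hardware, which differ from device to device), and the kernel manages the status bits in response to external events. The settings can be viewed with the ifconfig command: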

bash# ifconfig lo
lo          Link encap:Local Loopback
            inet addr:127.0.0.1  Mask:255.0.0.0
            UP LOOPBACK RUNNING  MTU:3924  Metric:1
            RX packets:198 errors:0 dropped:0 overruns:0 frame:0
            TX packets:198 errors:0 dropped:0 overruns:0 carrier:0
            collisions:0 txqueuelen:0


In this example, the words UP LOOPBACK RUNNING correspond to the flags IFF_UP, IFF_LOOPBACK, and IFF_RUNNING.
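
As a rough illustration of how flag bits map to the words that ifconfig prints, the sketch below decodes a flags value in user space. The bit values are local copies of the IFF_* constants from include/linux/if.h; the TOY_ prefix, struct flag_name, and format_flags are hypothetical names introduced only for this example and are not kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Local copies of a few IFF_* bits (values as in include/linux/if.h),
 * renamed so they cannot clash with system headers. */
#define TOY_IFF_UP        0x0001
#define TOY_IFF_BROADCAST 0x0002
#define TOY_IFF_LOOPBACK  0x0008
#define TOY_IFF_RUNNING   0x0040
#define TOY_IFF_PROMISC   0x0100
#define TOY_IFF_MULTICAST 0x1000

struct flag_name {
    unsigned short bit;
    const char *name;
};

static const struct flag_name flag_names[] = {
    { TOY_IFF_UP,        "UP" },
    { TOY_IFF_BROADCAST, "BROADCAST" },
    { TOY_IFF_LOOPBACK,  "LOOPBACK" },
    { TOY_IFF_RUNNING,   "RUNNING" },
    { TOY_IFF_PROMISC,   "PROMISC" },
    { TOY_IFF_MULTICAST, "MULTICAST" },
};

/* Write the names of the bits set in flags into buf, separated by
 * spaces. Decoding stops at the first name that does not fit.
 * Returns buf. */
static char *format_flags(unsigned short flags, char *buf, size_t len)
{
    size_t i, used = 0;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    for (i = 0; i < sizeof(flag_names) / sizeof(flag_names[0]); i++) {
        size_t need;

        if (!(flags & flag_names[i].bit))
            continue;
        need = strlen(flag_names[i].name) + (used ? 1 : 0);
        if (used + need >= len)
            break;
        used += (size_t)sprintf(buf + used, "%s%s",
                                used ? " " : "", flag_names[i].name);
    }
    return buf;
}
```

Calling format_flags with TOY_IFF_UP | TOY_IFF_LOOPBACK | TOY_IFF_RUNNING fills the buffer with the same three words shown for the lo interface.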

priv_flags stores flags that are not visible to the user space. Right now this field is used by the VLAN and Bridge virtual devices. gflags is almost never used and is there for compatibility reasons. Flags can be changed through the dev_change_flags function.
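priv_flags holds flags that are not visible to user space; at present it is used by the VLAN and bridge virtual devices. gflags is almost never used and exists only for compatibility reasons. Flags can be changed through dev_change_flags.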


int features

Another bitmap of flags used to store some other device capabilities. It is not redundant for this data structure to contain multiple flag variables. The features field reports the card's capabilities for communicating with the CPU, such as whether the card can do DMA to high memory, or checksum all the packets in hardware. The list of the possible features is defined inside the structure net_device itself. This parameter is initialized by the device driver. You can find the list of NETIF_F_XXX features, along with good comments, inside the net_device data structure definition.
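Another bitmap of flags, storing further device capabilities; it is not redundant with the other flag fields. features reports the card's capabilities for communicating with the CPU, such as whether it can do DMA to high memory or checksum packets in hardware. It is initialized by the device driver. The possible NETIF_F_XXX values are listed, with good comments, inside the net_device definition.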


unsigned mtu

MTU stands for Maximum Transmission Unit and it represents the maximum size of the frames that the device can handle. Table 2-1 shows the values for the most common network technologies.
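MTU stands for Maximum Transmission Unit: the largest frame, in bytes, that the device can handle. Table 2-1 lists the values for the most common network technologies.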

Table 2-1. MTU values for different device types

Device type                            MTU
PPP (Point-to-Point Protocol)          296
SLIP (Serial Line Internet Protocol)   296
Ethernet                               1,500
ISDN                                   1,500
PLIP                                   1,500 (ether_setup)
Wavelan                                1,500 (ether_setup)
EtherChannel                           2,024
FDDI                                   4,352
Token Ring 4 MB/s (IEEE 802.5)         4,464
Token Bus (IEEE 802.4)                 8,182
Token Ring 16 MB/s (IBM)               17,914
Hyperchannel                           65,535
 
The Ethernet MTU deserves a little clarification. The Ethernet frame specification defines the maximum payload size as 1,500 bytes. Sometimes you find the Ethernet MTU defined as 1,518 or 1,514: the first is the maximum size of an Ethernet frame including the header, and the second includes the header but not the frame check sequence (4 bytes of checksum).
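The Ethernet MTU needs a word of clarification: the specification sets the maximum payload at 1,500 bytes, but the MTU is sometimes quoted as 1,518 or 1,514. The first is the maximum frame size including the header; the second includes the header but not the 4-byte frame check sequence.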
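
The three numbers can be checked with a few lines of arithmetic. This is only a sketch: the header and payload sizes match ETH_HLEN and ETH_DATA_LEN from include/linux/if_ether.h, the 4-byte frame check sequence comes from the text above, and the TOY_ names and eth_frame_size are hypothetical.

```c
#include <assert.h>

enum {
    TOY_ETH_HLEN     = 14,   /* destination (6) + source (6) + type (2) */
    TOY_ETH_DATA_LEN = 1500, /* maximum payload, i.e. the MTU */
    TOY_ETH_FCS_LEN  = 4     /* frame check sequence */
};

/* Size of a frame carrying `payload` bytes, with or without the FCS. */
static int eth_frame_size(int payload, int with_fcs)
{
    assert(payload >= 0 && payload <= TOY_ETH_DATA_LEN);
    return TOY_ETH_HLEN + payload + (with_fcs ? TOY_ETH_FCS_LEN : 0);
}
```

With a full 1,500-byte payload the function gives 1,514 without the FCS and 1,518 with it.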

In 1998, Alteon Networks (acquired by Nortel Networks in 2000) promoted an initiative to increase the maximum payload of Ethernet frames to 9 KB. This proposal was later formalized with an IETF Internet draft, but the IEEE never accepted it. Frames exceeding the 1,500 bytes of payload in the IEEE specification are commonly called jumbo frames and are used with Gigabit Ethernet to increase throughput. This is because bigger frames mean fewer frames for large data transfers, fewer interrupts (and therefore less CPU usage), less header overhead, etc. For a discussion of the benefits of increasing the Ethernet MTU and why IEEE does not agree on standardizing this extension, you can read the white paper "Use of Extended Frame Sizes in Ethernet Networks" that can be found with an Internet search, as well as at http://www.ietf.org/proceedings/01aug/I-D/draft-ietf-isis-ext-eth-01.txt.
(Translator's note: GE stands for Gigabit Ethernet, which runs at 1,000 Mbit/s; FE stands for Fast Ethernet, which runs at 100 Mbit/s.)


unsigned short type

The category of devices to which it belongs (Ethernet, Frame Relay, etc.). include/linux/if_arp.h contains the complete list of possible types.
The device type; include/linux/if_arp.h has the complete list.


unsigned short hard_header_len

The size of the device header in octets. The Ethernet header, for instance, is 14 octets long. The length of each device header is defined in the header file for that device. For Ethernet, for instance, ETH_HLEN is defined in <include/linux/if_ether.h>.
The length, in bytes, of the link layer header produced for this device type; 14 bytes for Ethernet. Different device types have headers of different lengths.


unsigned char broadcast[MAX_ADDR_LEN]

The link layer broadcast address.

unsigned char dev_addr[MAX_ADDR_LEN]

unsigned char addr_len

dev_addr is the device link layer address; do not confuse it with the L3 or IP address. The address's length in octets is given by addr_len. The value of addr_len depends on the type of device. Ethernet addresses are 6 octets long.
(Translator's note: the source text prints 8 octets here; Ethernet addresses are 6 octets long.)

int promiscuity

See the later section "Promiscuous mode."

2.2.2.1. Interface types and ports
Some devices come with more than one connector (the most common combination is BNC + RJ45) and allow the user to select one of them depending on her needs. This parameter is used to set the port type for the device. When the device driver is not forced by configuration commands to select a specific port type, it simply chooses a default one. There are also cases where a single device driver can handle different interface models; in those situations, the interface can discover the port type to use by simply trying all of them in a specific order. This piece of code shows how one device driver sets the interface mode depending on how it has been configured:
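Some devices have more than one connector, and the user can select one as needed. This parameter (if_port) sets the port type; unless a configuration command forces a specific port, the driver chooses a default. A single driver may also handle several interface models, in which case the interface can discover the port type by trying them in a set order. This piece of code shows how one driver sets the interface mode according to its configuration: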

        switch (dev->if_port) {
        case IF_PORT_10BASE2:
            writeb((readb(addr) & 0xf8) | 1, addr);
            break;
        case IF_PORT_10BASET:
            writeb((readb(addr) & 0xf8), addr);
            break;
        }

 

2.2.2.2. Promiscuous mode
Certain network administration tasks require a system to receive all the frames that travel across a shared cable, not just the ones directly addressed to it; a device that receives all packets is said to be in promiscuous mode . This mode is needed, for instance, by applications that check performance or security breaches on their local network segment. Promiscuous mode is also used by bridging code (see Part IV). Finally, it has obvious value to malicious snoopers, unfortunately; for this reason, no data is secure from other users on a local network unless it is encrypted.
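Some network administration tasks need the system to receive every frame that crosses the shared cable, not only the frames addressed to it; a device that does so is said to be in promiscuous mode. Applications that monitor performance or look for security breaches on the local segment need this mode. Unfortunately it is just as useful to malicious snoopers, so data on a local network is not safe from other users unless it is encrypted.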

The net_device structure contains a counter named promiscuity that indicates a device is in promiscuous mode. The reason it is a counter rather than a simple flag is that several clients may ask for promiscuous mode; therefore, each increments the counter when entering the mode and decrements the counter when leaving the mode. The device does not leave promiscuous mode until the counter reaches zero. Usually the field is manipulated by calling the function dev_set_promiscuity.
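net_device contains a counter named promiscuity that indicates whether the device is in promiscuous mode. It is a counter rather than a simple flag because several clients may request promiscuous mode at the same time: each one increments the counter on entering the mode and decrements it on leaving, and the device leaves promiscuous mode only when the counter reaches zero. The field is usually manipulated through dev_set_promiscuity.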

Whenever promiscuity is nonzero (such as through a call to dev_set_promiscuity), the IFF_PROMISC bit flag of flags is also set and is checked by the functions that configure the interface.
Whenever promiscuity is nonzero, the IFF_PROMISC bit is set in flags, and the functions that configure the interface check that bit.
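
The counter-plus-flag behavior described above can be modeled in a few lines of user-space C. This is a simplified sketch and not the kernel's dev_set_promiscuity: struct toy_dev and toy_set_promiscuity are hypothetical, the flag value is a local copy of IFF_PROMISC, and the real function also notifies the driver so that the hardware filter is reprogrammed.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_IFF_PROMISC 0x0100  /* same value as IFF_PROMISC */

/* Hypothetical, stripped-down stand-in for struct net_device. */
struct toy_dev {
    unsigned short flags;
    int promiscuity;  /* number of clients that asked for the mode */
};

/* inc > 0 when a client enters promiscuous mode, inc < 0 when it leaves. */
static void toy_set_promiscuity(struct toy_dev *dev, int inc)
{
    assert(dev != NULL);
    assert(dev->promiscuity + inc >= 0);  /* leaving more often than entering is a bug */

    dev->promiscuity += inc;
    if (dev->promiscuity > 0)
        dev->flags |= TOY_IFF_PROMISC;
    else
        dev->flags &= (unsigned short)~TOY_IFF_PROMISC;
}
```

Two clients entering and one leaving keeps the flag set; it is cleared only when the second client leaves as well.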

The following piece of code, taken from the drivers/net/3c59x.c driver, shows how the different receive modes are set based on the flags (bits) in the flags field:
The following code, taken from drivers/net/3c59x.c, shows how the receive mode is set according to the flag bits:

static void set_rx_mode(struct net_device *dev)
{
        int ioaddr = dev->base_addr;
        int new_mode;

        if (dev->flags & IFF_PROMISC) {
             if (corkscrew_debug > 3)
                        printk("%s: Setting promiscuous mode.\n", dev->name);
             new_mode = SetRxFilter | RxStation | RxMulticast | RxBroadcast | RxProm;
        } else if ((dev->mc_list)  ||  (dev->flags & IFF_ALLMULTI)) {
             new_mode = SetRxFilter | RxStation | RxMulticast | RxBroadcast;
        } else
             new_mode = SetRxFilter | RxStation | RxBroadcast;

        outw(new_mode, ioaddr + EL3_CMD);
}


When the IFF_PROMISC flag is set, the new_mode variable is initialized to accept the traffic addressed to the card (RxStation), multicast traffic (RxMulticast), broadcast traffic (RxBroadcast), and all the other traffic (RxProm). EL3_CMD is the offset to the ioaddr memory address that represents where commands are supposed to be copied when interacting with the device.
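When IFF_PROMISC is set, new_mode is initialized so that the card accepts traffic addressed to it, multicast traffic, broadcast traffic, and all other traffic. EL3_CMD is an offset from the ioaddr address; it marks the location to which commands are written when talking to the device. (Translator's note: I did not fully understand this last sentence.)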

2.2.3. Statistics
Instead of providing a collection of fields to keep statistics, the net_device structure includes a pointer named priv that is set by the driver to point to a private data structure storing information about the interface. The private data consists of statistics such as the number of packets transmitted and received and the number of errors encountered.
Rather than keeping the statistics in a common set of fields, net_device carries a priv pointer to the device's private data structure, and the counters for packets sent, packets received, and errors live there.

The format of the structure pointed at by priv depends both on the device type and on the particular model: thus, different Ethernet cards may use different private structures. However, nearly all structures include a field of type net_device_stats (defined in include/linux/netdevice.h) that contains statistics common to all the network devices and that can be retrieved with the method get_stats, described later.
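The format of the structure priv points to depends on both the device type and the particular model, so different Ethernet cards may use different private structures. Nearly all of them, however, contain a member of type net_device_stats (defined in include/linux/netdevice.h) that holds the statistics common to all network devices and can be retrieved with the get_stats method, described later.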
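
A minimal sketch of that arrangement follows, assuming a made-up driver. struct toy_stats is a cut-down stand-in for net_device_stats, and struct toy_dev, struct toy_private, toy_get_stats, and toy_setup are hypothetical names; only the priv pointer and the get_stats method come from the text.

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down stand-in for struct net_device_stats. */
struct toy_stats {
    unsigned long rx_packets, tx_packets, rx_errors, tx_errors;
};

struct toy_dev;
typedef struct toy_stats *(*get_stats_fn)(struct toy_dev *dev);

struct toy_dev {
    void *priv;              /* driver-private data */
    get_stats_fn get_stats;  /* method supplied by the driver */
};

/* Private structure of one imaginary driver: a model-specific field
 * plus the statistics block common to all devices. */
struct toy_private {
    int chip_revision;
    struct toy_stats stats;
};

/* The driver's get_stats method: hand out the embedded statistics. */
static struct toy_stats *toy_get_stats(struct toy_dev *dev)
{
    struct toy_private *lp = dev->priv;
    return &lp->stats;
}

/* Attach the private data and the method to the device. */
static void toy_setup(struct toy_dev *dev, struct toy_private *lp)
{
    assert(dev != NULL && lp != NULL);
    dev->priv = lp;
    dev->get_stats = toy_get_stats;
}
```

Code outside the driver only ever calls dev->get_stats(dev), so it does not need to know the layout of the private structure.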

Wireless devices behave so differently from wired devices that wireless ones do not find the net_device_stats data structure appropriate. Instead, they provide a field of type iw_statistics that can be retrieved using a method called get_wireless_stats, described later.
Wireless devices behave so differently from wired ones that net_device_stats does not suit them. They provide a field of type iw_statistics instead, which is retrieved with the get_wireless_stats method, described later.

The data structure to which priv points sometimes has a name reflecting the interface (e.g., vortex_private for the Vortex and Boomerang series, also called the 3c59x family), and other times is simply called net_local. Still, the fields in net_local are defined uniquely by each device driver.
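The structure priv points to is sometimes named after the interface (for example vortex_private for the Vortex and Boomerang series, the 3c59x family) and is otherwise simply called net_local. Even so, the fields of net_local are defined separately by each driver.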

The private data structure may be more or less complex depending on the card's capabilities and on how much the device driver writer is willing to employ sophisticated statistics and complex design to enhance performance. Compare, for instance, the generic net_local structure used by the 3c507 Ethernet card in drivers/net/3c507.c with the highly detailed vortex_private structure used by the 3c59x Ethernet card in drivers/net/3c59x.c. Both, however, include a field of type net_device_stats.
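How complex the private structure is depends on the card's capabilities and on how far the driver writer goes with detailed statistics and performance-oriented design. Compare the generic net_local structure of the 3c507 card in drivers/net/3c507.c with the much more detailed vortex_private of the 3c59x card in drivers/net/3c59x.c; both contain a net_device_stats field.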

As you will see in Chapter 8, the private data structure is sometimes appended to the net_device structure itself (requiring only one malloc for both) and sometimes allocated as a separate block.
Chapter 8 covers how the memory for the private data is allocated.
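
The single-allocation variant can be sketched as follows. This is a user-space illustration under stated assumptions, not the kernel's allocation routine: struct toy_dev, toy_alloc_dev, and the TOY_ALIGN value are hypothetical, and calloc stands in for the kernel allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_ALIGN 32  /* illustrative rounding for the private area offset */

struct toy_dev {
    char name[16];
    void *priv;
};

/* Allocate the device structure and sizeof_priv bytes of private data
 * as one block. The private area starts at the first multiple of
 * TOY_ALIGN bytes past the start of the structure. */
static struct toy_dev *toy_alloc_dev(size_t sizeof_priv)
{
    size_t hdr = (sizeof(struct toy_dev) + TOY_ALIGN - 1)
                 & ~(size_t)(TOY_ALIGN - 1);
    struct toy_dev *dev;

    assert(hdr >= sizeof(struct toy_dev));
    dev = calloc(1, hdr + sizeof_priv);
    if (dev == NULL)
        return NULL;
    dev->priv = sizeof_priv ? (char *)dev + hdr : NULL;
    return dev;
}
```

Because both parts live in one block, a single free(dev) releases the device and its private data together.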

2.2.4. Device Status
To control interactions with the NIC, each device driver has to maintain information such as timestamps and flags indicating what kind of behavior the interface requires. In a symmetric multiprocessing (SMP) system, the kernel also has to make sure that concurrent accesses to the same device from different CPUs are handled correctly. Several fields of the net_device structure are dedicated to these types of information:
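To control its interaction with the NIC, each driver maintains information such as timestamps and flags that describe the behavior the interface requires. On a symmetric multiprocessing (SMP) system the kernel must also make sure that concurrent accesses to the same device from different CPUs are handled correctly. Several net_device fields serve these purposes: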

unsigned long state

A set of flags used by the network queuing subsystem. They are indexed by the constants in the enum netdev_state_t, which is defined in include/linux/netdevice.h and defines constants such as __LINK_STATE_XOFF for each bit. Individual bits are set and cleared using the general functions set_bit and clear_bit, usually invoked through a wrapper that hides the details of the bit used. For example, to stop a device queue, the subsystem invokes netif_stop_queue, which looks like this:
A set of flags used by the network queuing subsystem, indexed by the enum netdev_state_t constants. Individual bits are handled with set_bit and clear_bit, usually through a wrapper that hides which bit is involved, as netif_stop_queue does:

static inline void netif_stop_queue(struct net_device *dev)
{
    ...
    set_bit(__LINK_STATE_XOFF, &dev->state);
}
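
The wrapper pattern can be tried out in user space with the sketch below. Everything prefixed toy_ or TOY_ is a hypothetical stand-in: the bit helpers are plain, non-atomic versions of set_bit, clear_bit, and test_bit, and the bit numbers are illustrative rather than the kernel's actual enum values.

```c
#include <assert.h>
#include <stddef.h>

/* Bit numbers standing in for the constants of enum netdev_state_t. */
enum toy_state { TOY_LINK_STATE_XOFF = 0, TOY_LINK_STATE_START = 1 };

struct toy_dev {
    unsigned long state;
};

/* Non-atomic stand-ins for the kernel's bit operations. */
static void toy_set_bit(int nr, unsigned long *addr)
{
    *addr |= 1UL << nr;
}

static void toy_clear_bit(int nr, unsigned long *addr)
{
    *addr &= ~(1UL << nr);
}

static int toy_test_bit(int nr, const unsigned long *addr)
{
    return (int)((*addr >> nr) & 1UL);
}

/* Wrappers that hide which bit is used. */
static void toy_netif_stop_queue(struct toy_dev *dev)
{
    assert(dev != NULL);
    toy_set_bit(TOY_LINK_STATE_XOFF, &dev->state);
}

static void toy_netif_start_queue(struct toy_dev *dev)
{
    assert(dev != NULL);
    toy_clear_bit(TOY_LINK_STATE_XOFF, &dev->state);
}

static int toy_netif_queue_stopped(const struct toy_dev *dev)
{
    assert(dev != NULL);
    return toy_test_bit(TOY_LINK_STATE_XOFF, &dev->state);
}
```

Callers only see "stop", "start", and "is it stopped"; the bit layout of state can change without touching them.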

 

The Traffic Control subsystem is briefly introduced in Chapter 11.


enum {...} reg_state

The registration state of the device. See Chapter 8.


unsigned long trans_start

The time (measured in jiffies) when the last frame transmission started. The device driver sets it just before starting transmission. The field is used to detect problems with the card if it does not finish transmission after a given amount of time. An overly long transmission means there is something wrong; in that case, the driver usually resets the card.
The time, in jiffies, at which transmission of the last frame started.
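
A timeout check built on trans_start might look like the sketch below. It is an assumption-laden illustration: tick_after imitates the wraparound-safe comparison style of the kernel's time_after macro, tx_timed_out is a hypothetical helper, and the tick values are passed in as plain numbers instead of being read from jiffies.

```c
#include <assert.h>

/* Wraparound-safe "a is later than b" for a free-running tick counter. */
static int tick_after(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

/* Returns 1 when a transmission that started at trans_start is still
 * pending more than `timeout` ticks later, at time `now`. */
static int tx_timed_out(unsigned long now, unsigned long trans_start,
                        unsigned long timeout)
{
    /* The comparison only works for intervals below half the counter range. */
    assert(timeout <= (~0UL >> 1));
    return tick_after(now, trans_start + timeout);
}
```

The subtraction-based comparison keeps giving the right answer when the tick counter wraps around between the start of the transmission and the check.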


unsigned long last_rx

The time (measured in jiffies) when the last packet was received. At the moment, it is not used for any specific purpose, but is available in case of need.
The time at which the last frame was received.


struct net_device *master

Some protocols exist that allow a set of devices to be grouped together and be treated as a single device. These protocols include EQL (Equalizer Load-balancer for serial network interfaces), Bonding (also called EtherChannel and trunking), and the TEQL (true equalizer) queuing discipline of Traffic Control. One of the devices in the group is elected to be the so-called master, which plays a special role. This field is a pointer to the net_device data structure of the master device of the group. If the interface is not a member of such a group, the pointer is simply NULL.
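Some protocols allow a set of devices to be grouped and treated as a single device: EQL (Equalizer Load-balancer for serial network interfaces), Bonding (also called EtherChannel and trunking), and the TEQL (true equalizer) queuing discipline of Traffic Control. One device in the group is elected master and plays a special role; this field points to the master's net_device structure. If the interface belongs to no such group, the pointer is NULL.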

spinlock_t xmit_lock

int xmit_lock_owner

The xmit_lock lock is used to serialize accesses to the driver function hard_start_xmit. This means that each CPU can carry out only one transmission at a time on any given device. xmit_lock_owner is the ID of the CPU that holds the lock. It is always 0 on single-processor systems and -1 when the lock is not taken on SMP systems. It is possible to have lockless transmissions, too, when the device driver supports it. See Chapter 11 for both the lock and the lockless cases.
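xmit_lock serializes calls to the driver function hard_start_xmit, so each CPU can carry out only one transmission at a time on a given device. xmit_lock_owner is the ID of the CPU that holds the lock: it is always 0 on single-processor systems, and -1 on SMP systems when the lock is not held. Lockless transmission is also possible when the driver supports it. See Chapter 11 for both the locked and the lockless cases.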

void *atalk_ptr

void *ip_ptr

void *dn_ptr

void *ip6_ptr

void *ec_ptr

void *ax25_ptr

These six fields are pointers to data structures specific to particular protocols, each data structure containing parameters that are used privately by that protocol. ip_ptr, for instance, points to a data structure of type in_device (even though it is declared as void *) that contains different IPv4-related parameters, among them the list of IP addresses configured on the interface (see Chapter 19). Other sections of this book describe the fields of the data structures used by protocols covered in the book. Most of the time only one of these fields is in use.
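These six fields point to protocol-specific data structures, each holding parameters private to its protocol. ip_ptr, for example, points to an in_device structure (although it is declared void *) that holds IPv4-related parameters, including the list of IP addresses configured on the interface (see Chapter 19). Other parts of the book describe these structures. Most of the time only one of the fields is in use.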

2.2.5. List Management
net_device data structures are inserted into a global list and into two hash tables, as described in Chapter 8. The following fields are used to accomplish these tasks:

struct net_device *next

Links each net_device data structure to the next in the global list.

struct hlist_node name_hlist

struct hlist_node index_hlist

Link the net_device structure into the bucket lists of the two hash tables (one keyed by name, one by index).
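
To make the three linkages concrete, here is a simplified user-space sketch. It is not the kernel's implementation: the buckets are singly linked instead of using hlist_node, the string hash is chosen only for illustration, the table size is an assumption, and every toy_ name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_HASHBITS 8
#define TOY_HASHSIZE (1u << TOY_HASHBITS)

struct toy_dev {
    char name[16];
    int ifindex;
    struct toy_dev *next;        /* global list */
    struct toy_dev *name_next;   /* chain in a name hash bucket */
    struct toy_dev *index_next;  /* chain in an index hash bucket */
};

static struct toy_dev *dev_list;
static struct toy_dev *name_head[TOY_HASHSIZE];
static struct toy_dev *index_head[TOY_HASHSIZE];

/* Simple string hash, used here only for illustration. */
static unsigned name_hash(const char *name)
{
    unsigned h = 0;

    while (*name)
        h = h * 31 + (unsigned char)*name++;
    return h & (TOY_HASHSIZE - 1);
}

static unsigned index_hash(int ifindex)
{
    return (unsigned)ifindex & (TOY_HASHSIZE - 1);
}

/* Insert the device into the global list and both hash tables. */
static void toy_register(struct toy_dev *dev)
{
    unsigned n, i;

    assert(dev != NULL);
    n = name_hash(dev->name);
    i = index_hash(dev->ifindex);
    dev->next = dev_list;
    dev_list = dev;
    dev->name_next = name_head[n];
    name_head[n] = dev;
    dev->index_next = index_head[i];
    index_head[i] = dev;
}

static struct toy_dev *toy_get_by_name(const char *name)
{
    struct toy_dev *d;

    for (d = name_head[name_hash(name)]; d != NULL; d = d->name_next)
        if (strcmp(d->name, name) == 0)
            return d;
    return NULL;
}

static struct toy_dev *toy_get_by_index(int ifindex)
{
    struct toy_dev *d;

    for (d = index_head[index_hash(ifindex)]; d != NULL; d = d->index_next)
        if (d->ifindex == ifindex)
            return d;
    return NULL;
}
```

The global list serves code that walks every device, while the two tables make lookups by name or by ifindex cheap.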

2.2.6. Link Layer Multicast
Multicast is a mechanism used to deliver data to multiple recipients. Multicasting can be available both at the L3 network layer (i.e., IP) and at the L2 link layer (i.e., Ethernet). In this section, we are concerned with the latter.
Multicast is a mechanism for delivering one piece of data to multiple recipients. It is available at both L2 and L3; here we deal with L2 multicast.

Link layer multicast delivery can be achieved by using special addresses or control information in the link layer header. (When it is not supported by the link layer protocol, it may be emulated.) Ethernet natively supports multicasting: we will see in Chapter 13 how an Ethernet address can be classified as unicast, multicast, or broadcast.
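Link layer multicast is achieved with special addresses or with control information in the link layer header. (Where the link layer protocol does not support it, it may be emulated.) Ethernet supports multicast natively; Chapter 13 shows how Ethernet addresses are classified as unicast, multicast, or broadcast.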

Multicast addresses are distinguished from the range of other addresses by a specific bit. This means that 50% of the possible addresses are multicast, and 50% of 2^48 is a huge number! When an interface is asked to join a lot of multicast groups (each identified by a multicast address), it may be more efficient and faster for it to simply listen to all the multicast addresses instead of maintaining a long list and wasting time filtering ingress L2 multicast frames based on the list. One of the flags in the net_device data structure indicates whether the device should listen to all addresses. The decision about when to set or clear this flag is controlled by the allmulti field shown in this section.
Multicast addresses are told apart from other addresses by one specific bit, so half of all possible addresses, 2^47 of them, are multicast. When an interface has to join many multicast groups, listening to every multicast address can be more efficient than keeping a long list and filtering incoming L2 multicast frames against it. A flag in net_device records whether the device should listen to all addresses, and the allmulti field decides when that flag is set or cleared.
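
For Ethernet, the bit in question is the least significant bit of the first octet of the address. The text leaves that detail to Chapter 13, so treat it here as background supplied for the example. The helpers below are hypothetical user-space functions, not the kernel's own.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_ETH_ALEN 6  /* octets in an Ethernet address */

/* An address is a group (multicast) address when the least significant
 * bit of its first octet is set. */
static int toy_is_multicast(const unsigned char *addr)
{
    assert(addr != NULL);
    return addr[0] & 0x01;
}

/* The broadcast address is the special case with all bits set. */
static int toy_is_broadcast(const unsigned char *addr)
{
    int i;

    assert(addr != NULL);
    for (i = 0; i < TOY_ETH_ALEN; i++)
        if (addr[i] != 0xff)
            return 0;
    return 1;
}
```

Since one bit out of 48 decides the class, exactly half of the address space falls on the multicast side, which is where the 50% figure comes from.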

Each device keeps an instance of the dev_mc_list structure for each link layer multicast address it listens to. Link layer multicast addresses can be added and removed with the functions dev_mc_add and dev_mc_delete, respectively. Relevant fields in the net_device structure include:

struct dev_mc_list *mc_list

Pointer to the head of this device's list of dev_mc_list structures.

int mc_count

The number of multicast addresses for this device, which is also the length of the list to which mc_list points.
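
The relationship between mc_list and mc_count can be sketched with an ordinary linked list. This is a simplified model, not the kernel's dev_mc_add and dev_mc_delete: the toy_ names are hypothetical, malloc stands in for the kernel allocator, and adding an address that is already present is simply treated as a no-op.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_ADDR_LEN 6

struct toy_mc_entry {
    unsigned char addr[TOY_ADDR_LEN];
    struct toy_mc_entry *next;
};

struct toy_dev {
    struct toy_mc_entry *mc_list;  /* head of the address list */
    int mc_count;                  /* number of entries in mc_list */
};

/* Add addr to the list. Returns 0 on success (or if the address is
 * already present) and -1 if memory cannot be allocated. */
static int toy_mc_add(struct toy_dev *dev, const unsigned char *addr)
{
    struct toy_mc_entry *e;

    assert(dev != NULL && addr != NULL);
    for (e = dev->mc_list; e != NULL; e = e->next)
        if (memcmp(e->addr, addr, TOY_ADDR_LEN) == 0)
            return 0;
    e = malloc(sizeof(*e));
    if (e == NULL)
        return -1;
    memcpy(e->addr, addr, TOY_ADDR_LEN);
    e->next = dev->mc_list;
    dev->mc_list = e;
    dev->mc_count++;
    return 0;
}

/* Remove addr from the list. Returns 0 on success, -1 if not found. */
static int toy_mc_delete(struct toy_dev *dev, const unsigned char *addr)
{
    struct toy_mc_entry **pp, *e;

    assert(dev != NULL && addr != NULL);
    for (pp = &dev->mc_list; (e = *pp) != NULL; pp = &e->next) {
        if (memcmp(e->addr, addr, TOY_ADDR_LEN) == 0) {
            *pp = e->next;
            free(e);
            dev->mc_count--;
            return 0;
        }
    }
    return -1;
}
```

Keeping the count next to the list lets a driver decide cheaply whether the list is short enough to program into a hardware filter.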

int allmulti

When nonzero, causes the device to listen to all multicast addresses. Like promiscuity, discussed earlier in this chapter, allmulti is a reference count rather than a simple Boolean. This is because multiple facilities (VLANs and bonding devices, for instance) may independently require listening to all addresses. When the variable goes from 0 to nonzero, the function dev_set_allmulti is called to instruct the interface to listen to all multicast addresses. The opposite happens when allmulti goes to 0.
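When nonzero, the device listens to all multicast addresses. Like promiscuity, discussed earlier in this chapter, allmulti is a reference count rather than a simple Boolean, because multiple facilities (VLANs and bonding devices, for example) may each independently require listening to all addresses.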

2.2.7. Traffic Management
The Traffic Control subsystem of Linux has grown quite a lot and represents one of the strengths of the Linux kernel. The associated kernel option is "Device drivers → Networking support → Networking options → QoS and/or fair queueing." Relevant fields in the net_device structure include:

struct net_device *next_sched

Used by one of the software interrupts described in Chapter 11.

struct Qdisc *qdisc

struct Qdisc *qdisc_sleeping

struct Qdisc *qdisc_ingress

struct list_head qdisc_list

These fields are used to manage the ingress and egress packet queues and access to the device from different CPUs.

spinlock_t queue_lock

spinlock_t ingress_lock

The Traffic Control subsystem defines a private egress queue for each network device. queue_lock is used to avoid simultaneous accesses to it (see Chapter 11). ingress_lock does the same for ingress traffic.

unsigned long tx_queue_len

The length of the device's transmission queue. When Traffic Control support is present in the kernel, tx_queue_len may not be used (only a few queuing disciplines use it). Table 2-2 shows the values used for the most common device types. Its value can be tuned through the sysfs filesystem (see the /sys/class/net/device_name/ directories).

Table 2-2. tx_queue_len values for different device types

Device type                                                   tx_queue_len
Ethernet                                                      1,000
Token Ring                                                    100
EtherChannel                                                  100
Fibre Channel                                                 100
FDDI                                                          100
TEQL (true link equalizer) [a]                                100
ISDN                                                          30
HIPPI                                                         25
PLIP                                                          10
SLIP                                                          10
AX25                                                          10
EQL (Equalizer load balancer for serial network interfaces)   5
Generic PPP                                                   3
Bonding                                                       0
Loopback                                                      0
Bridge                                                        0
VLAN                                                          0

[a] TEQL is one of the queuing disciplines you can configure with Traffic Control (the QoS layer).

Depending on the queuing discipline in use (the strategy used to queue incoming and outgoing packets), tx_queue_len may or may not be used. It is usually used when the queue type is FIFO (First In, First Out) or something else relatively simple.

Note that all devices with a queue length of 0 are virtual devices: they rely on the associated real devices to do any queuing (with the exception of the loopback device, which does not need it because it is internal to the kernel and delivers all traffic immediately).

2.2.8. Feature Specific
As we saw when describing sk_buff, a few parameters are included in the definition of net_device only if the features they belong to have been included in the kernel:[]

[] The fields are actually included only when the associated feature is part of the kernel. See, for example, br_port.

struct divert_blk *divert

Diverter is a feature that allows you to change the source and destination addresses of the incoming packet. This makes it possible to reroute traffic with specific characteristics specified by the configuration to a different interface or a different host. To work properly and to make sense, diverter needs other features such as bridging. The data structure pointed to by this field stores the parameters needed by the diverter feature. The associated kernel option is "Device drivers → Networking support → Networking options → Frame Diverter."

struct net_bridge_port *br_port

Extra information needed when the device is configured as a bridged port. The bridging code and the Spanning Tree Protocol (STP) are covered in Part IV. The associated kernel option is "Device drivers → Networking support → Networking options → 802.1d Ethernet Bridging."

void (*vlan_rx_register)(...)

void (*vlan_rx_add_vid)(...)

void (*vlan_rx_kill_vid)(...)

These three function pointers are used by the VLAN code to register a device as VLAN tagging capable (see net/8021q/vlan.c), add a VLAN to the device, and delete the VLAN from the device, respectively. The associated kernel option is "Device drivers → Networking support → Networking options → 802.1Q VLAN Support."

int netpoll_rx

void (*poll_controller)(...)

Used by the optional Netpoll feature that is briefly mentioned in Chapter 10.

2.2.9. Generic
In addition to the list management fields of the net_device structure discussed earlier, a few other fields are used to manage structures and make sure they are removed when they are no longer needed:

atomic_t refcnt

Reference count. The device cannot be unregistered until this counter has dropped to zero (see Chapter 8).
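A minimal user-space sketch of this rule follows. It assumes C11 atomics in place of the kernel's atomic_t, and the demo_* names are hypothetical helpers introduced for illustration, not kernel functions.

```c
#include <assert.h>
#include <stdatomic.h>

struct demo_net_device {
    atomic_int refcnt;   /* number of outstanding references */
};

static void demo_init(struct demo_net_device *dev)
{
    atomic_init(&dev->refcnt, 0);
}

/* Take a reference before using the device. */
static void demo_hold(struct demo_net_device *dev)
{
    atomic_fetch_add(&dev->refcnt, 1);
}

/* Drop a reference once the device is no longer used. */
static void demo_put(struct demo_net_device *dev)
{
    int old = atomic_fetch_sub(&dev->refcnt, 1);
    assert(old > 0);   /* a put without a matching hold is a bug */
}

/* Unregistration may complete only when nobody holds a reference. */
static int demo_can_unregister(struct demo_net_device *dev)
{
    return atomic_load(&dev->refcnt) == 0;
}
```

Every demo_hold must be paired with a demo_put; a leaked reference would keep the device from ever being unregistered.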

int watchdog_timeo

struct timer_list watchdog_timer

Along with the tx_timeout function pointer described in the section "Function Pointers" below, these fields implement the timer discussed in the section "Watchdog timer" in Chapter 11.


int (*poll)(...)

struct list_head poll_list

int quota

int weight

Used by the NAPI feature described in Chapter 10.

const struct iw_handler_def *wireless_handlers

struct iw_public_data *wireless_data

Additional parameters and function pointers used by wireless devices. See also get_wireless_stats.

struct list_head todo_list

The registration and unregistration of a network device are each done in two steps. todo_list is used to handle the second one. See Chapter 8.

struct class_device class_dev

Used by the new generic kernel driver infrastructure.

2.2.10. Function Pointers
We saw in Chapter 1 that the networking code makes heavy use of function pointers. The net_device data structure includes quite a few of them. Such functions are used mainly to:

Transmit and receive a frame

Add or parse the link layer header on a buffer

Change a part of the configuration

Retrieve statistics

Interact with a specific feature

A few function pointers were already introduced in the previous sections, alongside the fields used to accomplish a specific task. Here are the generic ones:

struct ethtool_ops *ethtool_ops

Pointer to a set of function pointers used to set or get the configuration of different device parameters. See the section "Ethtool" in Chapter 8.

int (*init)(...)

void (*uninit)(...)

void (*destructor)(...)

int (*open)(...)

int (*stop)(...)

Used to initialize, clean up, destroy, enable, and disable a device, respectively. Not all of them are always used. See Chapter 8.
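Because not every driver supplies every one of these methods, callers have to tolerate NULL pointers. The sketch below shows that pattern for open only. The demo_* structure, the up flag, and the two sample driver routines are hypothetical; the kernel's actual enable path does considerably more.

```c
#include <assert.h>
#include <stddef.h>

struct demo_net_device {
    int up;                                    /* 1 once the device is enabled */
    int (*init)(struct demo_net_device *dev);  /* optional */
    int (*open)(struct demo_net_device *dev);  /* optional */
    int (*stop)(struct demo_net_device *dev);  /* optional */
};

/* Enable a device: call the driver's open method if it has one,
 * and mark the device up only when that method succeeds. */
static int demo_dev_open(struct demo_net_device *dev)
{
    int err = 0;

    assert(dev != NULL);
    if (dev->up)
        return 0;               /* already enabled */
    if (dev->open != NULL)
        err = dev->open(dev);
    if (err == 0)
        dev->up = 1;
    return err;
}

/* Two sample driver methods used to exercise the helper. */
static int demo_open_ok(struct demo_net_device *dev)   { (void)dev; return 0; }
static int demo_open_fail(struct demo_net_device *dev) { (void)dev; return -1; }
```

A device with no open method is simply marked up, while a failing open leaves the device disabled and passes the error back to the caller.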

struct net_device_stats* (*get_stats)(...)

struct iw_statistics* (*get_wireless_stats)(...)

Some statistics collected by the device driver can be displayed by user-space applications such as ifconfig and ip; others are used strictly by the kernel and are discussed in the section "Device Status" earlier in this chapter. These two methods are used to collect statistics: get_stats operates on a normal device and get_wireless_stats on a wireless device. See also the earlier section "Statistics."

int (*hard_start_xmit)(...)

Used to transmit a frame. See Chapter 11.

int (*hard_header)(...)

int (*rebuild_header)(...)

int (*hard_header_cache)(...)

void (*header_cache_update)(...)

int (*hard_header_parse)(...)

int (*neigh_setup)(...)

Used by the neighboring layer. See the sections "Methods Provided by the Device Driver" and "Neighbor Initialization" in Chapter 27.

int (*do_ioctl)(...)

ioctl is the system call used to issue commands to devices (see Chapter 3). This method is called to process some of the ioctl commands (see Chapter 8). (Translator's note: system calls are implemented differently on different CPU architectures; on i386 they are entered through the int 0x80 interrupt.)

void (*set_multicast_list)(...)

We have already seen in the section "Link Layer Multicast" that mc_list and mc_count are used to manage the list of L2 multicast addresses. This method is used to ask the device driver to configure the device to listen to those addresses. Usually it is not called directly, but through wrappers such as dev_mc_upload or its lockless version, __dev_mc_upload. When a device cannot install a list of multicast addresses, it simply enables reception of all of them.
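The fallback in the last sentence can be illustrated with a small sketch. It assumes a hypothetical device whose hardware filter holds only four addresses; the demo_* names, the slot count, and the all_multi flag are invented for the example and do not come from any particular driver.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_HW_FILTER_SLOTS 4   /* hypothetical hardware filter capacity */

struct demo_net_device {
    int mc_count;        /* number of L2 multicast addresses requested */
    int hw_filter_used;  /* filter slots currently programmed */
    int all_multi;       /* 1 if the device accepts all multicast frames */
};

/* Program the multicast filter. If the list does not fit in the
 * hardware filter, fall back to receiving every multicast frame. */
static void demo_set_multicast_list(struct demo_net_device *dev)
{
    assert(dev != NULL);
    if (dev->mc_count > DEMO_HW_FILTER_SLOTS) {
        dev->all_multi = 1;
        dev->hw_filter_used = 0;
    } else {
        dev->all_multi = 0;
        dev->hw_filter_used = dev->mc_count;
    }
}
```

Accepting everything is always correct, just less efficient, because the upper layers then have to discard the frames nobody asked for.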

int (*set_mac_address)(...)

Changes the device MAC address. When the device does not provide this capability (as in the case of Bridge virtual devices), the pointer is set to NULL.
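A caller therefore has to check the pointer before using it. The following sketch assumes a 6-byte Ethernet-style address and returns -EOPNOTSUPP for devices without the method; the demo_* names are hypothetical, and the choice of error code is an assumption made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define DEMO_ALEN 6   /* assumed Ethernet-style address length */

struct demo_net_device {
    unsigned char dev_addr[DEMO_ALEN];
    int (*set_mac_address)(struct demo_net_device *dev,
                           const unsigned char *addr);
};

/* Sample driver method: copy the new address into the device. */
static int demo_eth_set_mac(struct demo_net_device *dev,
                            const unsigned char *addr)
{
    memcpy(dev->dev_addr, addr, DEMO_ALEN);
    return 0;
}

/* Generic entry point: refuse the request when the driver
 * left the method pointer set to NULL. */
static int demo_set_mac(struct demo_net_device *dev,
                        const unsigned char *addr)
{
    assert(dev != NULL && addr != NULL);
    if (dev->set_mac_address == NULL)
        return -EOPNOTSUPP;
    return dev->set_mac_address(dev, addr);
}
```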

int (*set_config)(...)

Configures driver parameters, such as the hardware parameters irq, io_addr, and if_port. Higher-layer parameters (such as protocol addresses) are handled by do_ioctl. Not many devices use this method, especially among newer devices, which are better able to implement probe functions. A good example with some documentation can be found in sis900_set_config in drivers/net/sis900.c.

int (*change_mtu)(...)

Changes the device MTU (see the description of mtu in the earlier section, "Configuration"). Changing this field has no effect on the device driver; it simply forces the kernel software to respect the new MTU and to handle fragmentation accordingly.
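As a rough sketch, such a method typically validates the requested value and stores it in the mtu field. The bounds below (68 and 1500 bytes) are assumptions chosen for an Ethernet-like device, and the demo_* names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define DEMO_MIN_MTU 68     /* assumed lower bound */
#define DEMO_MAX_MTU 1500   /* assumed upper bound (Ethernet payload) */

struct demo_net_device {
    int mtu;
};

/* Validate and record a new MTU. Nothing is pushed to the hardware:
 * the stored value only tells the upper layers how large a packet
 * may be before it must be fragmented. */
static int demo_change_mtu(struct demo_net_device *dev, int new_mtu)
{
    assert(dev != NULL);
    if (new_mtu < DEMO_MIN_MTU || new_mtu > DEMO_MAX_MTU)
        return -EINVAL;
    dev->mtu = new_mtu;
    return 0;
}
```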

void (*tx_timeout)(...)

The method invoked when the watchdog timer expires. The timer is used to detect a transmission that is taking a suspiciously long time to complete. The watchdog timer is not even started unless this method is defined. See the section "Watchdog timer" in Chapter 11 for more information.
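The check performed when the timer fires can be sketched roughly as follows. This is a simplified user-space model: the trans_start and queue_stopped fields, the tick values, and the demo_* names are assumptions made for illustration, and the real logic is the one described in Chapter 11.

```c
#include <assert.h>
#include <stddef.h>

struct demo_net_device {
    unsigned long trans_start;     /* tick at which the last transmission began */
    unsigned long watchdog_timeo;  /* allowed transmission time, in ticks */
    int queue_stopped;             /* 1 while the driver cannot accept frames */
    int timeouts;                  /* how many times tx_timeout has run */
    void (*tx_timeout)(struct demo_net_device *dev);
};

/* Sample driver handler: just count the event. */
static void demo_tx_timeout(struct demo_net_device *dev)
{
    dev->timeouts++;
}

/* Returns 1 if the timeout handler was invoked, 0 otherwise. */
static int demo_watchdog_check(struct demo_net_device *dev, unsigned long now)
{
    assert(dev != NULL);
    if (dev->tx_timeout == NULL)
        return 0;   /* no handler: the watchdog is never armed */
    if (!dev->queue_stopped)
        return 0;   /* transmissions are making progress */
    if (now - dev->trans_start <= dev->watchdog_timeo)
        return 0;   /* still within the allowed time */
    dev->tx_timeout(dev);
    return 1;
}
```

The unsigned subtraction keeps the comparison correct even if the tick counter wraps around.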

int (*accept_fastpath)(...)

Fast switching (also called FASTROUTE) was a kernel feature that allowed device drivers to route incoming traffic in interrupt context using a small cache, bypassing all the software layers. Fast switching is no longer supported as of the 2.6.8 kernel. This method was used to test whether the fast-switching feature could be used on the device.


======
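Translation summary:
Time has been very tight lately; even half an hour a day counts as a good day. I used the weekend to finish this installment quickly.
Later installments will probably get simpler. Some of the material is explained repeatedly throughout the book, and since my goal is to learn rather than to translate for its own sake, reading those parts once is enough without translating them over and over.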

posted on 2008-11-30 23:09  Wu.Country@侠缘