Chapter 6: Beyond Basic Socket Programming

 

6.1 Socket Options

The functions getsockopt() and setsockopt() allow socket option values to be queried and set, respectively.

int getsockopt(int socket, int level, int optName, void* optVal, socklen_t* optLen)
int setsockopt(int socket, int level, int optName, const void* optVal, socklen_t optLen)
The second parameter indicates the level of the option in question.

The option itself is specified by the integer optName, which is always specified using a system-defined constant.

The parameter optVal is a pointer to a buffer.

optLen specifies the length of the buffer.

Note that the value passed to setsockopt() when setting a buffer-size option such as SO_RCVBUF is not guaranteed to be the new size of the socket buffer, even if the call apparently succeeds. Rather, it is best thought of as a “hint” to the system about the value desired by the user; the system, after all, has to manage resources for all users and may consider other factors in adjusting buffer size.
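The "hint" behavior can be observed by setting a value and reading it back. The following is a minimal sketch, not a definitive implementation: request_rcvbuf() is a hypothetical helper name, and SO_RCVBUF at level SOL_SOCKET is used as the example option.

```c
#include <sys/socket.h>

// Hypothetical helper: ask for a receive buffer of `requested` bytes, then
// read back what the system actually configured.
// Returns the effective size in bytes, or -1 on error.
int request_rcvbuf(int sock, int requested) {
  if (setsockopt(sock, SOL_SOCKET, SO_RCVBUF, &requested, sizeof(requested)) < 0)
    return -1;

  int actual = 0;
  socklen_t len = sizeof(actual);  // in: size of our buffer; out: bytes written
  if (getsockopt(sock, SOL_SOCKET, SO_RCVBUF, &actual, &len) < 0)
    return -1;

  return actual;  // may differ from `requested`
}
```

The returned value need not equal the requested one. On Linux, for example, the reported size is typically larger than the request because the kernel reserves room for bookkeeping.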

 

6.2 Signals

Signals provide a mechanism for notifying programs that certain events have occurred.

Any program that sends on a TCP socket must explicitly deal with SIGPIPE in order to be robust.

 

An application program can change the default behavior for a particular signal using sigaction():
int sigaction (int whichSignal, const struct sigaction* newAction, struct sigaction* oldAction)
sigaction() returns 0 on success and -1 on failure. whichSignal specifies the signal for which the behavior is being changed. The newAction parameter points to a structure that defines the new behavior for the given signal type; if the pointer oldAction is non-null, a structure describing the previous behavior for the given signal is copied into it.
struct sigaction {
  void (*sa_handler)(int); // Signal handler
  sigset_t sa_mask; // Signals to be blocked during handler execution
  int sa_flags; // Flags to modify default behavior
};

The sigaction() mechanism allows some signals to be temporarily blocked (in addition to those that are already blocked by the process’s signal mask) while the specified signal is handled. By default whichSignal is always blocked regardless of whether it is reflected in sa_mask. (On some systems, setting the flag SA_NODEFER in sa_flags allows the specified signal to be delivered while it is being handled.)

sa_mask is implemented as a set of boolean flags, one for each type of signal. This set of flags can be manipulated with the following four functions:
int sigemptyset(sigset_t* set)
int sigfillset(sigset_t* set)
int sigaddset(sigset_t* set, int whichSignal)
int sigdelset(sigset_t* set, int whichSignal)

It is important to realize that signals are not queued—a signal is either pending or it is not. If the same signal is delivered more than once while it is being handled, the handler is only executed once more after it completes the original execution. 

Consider the case where three SIGINT signals arrive while the signal handler for SIGINT is already executing. The first of the three SIGINT signals is blocked; however, the subsequent two signals are lost. When the SIGINT signal handler function completes, the system executes the handler only once again.

One of the most important aspects of signals relates to the sockets interface. If a signal is delivered while the program is blocked in a socket call (such as a recv() or connect()), and a handler for that signal has been specified, as soon as the handler completes, the socket call will return -1 with errno set to EINTR. Thus, programs that catch and handle signals need to be prepared for these error returns from system calls that can block.
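One common way to cope with EINTR is to reissue the interrupted call. The wrapper below is a minimal sketch of that idea; recv_retry() is a hypothetical helper name, and whether retrying is the right reaction depends on the application (a timeout implemented with alarm(), for instance, wants to see the EINTR).

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>

// Hypothetical helper: call recv() again whenever it was interrupted by a
// signal handler. Any other result (data, end-of-stream, real error) is
// returned to the caller unchanged.
ssize_t recv_retry(int sock, void* buf, size_t len, int flags) {
  ssize_t n;
  do {
    n = recv(sock, buf, len, flags);
  } while (n < 0 && errno == EINTR);
  return n;
}
```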

How does the server find out that the connection is broken? The answer is that it doesn’t, until it tries to send or receive on the socket. If it tries to receive first, the call returns 0. If it tries to send first, at that point, SIGPIPE is delivered. Thus, SIGPIPE is delivered synchronously and not asynchronously.

This fact is especially significant for servers because the default behavior for SIGPIPE is to terminate the program. Thus, servers that don’t change this behavior can be terminated by misbehaving clients. Servers should always handle SIGPIPE so that they can detect the client’s disappearance and reclaim any resources that were in use to service it.

 

6.3 Nonblocking I/O 

6.3.1 Nonblocking Sockets

In the case of failure, we need the ability to distinguish between failure due to blocking and other types of failures. If the failure occurred because the call would have blocked, the system sets errno to EWOULDBLOCK, except for connect(), which sets errno to EINPROGRESS.

We can change the default blocking behavior with a call to fcntl() (“file control”).
int fcntl(int socket, int command, ...)

The operation to be performed is given by command, which is always a system-defined constant.

Determining when a nonblocking connect() has completed is beyond the scope of this text (connection completion can be detected using the select() call, described in Section 6.5), so we recommend not setting the socket to nonblocking until after the call to connect().

For eliminating blocking during individual send and receive operations, an alternative is available on some platforms. The flags parameter of send(), recv(), sendto(), and recvfrom() allows for modification of some aspects of the behavior on a particular call. Some implementations support the MSG_DONTWAIT flag, which causes nonblocking behavior in any call where it is set in flags.
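The fcntl() approach can be sketched as follows. This is an illustrative example, not the only way to do it: set_nonblocking() and try_recv() are hypothetical helpers, and the -2 return code for "would block" is an assumption made for this sketch.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/socket.h>

// Add O_NONBLOCK to the descriptor's existing status flags.
// Returns 0 on success, -1 on failure.
int set_nonblocking(int sock) {
  int flags = fcntl(sock, F_GETFL, 0);
  if (flags < 0) return -1;
  return fcntl(sock, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

// Returns the number of bytes read (0 means the peer closed the connection),
// -2 if the call would have blocked, or -1 on any other error.
ssize_t try_recv(int sock, void* buf, size_t len) {
  ssize_t n = recv(sock, buf, len, 0);
  if (n < 0 && (errno == EWOULDBLOCK || errno == EAGAIN))
    return -2;  // nothing available right now; not a real failure
  return n;
}
```

Checking both EWOULDBLOCK and EAGAIN is a portability habit: the two names have the same value on many systems but are not required to.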

 

6.3.2 Asynchronous I/O

The difficulty with nonblocking socket calls is that there is no way of knowing when one would succeed, except by periodically trying it until it does (a process known as "polling").

Asynchronous I/O works by having the SIGIO signal delivered to the process when some I/O-related event occurs on the socket.

Arranging for SIGIO involves three steps. First, we inform the system of the desired disposition of the signal using sigaction(). Then we ensure that signals related to the socket will be delivered to this process (because multiple processes can have access to the same socket, there might be ambiguity about which should get it) by making it the owner of the socket, using fcntl(). Finally, we mark the socket as being primed for asynchronous I/O by setting a flag (FASYNC), again via fcntl().

 

6.3.3 Timeouts

Sometimes, however, we may actually need to know that some I/O event has not happened for a certain time period.

The standard method of implementing timeouts is to set an alarm before calling a blocking function.
unsigned int alarm(unsigned int secs)
alarm() starts a timer, which expires after the specified number of seconds (secs); alarm() returns the number of seconds remaining for any previously scheduled alarm (or 0 if no alarm was scheduled). When the timer expires, a SIGALRM signal is sent to the process, and the handler function for SIGALRM, if any, is executed.

To implement a timeout in a UDP echo client, the client installs a handler for SIGALRM, and just before calling recvfrom(), it sets an alarm for two seconds. At the end of that interval of time, the SIGALRM signal is delivered, and the handler is invoked. When the handler returns, the blocked recvfrom() returns -1 with errno equal to EINTR. The client then resends the echo request to the server.

 

6.4 Multitasking

Iterative servers work best for applications where each client requires a small, bounded amount of work by the server.

Using constructs like processes or threads, we can farm out responsibility for each client to an independently executing copy of the server.

 

6.4.1 Per-Client Processes

In UNIX, fork() attempts the creation of a new process, returning -1 on failure. On success, a new process is created that is identical to the calling process, except for its process ID and the return value it receives from fork(). If the return from fork() is 0, the process knows that it is the child. To the parent, fork() returns the process ID of the new child process.

When a child process terminates, it does not automatically disappear. In UNIX parlance, the child becomes a zombie. Zombies consume system resources until they are “harvested” by their parent with a call to waitpid().
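The three possible outcomes of fork() and the harvesting of the child can be sketched as follows. This is a simplified illustration: run_in_child() and sample_work() are hypothetical names, and the parent here waits for the child right away, whereas a server would normally keep accepting clients and harvest finished children later.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Stand-in for per-client work; the return value becomes the exit status.
int sample_work(void) { return 7; }

// Run `work` in a child process and harvest it.
// Returns the child's exit status, or -1 on failure.
int run_in_child(int (*work)(void)) {
  pid_t pid = fork();
  if (pid < 0) return -1;   // fork() failed: no child was created
  if (pid == 0)             // return value 0: we are the child
    _exit(work());

  // Parent: pid is the child's process ID. Harvest it so that it does not
  // linger as a zombie.
  int status;
  if (waitpid(pid, &status, 0) < 0) return -1;
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```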

 

6.4.2 Per-Client Thread

There are a few disadvantages to using threads instead of processes:

1. If a child process goes awry, it is easy to monitor and kill it from the command line using its process identifier. Threads may not provide this capability on some platforms, so additional server functionality must be provided to monitor and kill individual threads.
2. If the operating system is oblivious to the notion of threads, it may give every process the same size time slice. In that case a threaded Web server handling hundreds of clients may get the same amount of CPU time as a game of solitaire.

 

6.4.3 Constrained Multitasking

Process and thread creation both incur overhead. In addition, the scheduling and context switching among many processes or threads creates extra work for a system. As the number of processes or threads increases, the operating system spends more and more time dealing with this overhead.

We can avoid this problem by limiting the number of processes created by the server, an approach we call constrained-multitasking servers.

 

6.5 Multiplexing

It is often the case that an application needs the ability to do I/O on multiple channels simultaneously. For example, we might want to provide echo service on several ports at once.

A call to accept() (or recv()) on one socket may block, causing established connections to another socket to wait unnecessarily.
This problem can be solved using nonblocking sockets, but in that case the server ends up continuously polling the sockets, which is wasteful. We would like to let the server block until some socket is ready for I/O.

 

With the select() function, a program can specify a list of descriptors to check for pending I/O; select() suspends the program until one of the descriptors in the list becomes ready to perform I/O and returns an indication of which descriptors are ready. Then the program can proceed with I/O on that descriptor with the assurance that the operation will not block.

int select(int maxDescPlus1, fd_set *readDescs, fd_set *writeDescs, fd_set *exceptionDescs, struct timeval *timeout)

readDescs: Descriptors in this list are checked for immediate input data availability; that is, a call to recv() (or recvfrom() for a datagram socket) would not block.
writeDescs: Descriptors in this list are checked for the ability to immediately write data; that is, a call to send() (or sendto() for a datagram socket) would not block.
exceptionDescs: Descriptors in this list are checked for pending exceptions or errors. An example of a pending exception for a TCP socket would be if the remote end of a TCP socket had closed while data were still in the channel; in such a case, the next read or write operation would fail with errno set to ECONNRESET.

 

To save space, each of these lists of descriptors is typically represented as a bit vector. To include a descriptor in the list, we set the bit in the bit vector corresponding to the number of its descriptor to 1.

Programs should not (and need not) rely on knowledge of this implementation strategy, however, because the system provides macros for manipulating instances of the type fd_set:

void FD_ZERO(fd_set *descriptorVector)
void FD_CLR(int descriptor, fd_set *descriptorVector)
void FD_SET(int descriptor, fd_set *descriptorVector)
int FD_ISSET(int descriptor, fd_set *descriptorVector)

FD_ZERO() empties the list of descriptors. FD_CLR() and FD_SET() remove and add descriptors to the list, respectively. Membership of a descriptor in a list is tested by FD_ISSET(), which returns nonzero if the given descriptor is in the list, and 0 otherwise.

 

The maximum number of descriptors that can be contained in a list is given by the system-defined constant FD_SETSIZE. While this number can be quite large, most applications use very few descriptors. To make the implementation more efficient, the select() function allows us to pass a hint, which indicates the largest descriptor number that needs to be considered in any of the lists: the first parameter, maxDescPlus1, is that largest descriptor number plus one.

 

The last parameter (timeout) allows control over how long select() will wait for something to happen. The timeout is specified with a timeval data structure:

struct timeval {
  time_t tv_sec;        // Seconds
  suseconds_t tv_usec;  // Microseconds
};

If the time specified in the timeval structure elapses before any of the specified descriptors becomes ready for I/O, select() returns the value 0. If timeout is NULL, select() has no timeout bound and waits until some descriptor becomes ready. Setting both tv_sec and tv_usec to 0 causes select() to return immediately, enabling polling of I/O descriptors.

 

If no errors occur, select() returns the total number of descriptors prepared for I/O. To indicate the descriptors ready for I/O, select() changes the descriptor lists so that only the positions corresponding to ready descriptors are set. For example, if descriptors 0, 3, and 5 are set in the initial read descriptor list, the write and exception descriptor lists are NULL, and descriptors 0 and 5 have data available for reading, select() returns 2, and only positions 0 and 5 are set in the returned read descriptor list. An error in select() is indicated by a return value of −1.

 

select() is a powerful function. It can also be used to implement a timeout version of any of the blocking I/O functions (e.g., recv(), accept()) without using alarms.

 

6.6 Multiple Recipients

Instead of unicasting the message to every host on the network—which requires us not only to know the address of every host on the network, but also to call sendto() on the message once for each host—we would like to be able to call sendto() just once and have the network handle the duplication for us.

There are two types of network duplication service: broadcast and multicast. With broadcast, the program calls sendto() once, and the message is automatically delivered to all hosts on the local network. With multicast, the message is sent once and delivered to a specific (possibly empty) group of hosts throughout the Internet—namely, those that have indicated to the network that they should receive messages sent to that group.

We mentioned that there are some restrictions on these services. The first is that only UDP sockets can use broadcast and multicast services. The second is that broadcast only covers a local scope, typically a local area network. The third restriction is that multicast across the entire Internet is presently not supported by most Internet service providers. In spite of these restrictions, these services can often be useful. For example, it is often useful to use multicast within a site such as a campus network, or broadcast to local hosts.

 

6.6.1 Broadcast

UDP datagrams can be sent to all nodes on an attached local network by sending them to a special address. In IPv4 it is called the “limited broadcast address,” and it is the all-ones address (in dotted-quad notation, 255.255.255.255). In IPv6 it is called the “all-nodes address (link scope)” and has the value FF02::1. Routers do not forward packets addressed to either one of these addresses, so neither one will take a datagram beyond the local network to which the sender is connected.

Note also that a broadcast UDP datagram will actually be “heard” at a host only if some program on that host is listening for datagrams on the port to which the datagram is addressed.

There is one other difference between a broadcast sender and a regular sender: before sending to the broadcast address, the special socket option SO_BROADCAST must be set. In effect, this asks the system for “permission” to broadcast.
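Asking for that permission is a single setsockopt() call on the UDP socket. A minimal sketch, where enable_broadcast() is a hypothetical helper name:

```c
#include <sys/socket.h>

// Request permission to send to a broadcast address on this UDP socket.
// Returns 0 on success, -1 on failure.
int enable_broadcast(int udp_sock) {
  int permission = 1;  // nonzero enables the option
  return setsockopt(udp_sock, SOL_SOCKET, SO_BROADCAST,
                    &permission, sizeof(permission));
}
```

After this call succeeds, the sender uses sendto() as usual, with the broadcast address as the destination.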

 

6.6.2 Multicast

A multicast address identifies a set of receivers who have “asked” the network to deliver messages sent to that address.

Note that unlike the broadcast sender, the multicast sender does not need to set the permission to multicast. On the other hand, the multicast sender may set the TTL (“time-to-live”) value for the transmitted datagrams.

The multicast network service duplicates and delivers the message only to a specific set of receivers. This set of receivers, called a multicast group, is identified by a particular multicast (or group) address. These receivers need some mechanism to notify the network of their interest in receiving data sent to a particular multicast address. Once notified, the network can begin forwarding the multicast messages to the receiver. This notification of the network, called “joining a group,” is accomplished via a multicast request (signaling) message sent (transparently) by the underlying protocol implementation. To cause this to happen, the receiving program needs to invoke an address-family-specific multicast socket option. For IPv4 it is IP_ADD_MEMBERSHIP; for IPv6 it is (surprisingly enough) IPV6_ADD_MEMBERSHIP (named IPV6_JOIN_GROUP on some systems). This socket option takes a structure containing the address of the multicast “group” to be joined.
Alas, this structure is also different for the two versions:
struct ip_mreq {
  struct in_addr imr_multiaddr;  // Group address
  struct in_addr imr_interface;  // Local interface to join on
};
The IPv6 version differs only in the type of addresses it contains:
struct ipv6_mreq {
  struct in6_addr ipv6mr_multiaddr; // IPv6 multicast address of group
  unsigned int ipv6mr_interface;   // local interface to join on
};

 

6.6.3 Broadcast vs. Multicast
The decision of using broadcast or multicast in an application depends on several issues, including the fraction of network hosts interested in receiving the data, and the knowledge of the communicating parties. Broadcast works well if a large percentage of the network hosts wish to receive the message; however, if few hosts need to receive the packet, broadcast “imposes on” all hosts in the network for the benefit of a few. Multicast is preferred because it limits the duplication of data to those that have expressed interest. The disadvantages of multicast are (1) it is presently not supported globally, and (2) the sender and receiver must agree on an IP multicast address in advance. Knowledge of an address is not required for broadcast. In some contexts (local), this makes broadcast a better mechanism for discovery than multicast. All hosts can receive broadcast by default, so it is simple to ask all hosts a question like “Where’s the printer?” On the other hand, for wide-area applications, multicast is the only choice.

 

 

posted on 2010-09-13 23:46 by 龍蝦