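The read and recv Functions in nginx
Preface
In Nginx's high-performance, event-driven architecture, I/O handling is central to performance.
read and recv are the two most basic system calls for reading data from a socket.
Nginx does not simply call them directly, however: it wraps them in an abstraction layer tied to its event mechanism, which gives it an efficient and extensible I/O model.
This article starts from the Linux system calls and works up through Nginx's read path, from the low-level recv() to the ngx_unix_recv() wrapper and its interaction with event notification, to explain the core of Nginx's network I/O.
read and recv on Linux
On Linux, both read() and recv() can read data from a socket.
read() is the general-purpose read interface for any file descriptor;
recv() is designed specifically for sockets and accepts extra flags (such as MSG_PEEK and MSG_DONTWAIT).
On a socket the two follow almost the same path, and recv() with flags set to 0 is generally equivalent to read(); the difference lies in the interface: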
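| Function | Purpose | Characteristics |
|---|---|---|
| read(fd, buf, size) | General file-descriptor read | Works on any descriptor; blocks until data arrives unless the descriptor is non-blocking |
| recv(fd, buf, size, flags) | Socket read | Accepts per-call flags such as MSG_PEEK and MSG_DONTWAIT |
| recvfrom() | Socket read with source address | Also returns the sender's address, which is useful for UDP |
| recvmsg() | Socket read into a message structure | Supports multiple buffers and control messages |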
To stay portable across platforms and I/O models, Nginx routes all of its recv() and send() calls through a single wrapper layer.
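The relationship between read() and recv() can be tried out on a connected socket pair. The following is a minimal sketch, not code from Nginx: `peek_then_read` and `make_pair` are hypothetical helper names, and an AF_UNIX socket pair stands in for a real TCP connection. It peeks at pending data with recv(MSG_PEEK), which leaves the data queued, and then consumes the same bytes with read().

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a connected AF_UNIX stream pair for experiments. */
int make_pair(int sv[2])
{
    return socketpair(AF_UNIX, SOCK_STREAM, 0, sv);
}

/*
 * Hypothetical helper: look at pending bytes with recv(MSG_PEEK), then
 * consume them with read(). Returns the number of bytes consumed,
 * 0 on EOF, or -1 on error.
 */
ssize_t peek_then_read(int fd, char *peeked, char *consumed, size_t size)
{
    /* MSG_PEEK copies data out but leaves it in the receive queue. */
    ssize_t p = recv(fd, peeked, size, MSG_PEEK);
    if (p <= 0) {
        return p;
    }

    /* On a socket, read(fd, buf, n) behaves like recv(fd, buf, n, 0). */
    return read(fd, consumed, (size_t) p);
}
```

After writing "ping" into one end of the pair, a call on the other end fills both buffers with the same four bytes, because the peek did not remove them from the queue.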
The man page description of recv
RETURN VALUE
These calls return the number of bytes received, or -1 if an error
occurred. In the event of an error, errno is set to indicate the
error.
When a stream socket peer has performed an orderly shutdown, the
return value will be 0 (the traditional "end-of-file" return).
Datagram sockets in various domains (e.g., the UNIX and Internet
domains) permit zero-size datagrams. When such a datagram is
received, the return value is 0.
The value 0 may also be returned if the requested number of bytes
to receive from a stream socket was 0.
ERRORS
These are some standard errors generated by the socket layer.
Additional errors may be generated and returned from the
underlying protocol modules; see their manual pages.
EAGAIN or EWOULDBLOCK
The socket is marked nonblocking and the receive operation
would block, or a receive timeout had been set and the
timeout expired before data was received. POSIX.1 allows
either error to be returned for this case, and does not
require these constants to have the same value, so a
portable application should check for both possibilities.
EBADF The argument sockfd is an invalid file descriptor.
ECONNREFUSED
A remote host refused to allow the network connection
(typically because it is not running the requested
service).
EFAULT The receive buffer pointer(s) point outside the process's
address space.
EINTR The receive was interrupted by delivery of a signal before
any data was available; see signal(7).
EINVAL Invalid argument passed.
ENOMEM Could not allocate memory for recvmsg().
ENOTCONN
The socket is associated with a connection-oriented
protocol and has not been connected (see connect(2) and
accept(2)).
ENOTSOCK
The file descriptor sockfd does not refer to a socket.
The man page description of send
RETURN VALUE
On success, these calls return the number of bytes sent. On
error, -1 is returned, and errno is set to indicate the error.
ERRORS
These are some standard errors generated by the socket layer.
Additional errors may be generated and returned from the
underlying protocol modules; see their respective manual pages.
EACCES (For UNIX domain sockets, which are identified by pathname)
Write permission is denied on the destination socket file,
or search permission is denied for one of the directories
the path prefix. (See path_resolution(7).)
(For UDP sockets) An attempt was made to send to a
network/broadcast address as though it was a unicast
address.
EAGAIN or EWOULDBLOCK
The socket is marked nonblocking and the requested
operation would block. POSIX.1-2001 allows either error to
be returned for this case, and does not require these
constants to have the same value, so a portable application
should check for both possibilities.
EAGAIN (Internet domain datagram sockets) The socket referred to
by sockfd had not previously been bound to an address and,
upon attempting to bind it to an ephemeral port, it was
determined that all port numbers in the ephemeral port
range are currently in use. See the discussion of
/proc/sys/net/ipv4/ip_local_port_range in ip(7).
EALREADY
Another Fast Open is in progress.
EBADF sockfd is not a valid open file descriptor.
ECONNRESET
Connection reset by peer.
EDESTADDRREQ
The socket is not connection-mode, and no peer address is
set.
EFAULT An invalid user space address was specified for an
argument.
EINTR A signal occurred before any data was transmitted; see
signal(7).
EINVAL Invalid argument passed.
EISCONN
The connection-mode socket was connected already but a
recipient was specified. (Now either this error is
returned, or the recipient specification is ignored.)
EMSGSIZE
The socket type requires that message be sent atomically,
and the size of the message to be sent made this
impossible.
ENOBUFS
The output queue for a network interface was full. This
generally indicates that the interface has stopped sending,
but may be caused by transient congestion. (Normally, this
does not occur in Linux. Packets are just silently dropped
when a device queue overflows.)
ENOMEM No memory available.
ENOTCONN
The socket is not connected, and no target has been given.
ENOTSOCK
The file descriptor sockfd does not refer to a socket.
EOPNOTSUPP
Some bit in the flags argument is inappropriate for the
socket type.
EPIPE The local end has been shut down on a connection oriented
socket. In this case, the process will also receive a
SIGPIPE unless MSG_NOSIGNAL is set.
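Summary
recv() and send() are the basic I/O calls for TCP/UDP sockets: they receive and send data respectively, return the number of bytes transferred on success, and return -1 with errno set to the cause on failure.
On a stream (TCP) socket, recv() returning 0 means the peer performed an orderly shutdown (EOF); send() returning -1 with errno=EPIPE means the local end wrote to a connection that has already been shut down. On a non-blocking socket, recv() with no data available, or send() with no room left in the send buffer, returns -1 with errno set to EAGAIN or EWOULDBLOCK, telling the caller to try again later. Other common errors include EINTR (interrupted by a signal), ENOTCONN (not connected), and ECONNRESET (connection reset by peer).
In one sentence:
recv() returns 0 when the connection has been closed and -1 with EAGAIN when there is no data yet; send() returns -1 with errno set when it cannot send or the connection is broken. Both follow the same rule: return the byte count on success, return -1 and report the error on failure.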
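The return-value rules quoted from the man page can be condensed into a small classifier. This is a sketch under stated assumptions: the `recv_outcome` labels and the `try_recv` function are hypothetical names introduced for illustration, and MSG_DONTWAIT is used so that a single call never blocks, whatever mode the descriptor is in.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical outcome labels; they are not part of any real API. */
enum recv_outcome {
    RECV_DATA,         /* n > 0: bytes were read */
    RECV_EOF,          /* n == 0: orderly shutdown by the peer */
    RECV_RETRY_LATER,  /* EAGAIN / EWOULDBLOCK: nothing to read yet */
    RECV_INTERRUPTED,  /* EINTR: a signal arrived first */
    RECV_FAILED        /* any other errno */
};

/* One non-blocking receive attempt; the raw return value goes to *nread. */
enum recv_outcome try_recv(int fd, void *buf, size_t size, ssize_t *nread)
{
    ssize_t n = recv(fd, buf, size, MSG_DONTWAIT);

    *nread = n;

    if (n > 0) {
        return RECV_DATA;
    }
    if (n == 0) {
        /* Means EOF for a stream socket when size > 0. */
        return RECV_EOF;
    }
    /* POSIX allows either constant, so check both. */
    if (errno == EAGAIN || errno == EWOULDBLOCK) {
        return RECV_RETRY_LATER;
    }
    if (errno == EINTR) {
        return RECV_INTERRUPTED;
    }
    return RECV_FAILED;
}
```

On an idle socket pair the first call reports RECV_RETRY_LATER; after the peer sends data it reports RECV_DATA, and after the peer closes it reports RECV_EOF.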
The Recv wrapper in Nginx
First, note that the socket descriptors Nginx reads from and writes to here are all non-blocking.
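For readers who want to reproduce the non-blocking behavior outside Nginx, here is a minimal sketch using plain fcntl(). It is an illustration, not Nginx's own code (Nginx goes through its own ngx_nonblocking() wrapper), and `set_nonblocking` and `recv_would_block` are hypothetical names.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: switch a descriptor to non-blocking mode. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags == -1) {
        return -1;
    }
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/*
 * Hypothetical helper: returns 1 if a receive on fd would block right now,
 * 0 otherwise. MSG_PEEK is used so that no data is consumed by the probe.
 */
int recv_would_block(int fd)
{
    char    byte;
    ssize_t n = recv(fd, &byte, 1, MSG_PEEK);

    return n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK);
}
```

Once a descriptor is in this mode, an empty receive queue produces an immediate -1 with EAGAIN instead of putting the process to sleep, which is the case the wrapper below has to handle.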
Source location: src/os/unix/ngx_recv.c
    ssize_t
    ngx_unix_recv(ngx_connection_t *c, u_char *buf, size_t size)
    {
        ssize_t       n;
        ngx_err_t     err;
        ngx_event_t  *rev;

        rev = c->read;

        do {
            n = recv(c->fd, buf, size, 0);

            ngx_log_debug3(NGX_LOG_DEBUG_EVENT, c->log, 0,
                           "recv: fd:%d %z of %uz", c->fd, n, size);

            /* n == 0: the peer closed the connection (EOF) */
            if (n == 0) {
                rev->ready = 0;
                rev->eof = 1;
                return 0;
            }

            /* n > 0: data was read; a short read clears the ready flag */
            if (n > 0) {
                if ((size_t) n < size
                    && !(ngx_event_flags & NGX_USE_GREEDY_EVENT))
                {
                    rev->ready = 0;
                }

                return n;
            }

            /* n < 0: inspect errno */
            err = ngx_socket_errno;

            if (err == NGX_EAGAIN || err == NGX_EINTR) {
                ngx_log_debug0(NGX_LOG_DEBUG_EVENT, c->log, err,
                               "recv() not ready");
                n = NGX_AGAIN;

            } else {
                n = ngx_connection_error(c, err, "recv() failed");
                break;
            }

        } while (err == NGX_EINTR);

        rev->ready = 0;

        if (n == NGX_ERROR) {
            rev->error = 1;
        }

        return n;
    }
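Let's look only at the n < 0 case:
🔸 Case 1: EAGAIN / EWOULDBLOCK
No data is available to read right now (very common in non-blocking mode).
- Sets n = NGX_AGAIN and rev->ready = 0, meaning "no data for now": the caller does not close the connection but waits for the next read event.
🔸 Case 2: EINTR
The call was interrupted by a signal (for example SIGCHLD).
Nginx retries inside the do...while (err == NGX_EINTR) loop, and keeps retrying for as long as recv() fails with EINTR.
🔸 Case 3: other errors
For example:
ECONNRESET (connection reset), EBADF (invalid file descriptor), ENOTCONN (socket not connected). Here ngx_connection_error() logs the failure, the loop exits through break, and when the result is NGX_ERROR the function sets rev->error = 1. (EPIPE, by contrast, is a write-side error and belongs to send().)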
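Behavior summary
| Scenario | Return value | rev->ready | rev->eof | rev->error | Meaning |
|---|---|---|---|---|---|
| Read n bytes successfully | n > 0 | set to 0 on a short read (n < size) | 0 | 0 | Data was read |
| Peer closes the connection | 0 | 0 | 1 | 0 | EOF |
| No data (EAGAIN) | NGX_AGAIN | 0 | 0 | 0 | Read again later |
| Signal interrupt (EINTR) | retried | - | - | - | recv() is called again |
| Other errors | NGX_ERROR | 0 | 0 | 1 | Connection error |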
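The table can be exercised without building Nginx. The following is a simplified sketch of the same decision logic in plain POSIX C, not the Nginx source: `sk_recv`, `sk_read_state`, `sk_set_nonblocking`, and the `SK_AGAIN` / `SK_ERROR` constants are hypothetical stand-ins for ngx_unix_recv(), the read event, and NGX_AGAIN / NGX_ERROR, and the NGX_USE_GREEDY_EVENT check and logging are left out.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/socket.h>

#define SK_ERROR  (-1)   /* stand-in for NGX_ERROR */
#define SK_AGAIN  (-2)   /* stand-in for NGX_AGAIN */

/* Hypothetical stand-in for the flags kept on the read event. */
struct sk_read_state {
    unsigned  ready:1;
    unsigned  eof:1;
    unsigned  error:1;
};

/* Hypothetical helper: put fd into non-blocking mode. */
int sk_set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    return flags == -1 ? -1 : fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

ssize_t sk_recv(int fd, void *buf, size_t size, struct sk_read_state *rev)
{
    for ( ;; ) {
        ssize_t n = recv(fd, buf, size, 0);

        if (n == 0) {                 /* EOF */
            rev->ready = 0;
            rev->eof = 1;
            return 0;
        }

        if (n > 0) {
            if ((size_t) n < size) {  /* short read: queue is likely drained */
                rev->ready = 0;
            }
            return n;
        }

        if (errno == EINTR) {         /* interrupted: call recv() again */
            continue;
        }

        rev->ready = 0;

        if (errno == EAGAIN || errno == EWOULDBLOCK) {
            return SK_AGAIN;          /* wait for the next read event */
        }

        rev->error = 1;               /* any other errno is a real failure */
        return SK_ERROR;
    }
}
```

Each row of the table corresponds to one return path: the EOF and short-read paths clear ready, the EAGAIN path returns SK_AGAIN without touching eof or error, and only the final path sets error.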
The Send wrapper in Nginx
Source location: src/os/unix/ngx_send.c
    ssize_t
    ngx_unix_send(ngx_connection_t *c, u_char *buf, size_t size)
    {
        ssize_t       n;
        ngx_err_t     err;
        ngx_event_t  *wev;

        wev = c->write;

        for ( ;; ) {
            n = send(c->fd, buf, size, 0);

            ngx_log_debug3(NGX_LOG_DEBUG_EVENT, c->log, 0,
                           "send: fd:%d %z of %uz", c->fd, n, size);

            /* n > 0: bytes were sent; a partial send clears the ready flag */
            if (n > 0) {
                if (n < (ssize_t) size) {
                    wev->ready = 0;
                }

                c->sent += n;

                return n;
            }

            err = ngx_socket_errno;

            /* n == 0: unexpected, logged at alert level */
            if (n == 0) {
                ngx_log_error(NGX_LOG_ALERT, c->log, err,
                              "send() returned zero");
                wev->ready = 0;
                return n;
            }

            if (err == NGX_EAGAIN || err == NGX_EINTR) {
                wev->ready = 0;

                ngx_log_debug0(NGX_LOG_DEBUG_EVENT, c->log, err,
                               "send() not ready");

                if (err == NGX_EAGAIN) {
                    return NGX_AGAIN;
                }

                /* EINTR: fall through and retry the send() */

            } else {
                wev->error = 1;
                (void) ngx_connection_error(c, err, "send() failed");
                return NGX_ERROR;
            }
        }
    }
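| Return value | Meaning | Typical cause/scenario | Handling in Nginx |
|---|---|---|---|
| > 0 | Number of bytes sent | Success, possibly a partial send | Updates c->sent; on a partial send (n < size) sets wev->ready = 0 |
| = 0 | Nothing was sent, yet no error was reported | Not expected for a non-zero size; treated as an anomaly | Logs "send() returned zero" at alert level, sets wev->ready = 0, returns 0 |
| = -1, errno = EAGAIN | Non-blocking socket is not writable right now | Kernel send buffer is full | Sets wev->ready = 0 and returns NGX_AGAIN; the caller waits for a writable event (for example from epoll) |
| = -1, errno = EINTR | Interrupted by a signal | A signal interrupted the system call | Sets wev->ready = 0 and retries (continues the for loop) |
| = -1, other errno | A real send error | Broken connection, network error, etc. | Sets wev->error = 1, calls ngx_connection_error(), returns NGX_ERROR |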
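The same rows can be reproduced with a small stand-alone sketch. It is not the Nginx source: `sk_send`, `sk_write_state`, `sk_set_nonblocking`, and the `SK_AGAIN` / `SK_ERROR` constants are hypothetical names, and the sketch assumes Linux, where MSG_NOSIGNAL keeps a write to a closed connection from raising SIGPIPE (the Nginx code shown above passes 0 as the flags argument).

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/socket.h>

#define SK_ERROR  (-1)   /* stand-in for NGX_ERROR */
#define SK_AGAIN  (-2)   /* stand-in for NGX_AGAIN */

/* Hypothetical stand-in for the write event plus the c->sent counter. */
struct sk_write_state {
    unsigned   ready:1;
    unsigned   error:1;
    long long  sent;
};

/* Hypothetical helper: put fd into non-blocking mode. */
int sk_set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    return flags == -1 ? -1 : fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

ssize_t sk_send(int fd, const void *buf, size_t size,
                struct sk_write_state *wev)
{
    for ( ;; ) {
        /* Assumption: Linux. MSG_NOSIGNAL reports EPIPE without SIGPIPE. */
        ssize_t n = send(fd, buf, size, MSG_NOSIGNAL);

        if (n > 0) {
            if ((size_t) n < size) {   /* partial send: buffer filled up */
                wev->ready = 0;
            }
            wev->sent += n;
            return n;
        }

        if (n == 0) {                  /* unexpected for size > 0 */
            wev->ready = 0;
            return 0;
        }

        if (errno == EINTR) {          /* interrupted: try again */
            wev->ready = 0;
            continue;
        }

        if (errno == EAGAIN || errno == EWOULDBLOCK) {
            wev->ready = 0;
            return SK_AGAIN;           /* wait for the next write event */
        }

        wev->error = 1;
        return SK_ERROR;
    }
}
```

Writing repeatedly to a non-blocking socket whose peer never reads eventually fills the kernel buffer and yields SK_AGAIN; writing after the peer has closed yields SK_ERROR.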
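Summary
When recv() or send() fails with EAGAIN, the socket is temporarily not readable or not writable; in non-blocking mode this is a normal condition.
Nginx handles it as follows:
- It marks the event as not ready (rev->ready = 0 or wev->ready = 0), meaning the connection cannot continue reading or writing for now.
- It does not retry the system call immediately; it returns NGX_AGAIN so that the upper layer (such as the HTTP module or the event loop) knows to wait for the next event.
- The event module (such as epoll) keeps monitoring the fd for readable/writable events; when the kernel reports that the fd is readable or writable, Nginx calls recv() or send() again to continue the operation.
In one sentence:
EAGAIN is not an error but a "not ready yet" signal: Nginx parks the connection and retries once the event fires, which is how it achieves non-blocking, high-concurrency I/O.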
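The wait-then-retry cycle can be sketched with epoll directly. This is an illustration under stated assumptions rather than Nginx's event loop: it is Linux-only, `wait_and_drain` is a hypothetical function, and it creates a throwaway epoll instance for one descriptor, whereas a real server keeps one long-lived instance for all of its connections.

```c
#include <errno.h>
#include <sys/epoll.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Wait up to timeout_ms for fd to become readable, then read with
 * non-blocking recv() calls until EAGAIN, EOF, or a full buffer.
 * Returns the total bytes read (0 on timeout or immediate EOF), -1 on error.
 */
ssize_t wait_and_drain(int fd, char *buf, size_t cap, int timeout_ms)
{
    struct epoll_event  ev = {0};
    struct epoll_event  out;
    size_t              total = 0;
    int                 ep, nready;

    ep = epoll_create1(0);
    if (ep == -1) {
        return -1;
    }

    ev.events = EPOLLIN;
    ev.data.fd = fd;

    if (epoll_ctl(ep, EPOLL_CTL_ADD, fd, &ev) == -1) {
        close(ep);
        return -1;
    }

    /* Sleep until the kernel reports readability; retry on EINTR. */
    do {
        nready = epoll_wait(ep, &out, 1, timeout_ms);
    } while (nready == -1 && errno == EINTR);

    close(ep);

    if (nready <= 0) {
        return nready;            /* 0: timed out, -1: error */
    }

    while (total < cap) {
        ssize_t n = recv(fd, buf + total, cap - total, MSG_DONTWAIT);

        if (n > 0) {
            total += (size_t) n;
            continue;
        }
        if (n == 0) {
            break;                /* EOF */
        }
        if (errno == EINTR) {
            continue;
        }
        if (errno == EAGAIN || errno == EWOULDBLOCK) {
            break;                /* drained: wait for the next event */
        }
        return -1;
    }

    return (ssize_t) total;
}
```

The EAGAIN branch is the point where a server would return to its event loop instead of spinning, which matches the three steps listed above.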