
HTTP Protocol Walkthrough, Part 2: Reading Chapter 1 (Introduction)

1 Introduction

1.1 Purpose

   The Hypertext Transfer Protocol (HTTP) is an application-level
   protocol for distributed, collaborative, hypermedia information
   systems. HTTP has been in use by the World-Wide Web global
   information initiative since 1990. The first version of HTTP,
   referred to as HTTP/0.9, was a simple protocol for raw data transfer
   across the Internet. HTTP/1.0, as defined by RFC 1945 [6], improved
   the protocol by allowing messages to be in the format of MIME-like
   messages, containing metainformation about the data transferred and
   modifiers on the request/response semantics. However, HTTP/1.0 does
   not sufficiently take into consideration the effects of hierarchical
   proxies, caching, the need for persistent connections, or virtual
   hosts. In addition, the proliferation of incompletely-implemented
   applications calling themselves "HTTP/1.0" has necessitated a
   protocol version change in order for two communicating applications
   to determine each other's true capabilities.

   This specification defines the protocol referred to as "HTTP/1.1".
   This protocol includes more stringent requirements than HTTP/1.0 in
   order to ensure reliable implementation of its features.

   Practical information systems require more functionality than simple
   retrieval, including search, front-end update, and annotation. HTTP
   allows an open-ended set of methods and headers that indicate the
   purpose of a request [47]. It builds on the discipline of reference
   provided by the Uniform Resource Identifier (URI) [3], as a location
   (URL) [4] or name (URN) [20], for indicating the resource to which a
   method is to be applied. Messages are passed in a format similar to
   that used by Internet mail [9] as defined by the Multipurpose
   Internet Mail Extensions (MIME) [7].

   HTTP is also used as a generic protocol for communication between
   user agents and proxies/gateways to other Internet systems, including
   those supported by the SMTP [16], NNTP [13], FTP [18], Gopher [2],
   and WAIS [10] protocols. In this way, HTTP allows basic hypermedia
   access to resources available from diverse applications.

1.2 Requirements

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [34].

   An implementation is not compliant if it fails to satisfy one or more
   of the MUST or REQUIRED level requirements for the protocols it
   implements. An implementation that satisfies all the MUST or REQUIRED
   level and all the SHOULD level requirements for its protocols is said
   to be "unconditionally compliant"; one that satisfies all the MUST
   level requirements but not all the SHOULD level requirements for its
   protocols is said to be "conditionally compliant."
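
The three compliance outcomes described above can be written as a small decision function. This is only an illustrative sketch: the function name and its two boolean parameters are assumptions made for this post, not something the specification defines.

```python
def classify_compliance(meets_all_must: bool, meets_all_should: bool) -> str:
    """Map the requirement levels of section 1.2 to a compliance label.

    meets_all_must:   every MUST / REQUIRED level requirement is satisfied.
    meets_all_should: every SHOULD level requirement is satisfied.
    """
    if not meets_all_must:
        # Failing even one MUST or REQUIRED requirement means not compliant.
        return "not compliant"
    if meets_all_should:
        return "unconditionally compliant"
    return "conditionally compliant"


if __name__ == "__main__":
    print(classify_compliance(True, False))
```

Note that the SHOULD level only matters once every MUST level requirement is met, which is why it is checked second.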

1.3 Terminology

   This specification uses a number of terms to refer to the roles
   played by participants in, and objects of, the HTTP communication.

   connection
      A transport layer virtual circuit established between two programs
      for the purpose of communication.

   message
      The basic unit of HTTP communication, consisting of a structured
      sequence of octets matching the syntax defined in section 4 and
      transmitted via the connection.

   request
      An HTTP request message, as defined in section 5.

   response
      An HTTP response message, as defined in section 6.

   resource
      A network data object or service that can be identified by a URI,
      as defined in section 3.2. Resources may be available in multiple
      representations (e.g. multiple languages, data formats, size, and
      resolutions) or vary in other ways.

   entity
      The information transferred as the payload of a request or
      response. An entity consists of metainformation in the form of
      entity-header fields and content in the form of an entity-body, as
      described in section 7.

   representation
      An entity included with a response that is subject to content
      negotiation, as described in section 12. There may exist multiple
      representations associated with a particular response status.

   content negotiation
      The mechanism for selecting the appropriate representation when
      servicing a request, as described in section 12. The
      representation of entities in any response can be negotiated
      (including error responses).

   variant
      A resource may have one, or more than one, representation(s)
      associated with it at any given instant. Each of these
      representations is termed a `variant'.  Use of the term `variant'
      does not necessarily imply that the resource is subject to content
      negotiation.

   client
      A program that establishes connections for the purpose of sending
      requests.

   user agent
      The client which initiates a request. These are often browsers,
      editors, spiders (web-traversing robots), or other end user tools.

   server
      An application program that accepts connections in order to
      service requests by sending back responses. Any given program may
      be capable of being both a client and a server; our use of these
      terms refers only to the role being performed by the program for a
      particular connection, rather than to the program's capabilities
      in general. Likewise, any server may act as an origin server,
      proxy, gateway, or tunnel, switching behavior based on the nature
      of each request.

   origin server
      The server on which a given resource resides or is to be created.

   proxy
      An intermediary program which acts as both a server and a client
      for the purpose of making requests on behalf of other clients.
      Requests are serviced internally or by passing them on, with
      possible translation, to other servers. A proxy MUST implement
      both the client and server requirements of this specification. A
      "transparent proxy" is a proxy that does not modify the request or
      response beyond what is required for proxy authentication and
      identification. A "non-transparent proxy" is a proxy that modifies
      the request or response in order to provide some added service to
      the user agent, such as group annotation services, media type
      transformation, protocol reduction, or anonymity filtering. Except
      where either transparent or non-transparent behavior is explicitly
      stated, the HTTP proxy requirements apply to both types of
      proxies.

   gateway
      A server which acts as an intermediary for some other server.
      Unlike a proxy, a gateway receives requests as if it were the
      origin server for the requested resource; the requesting client
      may not be aware that it is communicating with a gateway.

   tunnel
      An intermediary program which is acting as a blind relay between
      two connections. Once active, a tunnel is not considered a party
      to the HTTP communication, though the tunnel may have been
      initiated by an HTTP request. The tunnel ceases to exist when both
      ends of the relayed connections are closed.

   cache
      A program's local store of response messages and the subsystem
      that controls its message storage, retrieval, and deletion. A
      cache stores cacheable responses in order to reduce the response
      time and network bandwidth consumption on future, equivalent
      requests. Any client or server may include a cache, though a cache
      cannot be used by a server that is acting as a tunnel.

   cacheable
      A response is cacheable if a cache is allowed to store a copy of
      the response message for use in answering subsequent requests. The
      rules for determining the cacheability of HTTP responses are
      defined in section 13. Even if a resource is cacheable, there may
      be additional constraints on whether a cache can use the cached
      copy for a particular request.

   first-hand
      A response is first-hand if it comes directly and without
      unnecessary delay from the origin server, perhaps via one or more
      proxies. A response is also first-hand if its validity has just
      been checked directly with the origin server.

   explicit expiration time
      The time at which the origin server intends that an entity should
      no longer be returned by a cache without further validation.

   heuristic expiration time
      An expiration time assigned by a cache when no explicit expiration
      time is available.

   age
      The age of a response is the time since it was sent by, or
      successfully validated with, the origin server.

   freshness lifetime
      The length of time between the generation of a response and its
      expiration time.

   fresh
      A response is fresh if its age has not yet exceeded its freshness
      lifetime.

   stale
      A response is stale if its age has passed its freshness lifetime.

   semantically transparent
      A cache behaves in a "semantically transparent" manner, with
      respect to a particular response, when its use affects neither the
      requesting client nor the origin server, except to improve
      performance. When a cache is semantically transparent, the client
      receives exactly the same response (except for hop-by-hop headers)
      that it would have received had its request been handled directly
      by the origin server.

   validator
      A protocol element (e.g., an entity tag or a Last-Modified time)
      that is used to find out whether a cache entry is an equivalent
      copy of an entity.

   upstream/downstream
      Upstream and downstream describe the flow of a message: all
      messages flow from upstream to downstream.

   inbound/outbound
      Inbound and outbound refer to the request and response paths for
      messages: "inbound" means "traveling toward the origin server",
      and "outbound" means "traveling toward the user agent".
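
The definitions of age, freshness lifetime, fresh, and stale in the list above fit together as simple arithmetic. The sketch below restates them in Python; the function names and the sample timestamps are assumptions chosen for illustration, and how a real cache obtains the age and expiration time is covered by section 13 of the specification, not here.

```python
from datetime import datetime, timedelta


def freshness_lifetime(generated_at: datetime, expires_at: datetime) -> timedelta:
    """Length of time between the generation of a response and its expiration."""
    return expires_at - generated_at


def is_fresh(age: timedelta, lifetime: timedelta) -> bool:
    """Fresh: the age has not yet exceeded the freshness lifetime."""
    return age <= lifetime


def is_stale(age: timedelta, lifetime: timedelta) -> bool:
    """Stale: the age has passed the freshness lifetime."""
    return age > lifetime


if __name__ == "__main__":
    # Hypothetical response generated at 12:00 that expires at 12:10.
    lifetime = freshness_lifetime(
        datetime(2023, 9, 21, 12, 0), datetime(2023, 9, 21, 12, 10)
    )
    print(is_fresh(timedelta(minutes=4), lifetime))   # age below the lifetime
    print(is_stale(timedelta(minutes=11), lifetime))  # age beyond the lifetime
```

Under these definitions a response is always exactly one of fresh or stale, and a response whose age equals its freshness lifetime still counts as fresh.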

1.4 Overall Operation

   The HTTP protocol is a request/response protocol. A client sends a
   request to the server in the form of a request method, URI, and
   protocol version, followed by a MIME-like message containing request
   modifiers, client information, and possible body content over a
   connection with a server. The server responds with a status line,
   including the message's protocol version and a success or error code,
   followed by a MIME-like message containing server information, entity
   metainformation, and possible entity-body content. The relationship
   between HTTP and MIME is described in appendix 19.4.
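
To make the shape of the two message types concrete, here is a minimal sketch that builds a request and splits a status line. The exact syntax (space-separated start line, CRLF line endings, an empty line before the body) comes from later sections of the specification rather than from this introduction, and the helper names and the host `www.example.com` are assumptions used only for illustration.

```python
from typing import Dict, Tuple


def build_request(method: str, uri: str, headers: Dict[str, str],
                  body: bytes = b"", version: str = "HTTP/1.1") -> bytes:
    """Request = method, URI, and protocol version, then MIME-like headers, then body."""
    lines = [f"{method} {uri} {version}"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # An empty line separates the header block from the optional body.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body


def parse_status_line(line: str) -> Tuple[str, int, str]:
    """Status line = protocol version, numeric success/error code, reason text."""
    version, code, reason = line.split(" ", 2)
    return version, int(code), reason


if __name__ == "__main__":
    print(build_request("GET", "/index.html", {"Host": "www.example.com"}))
    print(parse_status_line("HTTP/1.1 404 Not Found"))
```

The reason text may itself contain spaces, which is why the status line is split at most twice.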

   Most HTTP communication is initiated by a user agent and consists of
   a request to be applied to a resource on some origin server. In the
   simplest case, this may be accomplished via a single connection (v)
   between the user agent (UA) and the origin server (O).

          request chain ------------------------>
       UA -------------------v------------------- O
          <----------------------- response chain

   A more complicated situation occurs when one or more intermediaries
   are present in the request/response chain. There are three common
   forms of intermediary: proxy, gateway, and tunnel. A proxy is a
   forwarding agent, receiving requests for a URI in its absolute form,
   rewriting all or part of the message, and forwarding the reformatted
   request toward the server identified by the URI. A gateway is a
   receiving agent, acting as a layer above some other server(s) and, if
   necessary, translating the requests to the underlying server's
   protocol. A tunnel acts as a relay point between two connections
   without changing the messages; tunnels are used when the
   communication needs to pass through an intermediary (such as a
   firewall) even when the intermediary cannot understand the contents
   of the messages.

          request chain -------------------------------------->
       UA -----v----- A -----v----- B -----v----- C -----v----- O
          <------------------------------------- response chain

   The figure above shows three intermediaries (A, B, and C) between the
   user agent and origin server. A request or response message that
   travels the whole chain will pass through four separate connections.
   This distinction is important because some HTTP communication options
   may apply only to the connection with the nearest, non-tunnel
   neighbor, only to the end-points of the chain, or to all connections
   along the chain. Although the diagram is linear, each participant may
   be engaged in multiple, simultaneous communications. For example, B
   may be receiving requests from many clients other than A, and/or
   forwarding requests to servers other than C, at the same time that it
   is handling A's request.

   Any party to the communication which is not acting as a tunnel may
   employ an internal cache for handling requests. The effect of a cache
   is that the request/response chain is shortened if one of the
   participants along the chain has a cached response applicable to that
   request. The following illustrates the resulting chain if B has a
   cached copy of an earlier response from O (via C) for a request which
   has not been cached by UA or A.

          request chain ---------->
       UA -----v----- A -----v----- B - - - - - - C - - - - - - O
          <--------- response chain
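
The effect of a cache on the chain can be modeled in a few lines. The sketch below is a toy model, not an HTTP implementation: the function name is hypothetical, participants are plain strings taken from the figures, and tunnels (which cannot use a cache) are not modeled.

```python
from typing import FrozenSet, List, Tuple


def request_path(chain: List[str],
                 cached_at: FrozenSet[str] = frozenset()) -> Tuple[List[str], int]:
    """Follow a request from the user agent toward the origin server.

    chain:     participants in order, user agent first, origin server last.
    cached_at: participants holding a cached response applicable to the request.
    Returns the participants the request reaches and the number of
    connections it travels over.
    """
    for hops, participant in enumerate(chain):
        is_origin = hops == len(chain) - 1
        if participant in cached_at or is_origin:
            # The request stops at the first participant that can answer it.
            return chain[: hops + 1], hops
    raise ValueError("chain must contain at least one participant")


if __name__ == "__main__":
    chain = ["UA", "A", "B", "C", "O"]
    print(request_path(chain))                    # full chain to the origin server
    print(request_path(chain, frozenset({"B"})))  # chain shortened at B
```

With three intermediaries and no cached copy the request crosses four connections; when B holds a cached response it crosses only two, matching the two figures.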

   Not all responses are usefully cacheable, and some requests may
   contain modifiers which place special requirements on cache behavior.
   HTTP requirements for cache behavior and cacheable responses are
   defined in section 13.

   In fact, there are a wide variety of architectures and configurations
   of caches and proxies currently being experimented with or deployed
   across the World Wide Web. These systems include national hierarchies
   of proxy caches to save transoceanic bandwidth, systems that
   broadcast or multicast cache entries, organizations that distribute
   subsets of cached data via CD-ROM, and so on. HTTP systems are used
   in corporate intranets over high-bandwidth links, and for access via
   PDAs with low-power radio links and intermittent connectivity. The
   goal of HTTP/1.1 is to support the wide diversity of configurations
   already deployed while introducing protocol constructs that meet the
   needs of those who build web applications that require high
   reliability and, failing that, at least reliable indications of
   failure.

   HTTP communication usually takes place over TCP/IP connections. The
   default port is TCP 80 [19], but other ports can be used. This does
   not preclude HTTP from being implemented on top of any other protocol
   on the Internet, or on other networks. HTTP only presumes a reliable
   transport; any protocol that provides such guarantees can be used;
   the mapping of the HTTP/1.1 request and response structures onto the
   transport data units of the protocol in question is outside the scope
   of this specification.

   In HTTP/1.0, most implementations used a new connection for each
   request/response exchange. In HTTP/1.1, a connection may be used for
   one or more request/response exchanges, although connections may be
   closed for a variety of reasons (see section 8.1).
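
Connection reuse in HTTP/1.1 can be observed with Python's standard library alone. The following is a hedged sketch, assuming a loopback interface is available: it starts a throwaway local server, sends two requests through one `http.client.HTTPConnection`, and records the client port the server saw for each request. The handler name, the paths `/first` and `/second`, and the port-recording trick are all assumptions made for this illustration.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

seen_client_ports = []  # client TCP port observed by the server, one per request


class EchoHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets http.server keep the connection open

    def do_GET(self):
        seen_client_ports.append(self.client_address[1])
        body = self.path.encode("ascii")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        # A known length tells the client where the response ends,
        # so the connection can be reused for the next request.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_twice_on_one_connection():
    """Send two GET requests over a single client connection to a local server."""
    seen_client_ports.clear()
    server = HTTPServer(("127.0.0.1", 0), EchoHandler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection(
            "127.0.0.1", server.server_address[1], timeout=5
        )
        bodies = []
        for path in ("/first", "/second"):
            conn.request("GET", path)
            response = conn.getresponse()
            bodies.append(response.read().decode("ascii"))
        conn.close()
        return bodies, list(seen_client_ports)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(fetch_twice_on_one_connection())
```

If both requests arrive from the same client port, they travelled over the same TCP connection. Under HTTP/1.0 style behavior the client would typically have opened a second connection, and the two ports would differ.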

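Goal: by reading the introduction, understand why HTTP was designed, what it was designed for, and what it can do.

The rough translation below comes from ChatGPT.

The text above introduces the HTTP protocol, covering its history, purpose, requirements, terminology, and overall operation. A translation and summary follow: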

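1.1 Purpose

  • HTTP is an application-level protocol for distributed, collaborative, hypermedia information systems.
  • HTTP has been in use by the World-Wide Web global information initiative since 1990.
  • HTTP/0.9 was the first version, a simple protocol for raw data transfer across the Internet.
  • HTTP/1.0 improved the protocol by allowing messages in a MIME-like format, carrying metainformation about the data transferred and modifiers on the request/response semantics.
  • However, HTTP/1.0 does not sufficiently account for hierarchical proxies, caching, persistent connections, or virtual hosts. Together with the spread of incompletely implemented applications calling themselves "HTTP/1.0", this made a protocol version change necessary.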

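1.2 Requirements

  • Key words used in the document, such as "MUST", "MUST NOT", and "REQUIRED", are to be interpreted as described in RFC 2119.
  • An implementation that fails to satisfy one or more MUST or REQUIRED level requirements of the protocols it implements is not compliant.
  • An implementation that satisfies all MUST or REQUIRED level requirements and all SHOULD level requirements is "unconditionally compliant"; one that satisfies all MUST level requirements but not all SHOULD level requirements is "conditionally compliant".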

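1.3 Terminology

  • Introduces the terms used in HTTP communication, such as connection, message, request, response, resource, entity, representation, and content negotiation.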

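1.4 Overall Operation

  • HTTP is a request/response protocol: the client sends a request and the server returns a response.
  • Most HTTP communication is initiated by a user agent. A request consists of a request method, URI, and protocol version, followed by a MIME-like message that may carry request modifiers, client information, and body content.
  • The server's response consists of a status line (protocol version plus a success or error code), followed by a MIME-like message that may carry server information, entity metainformation, and entity-body content.
  • Points to appendix 19.4 for the relationship between HTTP and MIME.
  • Explains the different roles of intermediaries: proxies, gateways, and tunnels.
  • Describes how caches shorten the request/response chain, and the variety of cache and proxy deployments.
  • Stresses that HTTP usually runs over TCP/IP connections but can be implemented on top of any other protocol that provides reliable transport.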

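In short, the text introduces the origin, purpose, requirements, and operation of the HTTP protocol, as well as its use in different network environments. It also mentions how HTTP/1.1 improves on and extends HTTP/1.0.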

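A brief introduction to MIME-like messages

A "MIME-like message" is a message that follows the format defined by the Multipurpose Internet Mail Extensions (MIME) standard. MIME is a standard way of representing multimedia data. It was first used in e-mail systems and was later widely adopted in HTTP communication and on the Web.

A MIME-like message has two main parts:

  1. Header fields (Headers): carry metainformation describing the message, such as content type, character set, encoding, and date.
  2. Entity body (Entity Body): carries the actual content of the message, which may be text, images, audio, video, or other data. The format and encoding of the entity body are specified according to the content type.

The flexibility of this format makes it suitable for transferring all kinds of data, from simple plain-text messages to complex multimedia content. HTTP messages adopt the same MIME-like structure: the HTTP message header contains a series of MIME-style header fields that specify the content type and other related information, while the entity body carries the actual data.

In short, the MIME-like message is a general-purpose message format used to transfer many kinds of data in HTTP, e-mail, and other Internet protocols.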
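
Because the header-and-body layout is shared with e-mail, Python's standard `email` package can parse the MIME-like portion of an HTTP message. The snippet below is a minimal sketch under two assumptions: the request line or status line has already been removed, and the sample header values are invented for illustration. HTTP is only MIME-like, so this should not be read as a complete HTTP parser.

```python
from email import message_from_string
from typing import Optional, Tuple


def parse_headers_and_body(raw: str) -> Tuple[str, Optional[str], str]:
    """Split a MIME-like message into content type, charset, and body text."""
    msg = message_from_string(raw)
    content_type = msg.get_content_type()   # media type from Content-Type
    charset = msg.get_content_charset()     # charset parameter, if present
    body = msg.get_payload()                # everything after the empty line
    return content_type, charset, body


if __name__ == "__main__":
    # Hypothetical message: two header fields, an empty line, then the body.
    sample = (
        "Content-Type: text/html; charset=utf-8\r\n"
        "Content-Length: 5\r\n"
        "\r\n"
        "hello"
    )
    print(parse_headers_and_body(sample))
```

The empty line is what separates the header fields from the entity body, in HTTP as in e-mail.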

 

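1.3 Terminology in detail

This specification uses a number of terms to refer to the roles played by participants in, and objects of, HTTP communication.

  • connection

    • A transport-layer virtual circuit established between two programs for the purpose of communication.
  • message

    • The basic unit of HTTP communication, consisting of a structured sequence of octets matching the syntax defined in section 4 and transmitted via the connection.
  • request

    • An HTTP request message, as defined in section 5.
  • response

    • An HTTP response message, as defined in section 6.
  • resource

    • A network data object or service that can be identified by a URI, as defined in section 3.2. Resources may be available in multiple representations (for example multiple languages, data formats, sizes, and resolutions) or vary in other ways.
  • entity

    • The information transferred as the payload of a request or response. An entity consists of metainformation in the form of entity-header fields and content in the form of an entity-body, as described in section 7.
  • representation

    • An entity included with a response that is subject to content negotiation, as described in section 12. There may be multiple representations associated with a particular response status.
  • content negotiation

    • The mechanism for selecting the appropriate representation when servicing a request, as described in section 12. The representation of entities in any response can be negotiated (including error responses).
  • variant

    • A resource may have one or more representations associated with it at any given instant. Each of these representations is called a "variant". Use of the term "variant" does not necessarily imply that the resource is subject to content negotiation.
  • client

    • A program that establishes connections for the purpose of sending requests.
  • user agent

    • The client that initiates a request. These are often browsers, editors, spiders (web-traversing robots), or other end-user tools.
  • server

    • An application program that accepts connections in order to service requests by sending back responses. A given program may be both a client and a server; these terms refer only to the role the program plays on a particular connection, not to its capabilities in general. Likewise, any server may act as an origin server, proxy, gateway, or tunnel, switching behavior based on the nature of each request.
  • origin server

    • The server on which a given resource resides or is to be created.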
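  • proxy

    • An intermediary program that acts as both a server and a client in order to make requests on behalf of other clients. Requests are serviced internally or passed on, with possible translation, to other servers. A proxy MUST implement both the client and server requirements of this specification. A "transparent proxy" does not modify the request or response beyond what is required for proxy authentication and identification. A "non-transparent proxy" modifies the request or response to provide some added service to the user agent, such as group annotation services, media type transformation, protocol reduction, or anonymity filtering. Unless transparent or non-transparent behavior is explicitly stated, the HTTP proxy requirements apply to both types of proxy.
  • gateway

    • A server that acts as an intermediary for some other server. Unlike a proxy, a gateway receives requests as if it were the origin server for the requested resource; the requesting client may not be aware that it is communicating with a gateway.
  • tunnel

    • An intermediary program that acts as a blind relay between two connections. Once active, a tunnel is not considered a party to the HTTP communication, though the tunnel may have been initiated by an HTTP request. The tunnel ceases to exist when both ends of the relayed connections are closed.
  • cache

    • A program's local store of response messages and the subsystem that controls its message storage, retrieval, and deletion. A cache stores cacheable responses in order to reduce the response time and network bandwidth consumption of future, equivalent requests. Any client or server may include a cache, though a server acting as a tunnel cannot use one.
  • cacheable

    • A response is cacheable if a cache is allowed to store a copy of the response message for use in answering subsequent requests. The rules for determining the cacheability of HTTP responses are defined in section 13. Even if a resource is cacheable, there may be additional constraints on whether a cache can use the cached copy for a particular request.
  • explicit expiration time

    • The time at which the origin server intends that an entity should no longer be returned by a cache without further validation.
  • heuristic expiration time

    • An expiration time assigned by a cache when no explicit expiration time is available.
  • age

    • The age of a response is the time since it was sent by, or successfully validated with, the origin server.
  • freshness lifetime

    • The length of time between the generation of a response and its expiration time.
  • fresh

    • A response is fresh if its age has not yet exceeded its freshness lifetime.
  • stale

    • A response is stale if its age has passed its freshness lifetime.
  • semantically transparent

    • A cache behaves in a "semantically transparent" manner, with respect to a particular response, when its use affects neither the requesting client nor the origin server, except to improve performance. When a cache is semantically transparent, the client receives exactly the same response (except for hop-by-hop headers) that it would have received had its request been handled directly by the origin server.
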

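1.4 Overall operation in detail

HTTP is a request/response protocol. Over a connection with the server, a client sends a request in the form of a request method, URI, and protocol version, followed by a MIME-like message containing request modifiers, client information, and possible body content. The server responds with a status line, including the message's protocol version and a success or error code, followed by a MIME-like message containing server information, entity metainformation, and possible entity-body content. The relationship between HTTP and MIME is described in appendix 19.4.

Most HTTP communication is initiated by a user agent (UA) and consists of a request to be applied to a resource on some origin server. In the simplest case, this is accomplished via a single connection (v) between the user agent (UA) and the origin server (O).

          request chain ------------------------>
       UA -------------------v------------------- O
          <----------------------- response chain

Things get more complicated when one or more intermediaries are present in the request/response chain. There are three common forms of intermediary: proxy, gateway, and tunnel. A proxy is a forwarding agent: it receives requests for a URI in its absolute form, rewrites all or part of the message, and forwards the reformatted request toward the server identified by the URI. A gateway is a receiving agent: it acts as a layer above some other server(s) and, if necessary, translates requests to the underlying server's protocol. A tunnel acts as a relay point between two connections without changing the messages; tunnels are used when the communication needs to pass through an intermediary (such as a firewall) even when the intermediary cannot understand the contents of the messages.

          request chain -------------------------------------->
       UA -----v----- A -----v----- B -----v----- C -----v----- O
          <------------------------------------- response chain

The figure above shows three intermediaries (A, B, and C) between the user agent and the origin server. A request or response message that travels the whole chain passes through four separate connections. This distinction matters because some HTTP communication options may apply only to the connection with the nearest non-tunnel neighbor, only to the end-points of the chain, or to all connections along the chain. Although the diagram is linear, each participant may be engaged in multiple simultaneous communications. For example, while handling A's request, B may also be receiving requests from many clients other than A, and/or forwarding requests to servers other than C.

Any party to the communication that is not acting as a tunnel may use an internal cache to handle requests. The effect of a cache is that the request/response chain is shortened if one of the participants along the chain has a cached response applicable to that request. The following shows the resulting chain if B has a cached copy of an earlier response from O (via C) for a request that has not been cached by UA or A.

          request chain ---------->
       UA -----v----- A -----v----- B - - - - - - C - - - - - - O
          <--------- response chain

Not all responses are usefully cacheable, and some requests may contain modifiers that place special requirements on cache behavior. HTTP requirements for cache behavior and cacheable responses are defined in section 13.

In fact, a wide variety of cache and proxy architectures and configurations are being experimented with or deployed, including national hierarchies of proxy caches to save transoceanic bandwidth, systems that broadcast or multicast cache entries, and organizations that distribute subsets of cached data via CD-ROM. HTTP systems are used in corporate intranets over high-bandwidth links, and for access via PDAs with low-power radio links and intermittent connectivity. The goal of HTTP/1.1 is to support the wide diversity of configurations already deployed while introducing protocol constructs that meet the needs of those who build web applications requiring high reliability and, failing that, at least reliable indications of failure.

HTTP communication usually takes place over TCP/IP connections. The default port is TCP 80 [19], but other ports can be used. This does not preclude HTTP from being implemented on top of any other protocol on the Internet, or on other networks. HTTP only presumes a reliable transport; any protocol that provides such guarantees can be used. The mapping of the HTTP/1.1 request and response structures onto the transport data units of the protocol in question is outside the scope of this specification. In HTTP/1.0, most implementations used a new connection for each request/response exchange. In HTTP/1.1, a connection may be used for one or more request/response exchanges, although connections may be closed for a variety of reasons (see section 8.1).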

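My understanding:

1. HTTP's overall message format borrows an existing format, MIME: headers plus an entity.

2. HTTP defines compliance levels that classify whether an implementation conforms to the protocol.

3. The terminology used by the specification is introduced.

4. The connection process is explained (the key point).

An HTTP request always starts from a UA (user agent); the chain it travels through can be simple or complex.

Simplest case: a single connection (v) between the user agent (UA) and the origin server (O).

When intermediaries are present, things become much more complicated.

Common forms of intermediary: proxy, gateway, and tunnel.

In some cases an HTTP cache shortens the request chain. This makes the request path easier to design, and in turn places requirements on the cache.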

posted on 2023-09-21 22:30  thtrll
