互联网协议入门
原文 http://www.ruanyifeng.com/blog/2012/06/internet_protocol_suite_part_ii.html pdf
从网络上看到很好的理解文章,强烈推荐,作者阮一峰,摘自他的个人网址吧。。。自己做了小小整理和小小修改。
互联网协议入门
作者: 阮一峰
我们每天使用互联网,你是否想过,它是如何实现的?
全世界几十亿台电脑,连接在一起,两两通信。上海的某一块网卡送出信号,洛杉矶的另一块网卡居然就收到了,两者实际上根本不知道对方的物理位置,你不觉得这是很神奇的事情吗?
互联网的核心是一系列协议,总称为"互联网协议"(Internet Protocol Suite)。它们对电脑如何连接和组网,做出了详尽的规定。理解了这些协议,就理解了互联网的原理。
下面就是我的学习笔记。因为这些协议实在太复杂、太庞大,我想整理一个简洁的框架,帮助自己从总体上把握它们。为了保证简单易懂,我做了大量的简化,有些地方并不全面和精确,但是应该能够说清楚互联网的原理。
=================================================
互联网协议入门(一)
作者:阮一峰
一、概述
1.1 五层模型
互联网的实现,分成好几层。每一层都有自己的功能,就像建筑物一样,每一层都靠下一层支持。
用户接触到的,只是最上面的一层,根本没有感觉到下面的层。要理解互联网,必须从最下层开始,自下而上理解每一层的功能。
如何分层有不同的模型,有的模型分七层,有的分四层。我觉得,把互联网分成五层,比较容易解释。
如上图所示,最底下的一层叫做"实体层"(Physical Layer),最上面的一层叫做"应用层"(Application Layer),中间的三层(自下而上)分别是"链接层"(Link Layer)、"网络层"(Network Layer)和"传输层"(Transport Layer)。越下面的层,越靠近硬件;越上面的层,越靠近用户。
它们叫什么名字,其实并不重要。只需要知道,互联网分成若干层就可以了。
1.2 层与协议
每一层都是为了完成一种功能。为了实现这些功能,就需要大家都遵守共同的规则。
大家都遵守的规则,就叫做"协议"(protocol)。
互联网的每一层,都定义了很多协议。这些协议的总称,就叫做"互联网协议"(Internet Protocol Suite)。它们是互联网的核心,下面介绍每一层的功能,主要就是介绍每一层的主要协议。
二、物理层/实体层[Physical Layer]
我们从最底下的一层开始。
电脑要组网,第一件事要干什么?当然是先把电脑连起来,可以用光缆、电缆、双绞线、无线电波等方式。
这就叫做"实体层",它就是把电脑连接起来的物理手段。它主要规定了网络的一些电气特性,作用是负责传送0和1的电信号。
三、链接层/链路层/数据链路层[Link Layer]
3.1 定义
单纯的0和1没有任何意义,必须规定解读方式:多少个电信号算一组?每个信号位有何意义?
这就是"链接层"的功能,它在"实体层"的上方,确定了0和1的分组方式。
3.2 以太网协议
早期的时候,每家公司都有自己的电信号分组方式。逐渐地,一种)叫做"以太网"(Ethernet)的协议,占据了主导地位。
以太网规定,一组电信号构成一个数据包,叫做"帧"(Frame)。每一帧分成两个部分:标头(Head)和数据(Data)。
"标头"包含数据包的一些说明项,比如发送者、接受者、数据类型等等;"数据"则是数据包的具体内容。
"标头"的长度,固定为18字节。"数据"的长度,最短为46字节,最长为1500字节。因此,整个"帧"最短为64字节,最长为1518字节。如果数据很长,就必须分割成多个帧进行发送。
3.3以太网帧格式
http://zh.wikipedia.org/wiki/以太网帧格式
在以太网链路上的数据包称作以太帧。以太帧起始部分由前导码和帧开始符组成。后面紧跟着一个以太网报头,以MAC地址说明目的地址和源地址。帧的中部是该帧负载的包含其他协议报头的数据包(例如IP协议)。以太帧由一个32位冗余校验码结尾。它用于检验数据传输是否出现损坏
结构
来自线路的二进制数据包称作一个帧。从物理线路上看到的帧,除其他信息外,还可看到前导码和帧开始符。任何物理硬件都会需要这些信息。[note 1]
下面的表格显示了在以1500个八位元组为MTU传输(有些吉比特以太网甚至更高速以太网支持更大的帧,称作巨型帧)时的完整帧格式。[note 2] 一个八位元组是八个位组成的数据(也就是现代计算机的一个字节)。
802.3 以太网帧结构 |
||||||||
前导码 |
帧开始符 |
MAC 目标地址 |
MAC 源地址 |
802.1Q 标签 (可选) |
以太类型或长度 |
负载 |
||
10101010 7个octet |
10101011 1个octet |
6 octets |
6 octets |
(4 octets) |
2 octets |
46–1500 octets |
4 octets |
12 octets |
64–1522 octets |
||||||||
72–1530 octets |
||||||||
84–1542 octets |
3.3 MAC地址
上面提到,以太网数据包的"标头",包含了发送者和接受者的信息。那么,发送者和接受者是如何标识呢?
以太网规定,连入网络的所有设备,都必须具有"网卡"接口。数据包必须是从一块网卡,传送到另一块网卡。网卡的地址,就是数据包的发送地址和接收地址,这叫做MAC地址。
每块网卡出厂的时候,都有一个全世界独一无二的MAC地址,长度是48个二进制位,通常用12个十六进制数表示。
MAC地址可不是世界上独一无二的,只是其散列足够大,使得在同一个子网中MAC地址碰巧相同的两块网卡几率很小很小而已。使用网络15年来,已经碰到过两次网卡MAC相同的事了。
而且MAC相同有两个方法解决:
1、网卡厂商提供有配置程序,可以直接改硬件MAC。如果使用二层网络系统就只能用此法解决,
2、各种操作系统也都提供伪造MAC地址的方式来解决三层系统中MAC冲突的问题。
前6个十六进制数是厂商编号,后6个是该厂商的网卡流水号。有了MAC地址,就可以定位网卡和数据包的路径了。
3.4 广播
定义地址只是第一步,后面还有更多的步骤。
首先,一块网卡怎么会知道另一块网卡的MAC地址?
回答是有一种ARP协议,可以解决这个问题。这个留到后面介绍,这里只需要知道,以太网数据包必须知道接收方的MAC地址,然后才能发送。
其次,就算有了MAC地址,系统怎样才能把数据包准确送到接收方?
回答是以太网采用了一种很"原始"的方式,它不是把数据包准确送到接收方,而是向本网络内所有计算机发送,让每台计算机自己判断,是否为接收方。
老式的的集线器(市面上已经不存在了)才是这种模式,现在交换机会记录并缓存各个交换机端口连接的设备的MAC地址,转发包时会按缓存的地址表转发到特定的交换机端口,除非地址表里面没有目的MAC地址,不然不会广播包的。
每个IP node(主机,路由器)只要在LAN上就会有ARP table,其中存放IP/MAC的映射。
如果A要发送给B数据包,而B不在A的ARP TABLE中,才会广播。A得到B的MAC之后,会cache IP-MAC在自己的ARP TABLE中。
当然了,ARP TABLE不是生来就有,都是从无到有的。所以是通过广播方式建立起来的。强调广播没有错,但是ARP TABLE也很重要。
上图中,1号计算机向2号计算机发送一个数据包,同一个子网络的3号、4号、5号计算机都会收到这个包。它们读取这个包的"标头",找到接收方的MAC地址,然后与自身的MAC地址相比较,如果两者相同,就接受这个包,做进一步处理,否则就丢弃这个包。这种发送方式就叫做"广播"(broadcasting)。
有了数据包的定义、网卡的MAC地址、广播的发送方式,"链接层"就可以在多台计算机之间传送数据了。
四、网络层
4.1 网络层的由来
以太网协议,依靠MAC地址发送数据。理论上,单单依靠MAC地址,上海的网卡就可以找到洛杉矶的网卡了,技术上是可以实现的。
但是,这样做有一个重大的缺点。以太网采用广播方式发送数据包,所有成员人手一"包",不仅效率低,而且局限在发送者所在的子网络。也就是说,如果两台计算机不在同一个子网络,广播是传不过去的。这种设计是合理的,否则互联网上每一台计算机都会收到所有包,那会引起灾难。
互联网是无数子网络共同组成的一个巨型网络,很像想象上海和洛杉矶的电脑会在同一个子网络,这几乎是不可能的。
因此,必须找到一种方法,能够区分哪些MAC地址属于同一个子网络,哪些不是。如果是同一个子网络,就采用广播方式发送,否则就采用"路由"方式发送。("路由"的意思,就是指如何向不同的子网络分发数据包,这是一个很大的主题,本文不涉及。)遗憾的是,MAC地址本身无法做到这一点。它只与厂商有关,与所处网络无关。
这就导致了"网络层"的诞生。它的作用是引进一套新的地址,使得我们能够区分不同的计算机是否属于同一个子网络。这套地址就叫做"网络地址",简称"网址"。
于是,"网络层"出现以后,每台计算机有了两种地址,一种是MAC地址,另一种是网络地址。两种地址之间没有任何联系,MAC地址是绑定在网卡上的,网络地址则是管理员分配的,它们只是随机组合在一起。
网络地址帮助我们确定计算机所在的子网络,MAC地址则将数据包送到该子网络中的目标网卡。因此,从逻辑上可以推断,必定是先处理网络地址,然后再处理MAC地址。
4.2 IP协议
规定网络地址的协议,叫做IP协议。它所定义的地址,就被称为IP地址。
目前,广泛采用的是IP协议第四版,简称IPv4。这个版本规定,网络地址由32个二进制位组成。
习惯上,我们用分成四段的十进制数表示IP地址,从0.0.0.0一直到255.255.255.255。
互联网上的每一台计算机,都会分配到一个IP地址。这个地址分成两个部分,前一部分代表网络,后一部分代表主机。比如,IP地址172.16.254.1,这是一个32位的地址,假定它的网络部分是前24位(172.16.254),那么主机部分就是后8位(最后的那个1)。处于同一个子网络的电脑,它们IP地址的网络部分必定是相同的,也就是说172.16.254.2应该与172.16.254.1处在同一个子网络。
但是,问题在于单单从IP地址,我们无法判断网络部分。还是以172.16.254.1为例,它的网络部分,到底是前24位,还是前16位,甚至前28位,从IP地址上是看不出来的。
那么,怎样才能从IP地址,判断两台计算机是否属于同一个子网络呢?这就要用到另一个参数"子网掩码"(subnet mask)。
所谓"子网掩码",就是表示子网络特征的一个参数。它在形式上等同于IP地址,也是一个32位二进制数字,它的网络部分全部为1,主机部分全部为0。比如,IP地址172.16.254.1,如果已知网络部分是前24位,主机部分是后8位,那么子网络掩码就是11111111.11111111.11111111.00000000,写成十进制就是255.255.255.0。
知道"子网掩码",我们就能判断,任意两个IP地址是否处在同一个子网络。方法是将两个IP地址与子网掩码分别进行AND运算(两个数位都为1,运算结果为1,否则为0),然后比较结果是否相同,如果是的话,就表明它们在同一个子网络中,否则就不是。
比如,已知IP地址172.16.254.1和172.16.254.233的子网掩码都是255.255.255.0,请问它们是否在同一个子网络?两者与子网掩码分别进行AND运算,结果都是172.16.254.0,因此它们在同一个子网络。
IP地址与子网掩码进行"按位与"运算,得到网络地址。
ip地址172.16.254.233 换算成二进制格式: 10101100 00010000 11111110 11101001
掩码地址255.255.255.0 换算成二进制格式: 11111111 11111111 11111111 00000000
你把它们逐位进行&运算, 就可以将最后的8位全部变为0, 而前24位数字不变. 得到10101100 00010000 11111110 00000000, 即172.16.254.0
第一个IP地址的网络地址:
172. 16.254.1& 255.255.255.0= 172. 16.254.0
第二个IP地址的网络地址:
172. 16.254.233& 255.255.255.0= 172. 16.254.0
所谓"掩码",就是掩去主机部分,保留网络部分。
总结一下,IP协议的作用主要有两个,一个是为每一台计算机分配IP地址,另一个是确定哪些地址在同一个子网络。
一个子网192.168.0.0/24
表示网段,这个网段的IP地址从192.168.0.1开始,到192.168.0.254结束(192.168.0.0和192.168.0.255有特殊含义,不能用作IP地址);子网掩码是255.255.255.0。
这个就是用CIDR(无类别域间路由选择,Classless and Subnet Address Extensions and Supernetting))的形式表示的一个网段,或者说子网。
上面的子网掩码怎么来的呢?其实关键就在“24”上。我们知道IP地址是四个十进制数组成的,相当于32位二进制。
用CIDR表示形式,后一个数字将这32位进行了间隔(以24为例):前24位用"1"表示,后面8位用0表示,得到一个二进制数: 11111111 11111111 11111111 00000000。 将其转化为十进制,就是:255.255.255.0了。
4.3 IP数据包
根据IP协议发送的数据,就叫做IP数据包。不难想象,其中必定包括IP地址信息。
但是前面说过,以太网数据包只包含MAC地址,并没有IP地址的栏位。那么是否需要修改数据定义,再添加一个栏位呢?
回答是不需要,我们可以把IP数据包直接放进以太网数据包的"数据"部分,因此完全不用修改以太网的规格。这就是互联网分层结构的好处:上层的变动完全不涉及下层的结构。
具体来说,IP数据包也分为"标头"和"数据"两个部分。
"标头"部分主要包括版本、长度、IP地址等信息,"数据"部分则是IP数据包的具体内容。它放进以太网数据包后,以太网数据包就变成了下面这样。
IP数据包的"标头"部分的长度为20到60字节,整个数据包的总长度最大为65,535字节。因此,理论上,一个IP数据包的"数据"部分,最长为65,515字节。前面说过,以太网数据包的"数据"部分,最长只有1500字节。因此,如果IP数据包超过了1500字节,它就需要分割成几个以太网数据包,分开发送了。
4.4 ARP协议
关于"网络层",还有最后一点需要说明。
因为IP数据包是放在以太网数据包里发送的,所以我们必须同时知道两个地址,一个是对方的MAC地址,另一个是对方的IP地址。通常情况下,对方的IP地址是已知的(后文会解释),但是我们不知道它的MAC地址。
所以,我们需要一种机制,能够从IP地址得到MAC地址。
这里又可以分成两种情况。第一种情况,如果两台主机不在同一个子网络,那么事实上没有办法得到对方的MAC地址,只能把数据包传送到两个子网络连接处的"网关"(gateway),让网关去处理。
第二种情况,如果两台主机在同一个子网络,那么我们可以用ARP协议,得到对方的MAC地址。ARP协议也是发出一个数据包(包含在以太网数据包中),其中包含它所要查询主机的IP地址,在对方的MAC地址这一栏,填的是FF:FF:FF:FF:FF:FF,表示这是一个"广播"地址。它所在子网络的每一台主机,都会收到这个数据包,从中取出IP地址,与自身的IP地址进行比较。如果两者相同,都做出回复,向对方报告自己的MAC地址,否则就丢弃这个包。
总之,有了ARP协议之后,我们就可以得到同一个子网络内的主机MAC地址,可以把数据包发送到任意一台主机之上了。
五、传输层
5.1 传输层的由来
有了MAC地址和IP地址,我们已经可以在互联网上任意两台主机上建立通信。
接下来的问题是,同一台主机上有许多程序都需要用到网络,比如,你一边浏览网页,一边与朋友在线聊天。当一个数据包从互联网上发来的时候,你怎么知道,它是表示网页的内容,还是表示在线聊天的内容?
也就是说,我们还需要一个参数,表示这个数据包到底供哪个程序(进程)使用。这个参数就叫做"端口"(port),它其实是每一个使用网卡的程序的编号。每个数据包都发到主机的特定端口,所以不同的程序就能取到自己所需要的数据。
"端口"是0到65535之间的一个整数,正好16个二进制位。0到1023的端口被系统占用,用户只能选用大于1023的端口。不管是浏览网页还是在线聊天,应用程序会随机选用一个端口,然后与服务器的相应端口联系。
"传输层"的功能,就是建立"端口到端口"的通信。相比之下,"网络层"的功能是建立"主机到主机"的通信。只要确定主机和端口,我们就能实现程序之间的交流。因此,Unix系统就把主机+端口,叫做"套接字"(socket)。有了它,就可以进行网络应用程序开发了。
5.2 UDP协议
现在,我们必须在数据包中加入端口信息,这就需要新的协议。最简单的实现叫做UDP协议,它的格式几乎就是在数据前面,加上端口号。
UDP数据包,也是由"标头"和"数据"两部分组成。
"标头"部分主要定义了发出端口和接收端口,"数据"部分就是具体的内容。然后,把整个UDP数据包放入IP数据包的"数据"部分,而前面说过,IP数据包又是放在以太网数据包之中的,所以整个以太网数据包现在变成了下面这样:
UDP数据包非常简单,"标头"部分一共只有8个字节,总长度不超过65,535字节,正好放进一个IP数据包。
5.3 TCP协议
UDP协议的优点是比较简单,容易实现,但是缺点是可靠性较差,一旦数据包发出,无法知道对方是否收到。
为了解决这个问题,提高网络可靠性,TCP协议就诞生了。这个协议非常复杂,但可以近似认为,它就是有确认机制的UDP协议,每发出一个数据包都要求确认。如果有一个数据包遗失,就收不到确认,发出方就知道有必要重发这个数据包了。
因此,TCP协议能够确保数据不会遗失。它的缺点是过程复杂、实现困难、消耗较多的资源。
TCP数据包和UDP数据包一样,都是内嵌在IP数据包的"数据"部分。TCP数据包没有长度限制,理论上可以无限长,但是为了保证网络的效率,通常TCP数据包的长度不会超过IP数据包的长度,以确保单个TCP数据包不必再分割。
六、应用层
应用程序收到"传输层"的数据,接下来就要进行解读。由于互联网是开放架构,数据来源五花八门,必须事先规定好格式,否则根本无法解读。
"应用层"的作用,就是规定应用程序的数据格式。
举例来说,TCP协议可以为各种各样的程序传递数据,比如Email、WWW、FTP等等。那么,必须有不同协议规定电子邮件、网页、FTP数据的格式,这些应用程序协议就构成了"应用层"。
这是最高的一层,直接面对用户。它的数据就放在TCP数据包的"数据"部分。因此,现在的以太网的数据包就变成下面这样。
至此,整个互联网的五层结构,自下而上全部讲完了。这是从系统的角度,解释互联网是如何构成的。下一篇,我反过来,从用户的角度,自上而下看看这个结构是如何发挥作用,完成一次网络数据交换的。
(完)
互联网协议入门
上一篇文章分析了互联网的总体构思,从下至上,每一层协议的设计思想。
这是从设计者的角度看问题,今天我想切换到用户的角度,看看用户是如何从上至下,与这些协议互动的。
==============================================================
互联网协议入门(二)
作者:阮一峰
(接上文)
七、一个小结
先对前面的内容,做一个小结。
我们已经知道,网络通信就是交换数据包。电脑A向电脑B发送一个数据包,后者收到了,回复一个数据包,从而实现两台电脑之间的通信。数据包的结构,基本上是下面这样:
发送这个包,需要知道两个地址:
* 对方的MAC地址
* 对方的IP地址
有了这两个地址,数据包才能准确送到接收者手中。但是,前面说过,MAC地址有局限性,如果两台电脑不在同一个子网络,就无法知道对方的MAC地址,必须通过网关(gateway)转发。
上图中,1号电脑要向4号电脑发送一个数据包。它先判断4号电脑是否在同一个子网络,结果发现不是(后文介绍判断方法),于是就把这个数据包发到网关A。网关A通过路由协议,发现4号电脑位于子网络B,又把数据包发给网关B,网关B再转发到4号电脑。
1号电脑把数据包发到网关A,必须知道网关A的MAC地址。所以,数据包的目标地址,实际上分成两种情况:
场景 |
数据包地址 |
同一个子网络 |
对方的MAC地址,对方的IP地址 |
非同一个子网络 |
网关的MAC地址,对方的IP地址 |
发送数据包之前,电脑必须判断对方是否在同一个子网络,然后选择相应的MAC地址。接下来,我们就来看,实际使用中,这个过程是怎么完成的。
如果不在一个网络内是不需要获取对方MAC的,因为MAC地址适用于同一网段通信的。需要的是网关的MAC,因为你要首先和网关通信。
八、用户的上网设置
8.1 静态IP地址
你买了一台新电脑,插上网线,开机,这时电脑能够上网吗?
通常你必须做一些设置。有时,管理员(或者ISP)会告诉你下面四个参数,你把它们填入操作系统,计算机就能连上网了:
* 本机的IP地址
* 子网掩码
* 网关的IP地址
* DNS的IP地址
下图是Windows系统的设置窗口。
这四个参数缺一不可,后文会解释为什么需要知道它们才能上网。由于它们是给定的,计算机每次开机,都会分到同样的IP地址,所以这种情况被称作"静态IP地址上网"。
但是,这样的设置很专业,普通用户望而生畏,而且如果一台电脑的IP地址保持不变,其他电脑就不能使用这个地址,不够灵活。出于这两个原因,大多数用户使用"动态IP地址上网"。
8.2 动态IP地址
所谓"动态IP地址",指计算机开机后,会自动分配到一个IP地址,不用人为设定。它使用的协议叫做DHCP协议。
这个协议规定,每一个子网络中,有一台计算机负责管理本网络的所有IP地址,它叫做"DHCP服务器"。新的计算机加入网络,必须向"DHCP服务器"发送一个"DHCP请求"数据包,申请IP地址和相关的网络参数。
前面说过,如果两台计算机在同一个子网络,必须知道对方的MAC地址和IP地址,才能发送数据包。但是,新加入的计算机不知道这两个地址,怎么发送数据包呢?
DHCP协议做了一些巧妙的规定。
8.3 DHCP协议
首先,它是一种应用层协议,建立在UDP协议之上,所以整个数据包是这样的:
(1)最前面的"以太网标头",设置发出方(本机)的MAC地址和接收方(DHCP服务器)的MAC地址。前者就是本机网卡的MAC地址,后者这时不知道,就填入一个广播地址:FF-FF-FF-FF-FF-FF。
(2)后面的"IP标头",设置发出方的IP地址和接收方的IP地址。这时,对于这两者,本机都不知道。于是,发出方的IP地址就设为0.0.0.0,接收方的IP地址设为255.255.255.255。
(3)最后的"UDP标头",设置发出方的端口和接收方的端口。这一部分是DHCP协议规定好的,发出方是68端口,接收方是67端口。
这个数据包构造完成后,就可以发出了。以太网是广播发送,同一个子网络的每台计算机都收到了这个包。因为接收方的MAC地址是FF-FF-FF-FF-FF-FF,看不出是发给谁的,所以每台收到这个包的计算机,还必须分析这个包的IP地址,才能确定是不是发给自己的。当看到发出方IP地址是0.0.0.0,接收方是255.255.255.255,于是DHCP服务器知道"这个包是发给我的",而其他计算机就可以丢弃这个包。
接下来,DHCP服务器读出这个包的数据内容,分配好IP地址,发送回去一个"DHCP响应"数据包。这个响应包的结构也是类似的,以太网标头的MAC地址是双方的网卡地址,IP标头的IP地址是DHCP服务器的IP地址(发出方)和255.255.255.255(接收方),UDP标头的端口是67(发出方)和68(接收方),分配给请求端的IP地址和本网络的具体参数则包含在Data部分。
新加入的计算机收到这个响应包,于是就知道了自己的IP地址、子网掩码、网关地址、DNS服务器等等参数。
8.4 上网设置:小结
这个部分,需要记住的就是一点:不管是"静态IP地址"还是"动态IP地址",电脑上网的首要步骤,是确定四个参数。这四个值很重要,值得重复一遍:
* 本机的IP地址
* 子网掩码
* 网关的IP地址
* DNS的IP地址
有了这几个数值,电脑就可以上网"冲浪"了。接下来,我们来看一个实例,当用户访问网页的时候,互联网协议是怎么运作的。
九、一个实例:访问网页
9.1 本机参数
我们假定,经过上一节的步骤,用户设置好了自己的网络参数:
* 本机的IP地址:192.168.1.103
* 子网掩码:255.255.255.0
* 网关的IP地址:192.168.1.1
* DNS的IP地址:8.8.8.8
然后他打开浏览器,想要访问Google,在地址栏输入了网址:www.google.com。
这意味着,浏览器要向Google发送一个网页请求的数据包。
9.2 DNS协议
我们知道,发送数据包,必须要知道对方的IP地址。但是,现在,我们只知道网址www.google.com,不知道它的IP地址。
DNS协议可以帮助我们,将这个网址转换成IP地址。已知DNS服务器为8.8.8.8,于是我们向这个地址发送一个DNS数据包(53端口)。
然后,DNS服务器做出响应,告诉我们Google的IP地址是172.194.72.105。于是,我们知道了对方的IP地址。
9.3 子网掩码
接下来,我们要判断,这个IP地址是不是在同一个子网络,这就要用到子网掩码。
已知子网掩码是255.255.255.0,本机用它对自己的IP地址192.168.1.100,做一个二进制的AND运算(两个数位都为1,结果为1,否则为0),计算结果为192.168.1.0;然后对Google的IP地址172.194.72.105也做一个AND运算,计算结果为172.194.72.0。这两个结果不相等,所以结论是,Google与本机不在同一个子网络。
因此,我们要向Google发送数据包,必须通过网关192.168.1.1转发,也就是说,同一个包第一次接收方的MAC地址将是我们网关的MAC地址。
1)192.168.* 是保留地址,不允许在Internet上出现,一般用作内网。内网与外网之间可以通过NAT协议进行转换。
2)一个IP包从本地网络到达Google的服务器,中间需要经过多个路由器,而不是网关。
C:\Users\Sansan>ipconfig /all Windows IP Configuration 【Windows IP 配置】 Host Name . . . . . . . . . . . . : FA-FA 【域中计算机名、主机名】 Primary Dns Suffix . . . . . . . : 【主 DNS 后缀】 Node Type . . . . . . . . . . . . : Hybrid 【节点类型】 IP Routing Enabled. . . . . . . . : No 【IP路由服务是否启用】 WINS Proxy Enabled. . . . . . . . : No 【WINS代理服务是否启用 】 Wireless LAN adapter 本地连接* 3: Media State . . . . . . . . . . . : Media disconnected Connection-specific DNS Suffix . : Description . . . . . . . . . . . : Microsoft Wi-Fi Direct 虚拟适配器 Physical Address. . . . . . . . . : 28-28-28-28-28-28 DHCP Enabled. . . . . . . . . . . : Yes Autoconfiguration Enabled . . . . : Yes Ethernet adapter 以太网: Media State . . . . . . . . . . . : Media disconnected Connection-specific DNS Suffix . : Description . . . . . . . . . . . : Realtek PCIe GBE 系列控制器 Physical Address. . . . . . . . . : FA-FA-FA-FA-FA-FA DHCP Enabled. . . . . . . . . . . : Yes Autoconfiguration Enabled . . . . : Yes Wireless LAN adapter WLAN: Connection-specific DNS Suffix . : 【连接特定的DNS后缀】 Description . . . . . . . . . . . : Realtek RTL8723AE 无线 LAN 802.11n PCI-E NIC【网卡型号描述】 Physical Address. . . . . . . . . : 28-28-28-28-28-28 【网卡MAC地址】 DHCP Enabled. . . . . . . . . . . : Yes 【动态主机设置协议是否启用】 Autoconfiguration Enabled . . . . : Yes 【】 Link-local IPv6 Address . . . . . : fe80::2cb7:589c:8054:1f53%3(Preferred) IPv4 Address. . . . . . . . . . . : 192.168.1.103(Preferred) 【IP地址】 Subnet Mask . . . . . . . . . . . : 255.255.255.0 【子网掩码】 Lease Obtained. . . . . . . . . . : Saturday, November 1, 2014 10:45:41 PM 【IP地址租用开始时间】 Lease Expires . . . . . . . . . . : Monday, November 3, 2014 10:45:38 AM 【IP地址租用结束时间】 Default Gateway . . . . . . . . . : 192.168.1.1 【默认网关】 DHCP Server . . . . . . . . . . . : 192.168.1.1 【DHCP管理者机子IP】 DHCPv6 IAID . . . . . . . . . . . : 53011271 DHCPv6 Client DUID. . . . . . . . : 00-01-00-01-1A-FD-83-E7-FA-FA-FA-FA-FA-FA DNS Servers . . . . . . . . . . . : 192.168.1.1 【DNS服务器地址】 NetBIOS over Tcpip. . . . . . . . : Enabled Ethernet adapter VMware Network Adapter VMnet1:
C:\Users\Sansan>tracert www.baidu.com Tracing route to www.a.shifen.com [111.13.100.91] over a maximum of 30 hops: 1 1 ms 1 ms 2 ms phicomm.routerlogin.net [192.168.1.1] 【默认网关】,是便宜的国产的菲讯无线路由器 2 2 ms 11 ms 1 ms promote.cache-dns.local [10.212.196.1] 【本地局域网】 3 4 ms 6 ms 2 ms 221.181.218.162 【江苏省南京市 移动】 4 * * 44 ms 221.181.218.33 【江苏省南京市 移动】 5 8 ms 5 ms 6 ms 221.178.162.42 【江苏省南京市 移动】 6 * * * Request timed out. 7 9 ms 12 ms 7 ms 221.183.14.185 【中国移动】 8 26 ms 25 ms 27 ms 221.183.10.118 【中国移动】 9 26 ms 25 ms 24 ms 221.183.12.34 【中国移动】 10 26 ms 30 ms 23 ms 111.13.98.161 【北京市 移动】 11 26 ms 26 ms 30 ms 111.13.108.1 【北京市 移动】 12 26 ms 26 ms 25 ms promote.cache-dns.local [10.29.243.1] 【本地局域网】 13 26 ms 26 ms 27 ms promote.cache-dns.local [10.29.240.6] 【本地局域网】 14 24 ms 27 ms 27 ms 111.13.100.91 【北京市 移动】,即百度一下,你就知道 Trace complete.
9.4 应用层协议
浏览网页用的是HTTP协议,它的整个数据包构造是这样的:
HTTP部分的内容,类似于下面这样:
GET / HTTP/1.1
Host: www.google.com
Connection: keep-alive
User-Agent: Mozilla/5.0 (Windows NT 6.1) ......
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Encoding: gzip,deflate,sdch
Accept-Language: zh-CN,zh;q=0.8
Accept-Charset: GBK,utf-8;q=0.7,*;q=0.3
Cookie: ... ...
我们假定这个部分的长度为4960字节,它会被嵌在TCP数据包之中。
9.5 TCP协议
TCP数据包需要设置端口,接收方(Google)的HTTP端口默认是80,发送方(本机)的端口是一个随机生成的1024-65535之间的整数,假定为51775。
TCP数据包的标头长度为20字节,加上嵌入HTTP的数据包,总长度变为4980字节。
9.6 IP协议
然后,TCP数据包再嵌入IP数据包。IP数据包需要设置双方的IP地址,这是已知的,发送方是192.168.1.100(本机),接收方是172.194.72.105(Google)。
IP数据包的标头长度为20字节,加上嵌入的TCP数据包,总长度变为5000字节。
9.7 以太网协议
最后,IP数据包嵌入以太网数据包。以太网数据包需要设置双方的MAC地址,发送方为本机的网卡MAC地址,接收方为网关192.168.1.1的MAC地址(通过ARP协议得到)。
以太网数据包的数据部分,最大长度为1500字节,而现在的IP数据包长度为5000字节。因此,IP数据包必须分割成四个包。因为每个包都有自己的IP标头(20字节),所以四个包的IP数据包的长度分别为1500、1500、1500、560。
9.8 服务器端响应
经过多个网关的转发,Google的服务器172.194.72.105,收到了这四个以太网数据包。
根据IP标头的序号,Google将四个包拼起来,取出完整的TCP数据包,然后读出里面的"HTTP请求",接着做出"HTTP响应",再用TCP协议发回来。
本机收到HTTP响应以后,就可以将网页显示出来,完成一次网络通信。
最后,IP数据包嵌入以太网数据包。以太网数据包需要设置双方的MAC地址,发送方为本机的网卡MAC地址,接收方为网关192.168.1.1的MAC地址(通过ARP协议得到)",发送到192.168.1.1,网关应该还会拆开包,根据子网掩码与操作发现不是在一个内网内,那么将会重新把这个包封装一下,接收方的为它的网关的MAC地址,就这样一层一层往上面找,最终到一个网关,网关根据子网掩码发现172.194.72.105(Google)和自己在一个内网内,然后将请求发给172.194.72.105(Google),这样就到达了目的地
这个例子就到此为止,虽然经过了简化,但它大致上反映了互联网协议的整个通信过程。
(完)
实践
最后还是最好使用wireshark抓包来实践下,再结合tcp/ip协议详解来学习,把知识连贯起来。
FAQ理解解答
1
其实,大多数情况都是这种"传送到两个子网络连接处的'网关'",我这里有几个疑问:
1. 是将IP packet传送到gateway吗?gateway在收到IP packet后会怎样处理?
2. 怎么知道gateway的地址?
3. IP路由表在这里扮演的角色?
看到的这样的问题,重读了一次阮先生的教材,才知道阮先生没有把概念讲清楚。
在internet协议中,根本没有MAC地址这样的定义,操作系统层面只认识IP地址,MAC地址仅仅是在链路层(也就是网卡和交换机之间)起作用,系统程序没有必要知道MAC地址。当然从前还有支持纯以太网协议的程序,那样的系统程序就只认mac,不认ip了……我所接触过的这样的程序,大多都是用mac广播方式工作的。因为以太网协议功能太弱,要实现比较复杂的通信过程,难度不是一般的大。
知道这个前提后,再回答这3个问题:
1、操作系统中配置好了网关地址,凡是操作系统发现不在"掩码"范围内的地址,通通都发给gateway……不管gateway是否真的存在;gateway收到ip packet后,会根据自己内部的路由表寻找目的机器的方向……注意是方向,因为如果网关不能直接访问到目标机器,就把IP packet发给路由表上的规定方向的gateway;
2、gateway的地址是配置在本机操作系统中的,既可以手工配置,也可以有dhcp配置;而且gateway可以配置多个,也就是说,对不同非子网IP,可以指定出不同的方向;
3、IP路由表就是前面说的方向配置清单,典型的是静态(一经配置不再变化)。当然特殊网络环境下还有动态路由表,也就是gateway之间有配套的通信协议,能互相知会一声,自动形成路由表……这个叫动态路由协议。
2
想确认下,在同一子网内,是在IP地址中全部写入 1,进行广播,然后比较MAC地址,还是在MAC地址中全部写入1,进行广播,然后比较IP????
据我的判断,有两种广播形式,不知道理解的对不对:
1,在本地子网广播,也就是在自己所在的子网广播,这样的广播地址应该是与mac地 址相关的,也就是mac地址设为全1
2,从本地向外网广播,这样的话,应该是把那个网络的主机部分设为全1,这里用到了ip地址,这个包是发往网关的,是由网关代替你广播的 网关收到主机ip全1的包后知道这是一个向其子网发送的广播包,这样,对,就这样了。。。。。
友链:
1987年的文章《Introduction to the Internet Protocols》
http://www.uic.edu/depts/accc/network/ftp/v452.html 3 July 1987
Introduction to the Internet Protocols |
|||||||||||
|
Computer Science Facilities Group This is an introduction to the Internet networking protocols (TCP/IP). It includes a summary of the facilities available and brief descriptions of the major protocols in the family. Copyright (C) 1987, Charles L. Hedrick. Anyone may reproduce this document, in whole or in part, provided that: (1) any copy or republication of the entire document must show Rutgers University as the source, and must include this notice; and (2) any other use of this material must reference this manual and Rutgers University, and the fact that the material is copyright by Charles Hedrick and is used by permission. Unix is a trademark of AT&T Technologies, Inc |
|
|||||||||
|
|
|
|||||||||
|
|
||||||||||
|
|
|
|||||||||
|
1. What is TCP/IP?
|
|
|||||||||
|
|
|
|||||||||
|
TCP/IP is a set of protocols developed to allow cooperating computers to share resources across a network. It was developed by a community of researchers centered around the ARPAnet. Certainly the ARPAnet is the best-known TCP/IP network. However as of June, 87, at least 130 different vendors had products that support TCP/IP, and thousands of networks of all kinds use it. First some basic definitions. The most accurate name for the set of protocols we are describing is the "Internet protocol suite". TCP and IP are two of the protocols in this suite. (They will be described below.) Because TCP and IP are the best known of the protocols, it has become common to use the term TCP/IP or IP/TCP to refer to the whole family. It is probably not worth fighting this habit. However this can lead to some oddities. For example, I find myself talking about NFS as being based on TCP/IP, even though it doesn't use TCP at all. (It does use IP. But it uses an alternative protocol, UDP, instead of TCP. All of this alphabet soup will be unscrambled in the following pages.) The Internet is a collection of networks, including the Arpanet, NSFnet, regional networks such as NYsernet, local networks at a number of University and research institutions, and a number of military networks. The term "Internet" applies to this entire set of networks. The subset of them that is managed by the Department of Defense is referred to as the "DDN" (Defense Data Network). This includes some research-oriented networks, such as the Arpanet, as well as more strictly military ones. (Because much of the funding for Internet protocol developments is done via the DDN organization, the terms Internet and DDN can sometimes seem equivalent.) All of these networks are connected to each other. Users can send messages from any of them to any other, except where there are security or other policy restrictions on access. Officially speaking, the Internet protocol documents are simply standards adopted by the Internet community for its own use. More recently, the Department of Defense issued a MILSPEC definition of TCP/IP. This was intended to be a more formal definition, appropriate for use in purchasing specifications. However most of the TCP/IP community continues to use the Internet standards. The MILSPEC version is intended to be consistent with it. Whatever it is called, TCP/IP is a family of protocols. A few provide "low-level" functions needed for many applications. These include IP, TCP, and UDP. (These will be described in a bit more detail later.) Others are protocols for doing specific tasks, e.g. transferring files between computers, sending mail, or finding out who is logged in on another computer. Initially TCP/IP was used mostly between minicomputers or mainframes. These machines had their own disks, and generally were self-contained. Thus the most important "traditional" TCP/IP services are: - file transfer. The file transfer protocol (FTP) allows a user on any computer to get files from another computer, or to send files to another computer. Security is handled by requiring the user to specify a user name and password for the other computer. Provisions are made for handling file transfer between machines with different character set, end of line conventions, etc. This is not quite the same thing as more recent "network file system" or "netbios" protocols, which will be described below. Rather, FTP is a utility that you run any time you want to access a file on another system. You use it to copy the file to your own system. You then work with the local copy. (See RFC 959 for specifications for FTP.) - remote login. The network terminal protocol (TELNET) allows a user to log in on any other computer on the network. You start a remote session by specifying a computer to connect to. From that time until you finish the session, anything you type is sent to the other computer. Note that you are really still talking to your own computer. But the telnet program effectively makes your computer invisible while it is running. Every character you type is sent directly to the other system. The remote system will ask you to log in and give a password, in whatever manner it would normally ask a user who had just dialed it up. When you log off of the other computer, the telnet program exits, and you will find yourself talking to your own computer. Microcomputer implementations of telnet generally include a terminal emulator for some common type of terminal. (See RFC's 854 and 855 for specifications for telnet. By the way, the telnet protocol should not be confused with Telenet, a vendor of commercial network services.) - computer mail. This allows you to send messages to users on other computers. Originally, people tended to use only one or two specific computers. They would maintain "mail files" on those machines. The computer mail system is simply a way for you to add a message to another user's mail file. There are some problems with this in an environment where microcomputers are used. The most serious is that a micro is not well suited to receive computer mail. When you send mail, the mail software expects to be able to open a connection to the addressee's computer, in order to send the mail. If this is a microcomputer, it may be turned off, or it may be running an application other than the mail system. For this reason, mail is normally handled by a larger system, where it is practical to have a mail server running all the time. Microcomputer mail software then becomes a user interface that retrieves mail from the mail server. (See RFC 821 and 822 for specifications for computer mail. See RFC 937 for a protocol designed for microcomputers to use in reading mail from a mail server.) These services should be present in any implementation of TCP/IP, except that micro-oriented implementations may not support computer mail. These traditional applications still play a very important role in TCP/IP-based networks. However more recently, the way in which networks are used has been changing. The older model of a number of large, self-sufficient computers is beginning to change. Now many installations have several kinds of computers, including microcomputers, workstations, minicomputers, and mainframes. These computers are likely to be configured to perform specialized tasks. Although people are still likely to work with one specific computer, that computer will call on other systems on the net for specialized services. This has led to the "server/client" model of network services. A server is a system that provides a specific service for the rest of the network. A client is another system that uses that service. (Note that the server and client need not be on different computers. They could be different programs running on the same computer.) Here are the kinds of servers typically present in a modern computer setup. Note that these computer services can all be provided within the framework of TCP/IP. - network file systems. This allows a system to access files on another computer in a somewhat more closely integrated fashion than FTP. A network file system provides the illusion that disks or other devices from one system are directly connected to other systems. There is no need to use a special network utility to access a file on another system. Your computer simply thinks it has some extra disk drives. These extra "virtual" drives refer to the other system's disks. This capability is useful for several different purposes. It lets you put large disks on a few computers, but still give others access to the disk space. Aside from the obvious economic benefits, this allows people working on several computers to share common files. It makes system maintenance and backup easier, because you don't have to worry about updating and backing up copies on lots of different machines. A number of vendors now offer high-performance diskless computers. These computers have no disk drives at all. They are entirely dependent upon disks attached to common "file servers". (See RFC's 1001 and 1002 for a description of PC-oriented NetBIOS over TCP. In the workstation and minicomputer area, Sun's Network File System is more likely to be used. Protocol specifications for it are available from Sun Microsystems.) - remote printing. This allows you to access printers on other computers as if they were directly attached to yours. (The most commonly used protocol is the remote lineprinter protocol from Berkeley Unix. Unfortunately, there is no protocol document for this. However the C code is easily obtained from Berkeley, so implementations are common.) - remote execution. This allows you to request that a particular program be run on a different computer. This is useful when you can do most of your work on a small computer, but a few tasks require the resources of a larger system. There are a number of different kinds of remote execution. Some operate on a command by command basis. That is, you request that a specific command or set of commands should run on some specific computer. (More sophisticated versions will choose a system that happens to be free.) However there are also "remote procedure call" systems that allow a program to call a subroutine that will run on another computer. (There are many protocols of this sort. Berkeley Unix contains two servers to execute commands remotely: rsh and rexec. The man pages describe the protocols that they use. The user-contributed software with Berkeley 4.3 contains a "distributed shell" that will distribute tasks among a set of systems, depending upon load. Remote procedure call mechanisms have been a topic for research for a number of years, so many organizations have implementations of such facilities. The most widespread commercially-supported remote procedure call protocols seem to be Xerox's Courier and Sun's RPC. Protocol documents are available from Xerox and Sun. There is a public implementation of Courier over TCP as part of the user-contributed software with Berkeley 4.3. An implementation of RPC was posted to Usenet by Sun, and also appears as part of the user-contributed software with Berkeley 4.3.) - name servers. In large installations, there are a number of different collections of names that have to be managed. This includes users and their passwords, names and network addresses for computers, and accounts. It becomes very tedious to keep this data up to date on all of the computers. Thus the databases are kept on a small number of systems. Other systems access the data over the network. (RFC 822 and 823 describe the name server protocol used to keep track of host names and Internet addresses on the Internet. This is now a required part of any TCP/IP implementation. IEN 116 describes an older name server protocol that is used by a few terminal servers and other products to look up host names. Sun's Yellow Pages system is designed as a general mechanism to handle user names, file sharing groups, and other databases commonly used by Unix systems. It is widely available commercially. Its protocol definition is available from Sun.) - terminal servers. Many installations no longer connect terminals directly to computers. Instead they connect them to terminal servers. A terminal server is simply a small computer that only knows how to run telnet (or some other protocol to do remote login). If your terminal is connected to one of these, you simply type the name of a computer, and you are connected to it. Generally it is possible to have active connections to more than one computer at the same time. The terminal server will have provisions to switch between connections rapidly, and to notify you when output is waiting for another connection. (Terminal servers use the telnet protocol, already mentioned. However any real terminal server will also have to support name service and a number of other protocols.) - network-oriented window systems. Until recently, high- performance graphics programs had to execute on a computer that had a bit-mapped graphics screen directly attached to it. Network window systems allow a program to use a display on a different computer. Full-scale network window systems provide an interface that lets you distribute jobs to the systems that are best suited to handle them, but still give you a single graphically-based user interface. (The most widely-implemented window system is X. A protocol description is available from MIT's Project Athena. A reference implementation is publically available from MIT. A number of vendors are also supporting NeWS, a window system defined by Sun. Both of these systems are designed to use TCP/IP.) Note that some of the protocols described above were designed by Berkeley, Sun, or other organizations. Thus they are not officially part of the Internet protocol suite. However they are implemented using TCP/IP, just as normal TCP/IP application protocols are. Since the protocol definitions are not considered proprietary, and since commercially-support implementations are widely available, it is reasonable to think of these protocols as being effectively part of the Internet suite. Note that the list above is simply a sample of the sort of services available through TCP/IP. However it does contain the majority of the "major" applications. The other commonly-used protocols tend to be specialized facilities for getting information of various kinds, such as who is logged in, the time of day, etc. However if you need a facility that is not listed here, we encourage you to look through the current edition of Internet Protocols (currently RFC 1011), which lists all of the available protocols, and also to look at some of the major TCP/IP implementations to see what various vendors have added. |
|
|||||||||
|
|
|
|||||||||
2. General description of the TCP/IP protocols |
|||||||||||
|
TCP/IP is a layered set of protocols. In order to understand what this means, it is useful to look at an example. A typical situation is sending mail. First, there is a protocol for mail. This defines a set of commands which one machine sends to another, e.g. commands to specify who the sender of the message is, who it is being sent to, and then the text of the message. However this protocol assumes that there is a way to communicate reliably between the two computers. Mail, like other application protocols, simply defines a set of commands and messages to be sent. It is designed to be used together with TCP and IP. TCP is responsible for making sure that the commands get through to the other end. It keeps track of what is sent, and retransmitts anything that did not get through. If any message is too large for one datagram, e.g. the text of the mail, TCP will split it up into several datagrams, and make sure that they all arrive correctly. Since these functions are needed for many applications, they are put together into a separate protocol, rather than being part of the specifications for sending mail. You can think of TCP as forming a library of routines that applications can use when they need reliable network communications with another computer. Similarly, TCP calls on the services of IP. Although the services that TCP supplies are needed by many applications, there are still some kinds of applications that don't need them. However there are some services that every application needs. So these services are put together into IP. As with TCP, you can think of IP as a library of routines that TCP calls on, but which is also available to applications that don't use TCP. This strategy of building several levels of protocol is called "layering". We think of the applications programs such as mail, TCP, and IP, as being separate "layers", each of which calls on the services of the layer below it. Generally, TCP/IP applications use 4 layers:
TCP/IP is based on the "catenet model". (This is described in more detail in IEN 48.) This model assumes that there are a large number of independent networks connected together by gateways. The user should be able to access computers or other resources on any of these networks. Datagrams will often pass through a dozen different networks before getting to their final destination. The routing needed to accomplish this should be completely invisible to the user. As far as the user is concerned, all he needs to know in order to access another system is an "Internet address". This is an address that looks like 128.6.4.194. It is actually a 32-bit number. However it is normally written as 4 decimal numbers, each representing 8 bits of the address. (The term "octet" is used by Internet documentation for such 8-bit chunks. The term "byte" is not used, because TCP/IP is supported by some computers that have byte sizes other than 8 bits.) Generally the structure of the address gives you some information about how to get to the system. For example, 128.6 is a network number assigned by a central authority to Rutgers University. Rutgers uses the next octet to indicate which of the campus Ethernets is involved. 128.6.4 happens to be an Ethernet used by the Computer Science Department. The last octet allows for up to 254 systems on each Ethernet. (It is 254 because 0 and 255 are not allowed, for reasons that will be discussed later.) Note that 128.6.4.194 and 128.6.5.194 would be different systems. The structure of an Internet address is described in a bit more detail later. Of course we normally refer to systems by name, rather than by Internet address. When we specify a name, the network software looks it up in a database, and comes up with the corresponding Internet address. Most of the network software deals strictly in terms of the address. (RFC 882 describes the name server technology used to handle this lookup.) TCP/IP is built on "connectionless" technology. Information is transfered as a sequence of "datagrams". A datagram is a collection of data that is sent as a single message. Each of these datagrams is sent through the network individually. There are provisions to open connections (i.e. to start a conversation that will continue for some time). However at some level, information from those connections is broken up into datagrams, and those datagrams are treated by the network as completely separate. For example, suppose you want to transfer a 15000 octet file. Most networks can't handle a 15000 octet datagram. So the protocols will break this up into something like 30 500-octet datagrams. Each of these datagrams will be sent to the other end. At that point, they will be put back together into the 15000-octet file. However while those datagrams are in transit, the network doesn't know that there is any connection between them. It is perfectly possible that datagram 14 will actually arrive before datagram 13. It is also possible that somewhere in the network, an error will occur, and some datagram won't get through at all. In that case, that datagram has to be sent again. Note by the way that the terms "datagram" and "packet" often seem to be nearly interchangable. Technically, datagram is the right word to use when describing TCP/IP. A datagram is a unit of data, which is what the protocols deal with. A packet is a physical thing, appearing on an Ethernet or some wire. In most cases a packet simply contains a datagram, so there is very little difference. However they can differ. When TCP/IP is used on top of X.25, the X.25 interface breaks the datagrams up into 128-byte packets. This is invisible to IP, because the packets are put back together into a single datagram at the other end before being processed by TCP/IP. So in this case, one IP datagram would be carried by several packets. However with most media, there are efficiency advantages to sending one datagram per packet, and so the distinction tends to vanish. Two separate protocols are involved in handling TCP/IP datagrams. TCP (the "transmission control protocol") is responsible for breaking up the message into datagrams, reassembling them at the other end, resending anything that gets lost, and putting things back in the right order. IP (the "internet protocol") is responsible for routing individual datagrams. It may seem like TCP is doing all the work. And in small networks that is true. However in the Internet, simply getting a datagram to its destination can be a complex job. A connection may require the datagram to go through several networks at Rutgers, a serial line to the John von Neuman Supercomputer Center, a couple of Ethernets there, a series of 56Kbaud phone lines to another NSFnet site, and more Ethernets on another campus. Keeping track of the routes to all of the destinations and handling incompatibilities among different transport media turns out to be a complex job. Note that the interface between TCP and IP is fairly simple. TCP simply hands IP a datagram with a destination. IP doesn't know how this datagram relates to any datagram before it or after it. It may have occurred to you that something is missing here. We have talked about Internet addresses, but not about how you keep track of multiple connections to a given system. Clearly it isn't enough to get a datagram to the right destination. TCP has to know which connection this datagram is part of. This task is referred to as "demultiplexing." In fact, there are several levels of demultiplexing going on in TCP/IP. The information needed to do this demultiplexing is contained in a series of "headers". A header is simply a few extra octets tacked onto the beginning of a datagram by some protocol in order to keep track of it. It's a lot like putting a letter into an envelope and putting an address on the outside of the envelope. Except with modern networks it happens several times. It's like you put the letter into a little envelope, your secretary puts that into a somewhat bigger envelope, the campus mail center puts that envelope into a still bigger one, etc. Here is an overview of the headers that get stuck on a message that passes through a typical TCP/IP network: We start with a single data stream, say a file you are trying to send to some other computer: ...................................................... TCP breaks it up into manageable chunks. (In order to do this, TCP has to know how large a datagram your network can handle. Actually, the TCP's at each end say how big a datagram they can handle, and then they pick the smallest size.) .... .... .... .... .... .... .... .... TCP puts a header at the front of each datagram. This header actually contains at least 20 octets, but the most important ones are a source and destination "port number" and a "sequence number". The port numbers are used to keep track of different conversations. Suppose 3 different people are transferring files. Your TCP might allocate port numbers 1000, 1001, and 1002 to these transfers. When you are sending a datagram, this becomes the "source" port number, since you are the source of the datagram. Of course the TCP at the other end has assigned a port number of its own for the conversation. Your TCP has to know the port number used by the other end as well. (It finds out when the connection starts, as we will explain below.) It puts this in the "destination" port field. Of course if the other end sends a datagram back to you, the source and destination port numbers will be reversed, since then it will be the source and you will be the destination. Each datagram has a sequence number. This is used so that the other end can make sure that it gets the datagrams in the right order, and that it hasn't missed any. (See the TCP specification for details.) TCP doesn't number the datagrams, but the octets. So if there are 500 octets of data in each datagram, the first datagram might be numbered 0, the second 500, the next 1000, the next 1500, etc. Finally, I will mention the Checksum. This is a number that is computed by adding up all the octets in the datagram (more or less - see the TCP spec). The result is put in the header. TCP at the other end computes the checksum again. If they disagree, then something bad happened to the datagram in transmission, and it is thrown away. So here's what the datagram looks like now. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Source Port | Destination Port | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Sequence Number | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Acknowledgment Number | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Data | |U|A|P|R|S|F| | | Offset| Reserved |R|C|S|S|Y|I| Window | | | |G|K|H|T|N|N| | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Checksum | Urgent Pointer | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | your data ... next 500 octets | | ...... | If we abbreviate the TCP header as "T", the whole file now looks like this: T.... T.... T.... T.... T.... T.... T.... You will note that there are items in the header that I have not described above. They are generally involved with managing the connection. In order to make sure the datagram has arrived at its destination, the recipient has to send back an "acknowledgement". This is a datagram whose "Acknowledgement number" field is filled in. For example, sending a packet with an acknowledgement of 1500 indicates that you have received all the data up to octet number 1500. If the sender doesn't get an acknowledgement within a reasonable amount of time, it sends the data again. The window is used to control how much data can be in transit at any one time. It is not practical to wait for each datagram to be acknowledged before sending the next one. That would slow things down too much. On the other hand, you can't just keep sending, or a fast computer might overrun the capacity of a slow one to absorb data. Thus each end indicates how much new data it is currently prepared to absorb by putting the number of octets in its "Window" field. As the computer receives data, the amount of space left in its window decreases. When it goes to zero, the sender has to stop. As the receiver processes the data, it increases its window, indicating that it is ready to accept more data. Often the same datagram can be used to acknowledge receipt of a set of data and to give permission for additional new data (by an updated window). The "Urgent" field allows one end to tell the other to skip ahead in its processing to a particular octet. This is often useful for handling asynchronous events, for example when you type a control character or other command that interrupts output. The other fields are beyond the scope of this document. TCP sends each of these datagrams to IP. Of course it has to tell IP the Internet address of the computer at the other end. Note that this is all IP is concerned about. It doesn't care about what is in the datagram, or even in the TCP header. IP's job is simply to find a route for the datagram and get it to the other end. In order to allow gateways or other intermediate systems to forward the datagram, it adds its own header. The main things in this header are the source and destination Internet address (32-bit addresses, like 128.6.4.194), the protocol number, and another checksum. The source Internet address is simply the address of your machine. (This is necessary so the other end knows where the datagram came from.) The destination Internet address is the address of the other machine. (This is necessary so any gateways in the middle know where you want the datagram to go.) The protocol number tells IP at the other end to send the datagram to TCP. Although most IP traffic uses TCP, there are other protocols that can use IP, so you have to tell IP which protocol to send the datagram to. Finally, the checksum allows IP at the other end to verify that the header wasn't damaged in transit. Note that TCP and IP have separate checksums. IP needs to be able to verify that the header didn't get damaged in transit, or it could send a message to the wrong place. For reasons not worth discussing here, it is both more efficient and safer to have TCP compute a separate checksum for the TCP header and data. Once IP has tacked on its header, here's what the message looks like: +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |Version| IHL |Type of Service| Total Length | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Identification |Flags| Fragment Offset | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Time to Live | Protocol | Header Checksum | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Source Address | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Destination Address | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | TCP header, then your data ...... | If we represent the IP header by an "I", your file now looks like this: IT.... IT.... IT.... IT.... IT.... IT.... IT.... Again, the header contains some additional fields that have not been discussed. Most of them are beyond the scope of this document. The flags and fragment offset are used to keep track of the pieces when a datagram has to be split up. This can happen when datagrams are forwarded through a network for which they are too big. (This will be discussed a bit more below.) The time to live is a number that is decremented whenever the datagram passes through a system. When it goes to zero, the datagram is discarded. This is done in case a loop develops in the system somehow. Of course this should be impossible, but well-designed networks are built to cope with "impossible" conditions. At this point, it's possible that no more headers are needed. If your computer happens to have a direct phone line connecting it to the destination computer, or to a gateway, it may simply send the datagrams out on the line (though likely a synchronous protocol such as HDLC would be used, and it would add at least a few octets at the beginning and end). However most of our networks these days use Ethernet. So now we have to describe Ethernet's headers. Unfortunately, Ethernet has its own addresses. The people who designed Ethernet wanted to make sure that no two machines would end up with the same Ethernet address. Furthermore, they didn't want the user to have to worry about assigning addresses. So each Ethernet controller comes with an address builtin from the factory. In order to make sure that they would never have to reuse addresses, the Ethernet designers allocated 48 bits for the Ethernet address. People who make Ethernet equipment have to register with a central authority, to make sure that the numbers they assign don't overlap any other manufacturer. Ethernet is a "broadcast medium". That is, it is in effect like an old party line telephone. When you send a packet out on the Ethernet, every machine on the network sees the packet. So something is needed to make sure that the right machine gets it. As you might guess, this involves the Ethernet header. Every Ethernet packet has a 14-octet header that includes the source and destination Ethernet address, and a type code. Each machine is supposed to pay attention only to packets with its own Ethernet address in the destination field. (It's perfectly possible to cheat, which is one reason that Ethernet communications are not terribly secure.) Note that there is no connection between the Ethernet address and the Internet address. Each machine has to have a table of what Ethernet address corresponds to what Internet address. (We will describe how this table is constructed a bit later.) In addition to the addresses, the header contains a type code. The type code is to allow for several different protocol families to be used on the same network. So you can use TCP/IP, DECnet, Xerox NS, etc. at the same time. Each of them will put a different value in the type field. Finally, there is a checksum. The Ethernet controller computes a checksum of the entire packet. When the other end receives the packet, it recomputes the checksum, and throws the packet away if the answer disagrees with the original. The checksum is put on the end of the packet, not in the header. The final result is that your message looks like this: +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Ethernet destination address (first 32 bits) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Ethernet dest (last 16 bits) |Ethernet source (first 16 bits)| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Ethernet source address (last 32 bits) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Type code | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | IP header, then TCP header, then your data | | | ... | | | end of your data | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Ethernet Checksum | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ If we represent the Ethernet header with "E", and the Ethernet checksum with "C", your file now looks like this: EIT....C EIT....C EIT....C EIT....C EIT....C When these packets are received by the other end, of course all the headers are removed. The Ethernet interface removes the Ethernet header and the checksum. It looks at the type code. Since the type code is the one assigned to IP, the Ethernet device driver passes the datagram up to IP. IP removes the IP header. It looks at the IP protocol field. Since the protocol type is TCP, it passes the datagram up to TCP. TCP now looks at the sequence number. It uses the sequence numbers and other information to combine all the datagrams into the original file. The ends our initial summary of TCP/IP. There are still some crucial concepts we haven't gotten to, so we'll now go back and add details in several areas. (For detailed descriptions of the items discussed here see, RFC 793 for TCP, RFC 791 for IP, and RFC's 894 and 826 for sending IP over Ethernet.) |
|
|||||||||
|
|
|
|||||||||
|
So far, we have described how a stream of data is broken up into datagrams, sent to another computer, and put back together. However something more is needed in order to accomplish anything useful. There has to be a way for you to open a connection to a specified computer, log into it, tell it what file you want, and control the transmission of the file. (If you have a different application in mind, e.g. computer mail, some analogous protocol is needed.) This is done by "application protocols". The application protocols run "on top" of TCP/IP. That is, when they want to send a message, they give the message to TCP. TCP makes sure it gets delivered to the other end. Because TCP and IP take care of all the networking details, the applications protocols can treat a network connection as if it were a simple byte stream, like a terminal or phone line. Before going into more details about applications programs, we have to describe how you find an application. Suppose you want to send a file to a computer whose Internet address is 128.6.4.7. To start the process, you need more than just the Internet address. You have to connect to the FTP server at the other end. In general, network programs are specialized for a specific set of tasks. Most systems have separate programs to handle file transfers, remote terminal logins, mail, etc. When you connect to 128.6.4.7, you have to specify that you want to talk to the FTP server. This is done by having "well-known sockets" for each server. Recall that TCP uses port numbers to keep track of individual conversations. User programs normally use more or less random port numbers. However specific port numbers are assigned to the programs that sit waiting for requests. For example, if you want to send a file, you will start a program called "ftp". It will open a connection using some random number, say 1234, for the port number on its end. However it will specify port number 21 for the other end. This is the official port number for the FTP server. Note that there are two different programs involved. You run ftp on your side. This is a program designed to accept commands from your terminal and pass them on to the other end. The program that you talk to on the other machine is the FTP server. It is designed to accept commands from the network connection, rather than an interactive terminal. There is no need for your program to use a well-known socket number for itself. Nobody is trying to find it. However the servers have to have well-known numbers, so that people can open connections to them and start sending them commands. The official port numbers for each program are given in "Assigned Numbers". Note that a connection is actually described by a set of 4 numbers: the Internet address at each end, and the TCP port number at each end. Every datagram has all four of those numbers in it. (The Internet addresses are in the IP header, and the TCP port numbers are in the TCP header.) In order to keep things straight, no two connections can have the same set of numbers. However it is enough for any one number to be different. For example, it is perfectly possible for two different users on a machine to be sending files to the same other machine. This could result in connections with the following parameters:
Since the same machines are involved, the Internet addresses are the same. Since they are both doing file transfers, one end of the connection involves the well-known port number for FTP. The only thing that differs is the port number for the program that the users are running. That's enough of a difference. Generally, at least one end of the connection asks the network software to assign it a port number that is guaranteed to be unique. Normally, it's the user's end, since the server has to use a well-known number. Now that we know how to open connections, let's get back to the applications programs. As mentioned earlier, once TCP has opened a connection, we have something that might as well be a simple wire. All the hard parts are handled by TCP and IP. However we still need some agreement as to what we send over this connection. In effect this is simply an agreement on what set of commands the application will understand, and the format in which they are to be sent. Generally, what is sent is a combination of commands and data. They use context to differentiate. For example, the mail protocol works like this: Your mail program opens a connection to the mail server at the other end. Your program gives it your machine's name, the sender of the message, and the recipients you want it sent to. It then sends a command saying that it is starting the message. At that point, the other end stops treating what it sees as commands, and starts accepting the message. Your end then starts sending the text of the message. At the end of the message, a special mark is sent (a dot in the first column). After that, both ends understand that your program is again sending commands. This is the simplest way to do things, and the one that most applications use. File transfer is somewhat more complex. The file transfer protocol involves two different connections. It starts out just like mail. The user's program sends commands like "log me in as this user", "here is my password", "send me the file with this name". However once the command to send data is sent, a second connection is opened for the data itself. It would certainly be possible to send the data on the same connection, as mail does. However file transfers often take a long time. The designers of the file transfer protocol wanted to allow the user to continue issuing commands while the transfer is going on. For example, the user might make an inquiry, or he might abort the transfer. Thus the designers felt it was best to use a separate connection for the data and leave the original command connection for commands. (It is also possible to open command connections to two different computers, and tell them to send a file from one to the other. In that case, the data couldn't go over the command connection.) Remote terminal connections use another mechanism still. For remote logins, there is just one connection. It normally sends data. When it is necessary to send a command (e.g. to set the terminal type or to change some mode), a special character is used to indicate that the next character is a command. If the user happens to type that special character as data, two of them are sent. We are not going to describe the application protocols in detail in this document. It's better to read the RFC's yourself. However there are a couple of common conventions used by applications that will be described here. First, the common network representation: TCP/IP is intended to be usable on any computer. Unfortunately, not all computers agree on how data is represented. There are differences in character codes (ASCII vs. EBCDIC), in end of line conventions (carriage return, line feed, or a representation using counts), and in whether terminals expect characters to be sent individually or a line at a time. In order to allow computers of different kinds to communicate, each applications protocol defines a standard representation. Note that TCP and IP do not care about the representation. TCP simply sends octets. However the programs at both ends have to agree on how the octets are to be interpreted. The RFC for each application specifies the standard representation for that application. Normally it is "net ASCII". This uses ASCII characters, with end of line denoted by a carriage return followed by a line feed. For remote login, there is also a definition of a "standard terminal", which turns out to be a half-duplex terminal with echoing happening on the local machine. Most applications also make provisions for the two computers to agree on other representations that they may find more convenient. For example, PDP-10's have 36-bit words. There is a way that two PDP-10's can agree to send a 36-bit binary file. Similarly, two systems that prefer full-duplex terminal conversations can agree on that. However each application has a standard representation, which every machine must support. 3.1 An example application: SMTP In order to give a bit better idea what is involved in the application protocols, I'm going to show an example of SMTP, which is the mail protocol. (SMTP is "simple mail transfer protocol.) We assume that a computer called TOPAZ.RUTGERS.EDU wants to send the following message. Date: Sat, 27 Jun 87 13:26:31 EDT From: hedrick@topaz.rutgers.edu To: levy@red.rutgers.edu Subject: meeting Let's get together Monday at 1pm. First, note that the format of the message itself is described by an Internet standard (RFC 822). The standard specifies the fact that the message must be transmitted as net ASCII (i.e. it must be ASCII, with carriage return/linefeed to delimit lines). It also describes the general structure, as a group of header lines, then a blank line, and then the body of the message. Finally, it describes the syntax of the header lines in detail. Generally they consist of a keyword and then a value. Note that the addressee is indicated as LEVY@RED.RUTGERS.EDU. Initially, addresses were simply "person at machine". However recent standards have made things more flexible. There are now provisions for systems to handle other systems' mail. This can allow automatic forwarding on behalf of computers not connected to the Internet. It can be used to direct mail for a number of systems to one central mail server. Indeed there is no requirement that an actual computer by the name of RED.RUTGERS.EDU even exist. The name servers could be set up so that you mail to department names, and each department's mail is routed automatically to an appropriate computer. It is also possible that the part before the @ is something other than a user name. It is possible for programs to be set up to process mail. There are also provisions to handle mailing lists, and generic names such as "postmaster" or "operator". The way the message is to be sent to another system is described by RFC's 821 and 974. The program that is going to be doing the sending asks the name server several queries to determine where to route the message. The first query is to find out which machines handle mail for the name RED.RUTGERS.EDU. In this case, the server replies that RED.RUTGERS.EDU handles its own mail. The program then asks for the address of RED.RUTGERS.EDU, which is 128.6.4.2. Then the mail program opens a TCP connection to port 25 on 128.6.4.2. Port 25 is the well-known socket used for receiving mail. Once this connection is established, the mail program starts sending commands. Here is a typical conversation. Each line is labelled as to whether it is from TOPAZ or RED. Note that TOPAZ initiated the connection: RED 220 RED.RUTGERS.EDU SMTP Service at 29 Jun 87 05:17:18 EDT TOPAZ HELO topaz.rutgers.edu RED 250 RED.RUTGERS.EDU - Hello, TOPAZ.RUTGERS.EDU TOPAZ MAIL From: RED 250 MAIL accepted TOPAZ RCPT To: RED 250 Recipient accepted TOPAZ DATA RED 354 Start mail input; end with . TOPAZ Date: Sat, 27 Jun 87 13:26:31 EDT TOPAZ From: hedrick@topaz.rutgers.edu TOPAZ To: levy@red.rutgers.edu TOPAZ Subject: meeting TOPAZ TOPAZ Let's get together Monday at 1pm. TOPAZ . RED 250 OK TOPAZ QUIT RED 221 RED.RUTGERS.EDU Se First, note that commands all use normal text. This is typical of the Internet standards. Many of the protocols use standard ASCII commands. This makes it easy to watch what is going on and to diagnose problems. For example, the mail program keeps a log of each conversation. If something goes wrong, the log file can simply be mailed to the postmaster. Since it is normal text, he can see what was going on. It also allows a human to interact directly with the mail server, for testing. (Some newer protocols are complex enough that this is not practical. The commands would have to have a syntax that would require a significant parser. Thus there is a tendency for newer protocols to use binary formats. Generally they are structured like C or Pascal record structures.) Second, note that the responses all begin with numbers. This is also typical of Internet protocols. The allowable responses are defined in the protocol. The numbers allow the user program to respond unambiguously. The rest of the response is text, which is normally for use by any human who may be watching or looking at a log. It has no effect on the operation of the programs. (However there is one point at which the protocol uses part of the text of the response.) The commands themselves simply allow the mail program on one end to tell the mail server the information it needs to know in order to deliver the message. In this case, the mail server could get the information by looking at the message itself. But for more complex cases, that would not be safe. Every session must begin with a HELO, which gives the name of the system that initiated the connection. Then the sender and recipients are specified. (There can be more than one RCPT command, if there are several recipients.) Finally the data itself is sent. Note that the text of the message is terminated by a line containing just a period. (If such a line appears in the message, the period is doubled.) After the message is accepted, the sender can send another message, or terminate the session as in the example above. Generally, there is a pattern to the response numbers. The protocol defines the specific set of responses that can be sent as answers to any given command. However programs that don't want to analyze them in detail can just look at the first digit. In general, responses that begin with a 2 indicate success. Those that begin with 3 indicate that some further action is needed, as shown above. 4 and 5 indicate errors. 4 is a "temporary" error, such as a disk filling. The message should be saved, and tried again later. 5 is a permanent error, such as a non-existent recipient. The message should be returned to the sender with an error message. (For more details about the protocols mentioned in this section, see RFC's 821/822 for mail, RFC 959 for file transfer, and RFC's 854/855 for remote logins. For the well-known port numbers, see the current edition of Assigned Numbers, and possibly RFC 814.) |
|
|||||||||
|
|
|
|||||||||
|
So far, we have described only connections that use TCP. Recall that TCP is responsible for breaking up messages into datagrams, and reassembling them properly. However in many applications, we have messages that will always fit in a single datagram. An example is name lookup. When a user attempts to make a connection to another system, he will generally specify the system by name, rather than Internet address. His system has to translate that name to an address before it can do anything. Generally, only a few systems have the database used to translate names to addresses. So the user's system will want to send a query to one of the systems that has the database. This query is going to be very short. It will certainly fit in one datagram. So will the answer. Thus it seems silly to use TCP. Of course TCP does more than just break things up into datagrams. It also makes sure that the data arrives, resending datagrams where necessary. But for a question that fits in a single datagram, we don't need all the complexity of TCP to do this. If we don't get an answer after a few seconds, we can just ask again. For applications like this, there are alternatives to TCP. The most common alternative is UDP ("user datagram protocol"). UDP is designed for applications where you don't need to put sequences of datagrams together. It fits into the system much like TCP. There is a UDP header. The network software puts the UDP header on the front of your data, just as it would put a TCP header on the front of your data. Then UDP sends the data to IP, which adds the IP header, putting UDP's protocol number in the protocol field instead of TCP's protocol number. However UDP doesn't do as much as TCP does. It doesn't split data into multiple datagrams. It doesn't keep track of what it has sent so it can resend if necessary. About all that UDP provides is port numbers, so that several programs can use UDP at once. UDP port numbers are used just like TCP port numbers. There are well-known port numbers for servers that use UDP. Note that the UDP header is shorter than a TCP header. It still has source and destination port numbers, and a checksum, but that's about it. No sequence number, since it is not needed. UDP is used by the protocols that handle name lookups (see IEN 116, RFC 882, and RFC 883), and a number of similar protocols. Another alternative protocol is ICMP ("Internet control message protocol"). ICMP is used for error messages, and other messages intended for the TCP/IP software itself, rather than any particular user program. For example, if you attempt to connect to a host, your system may get back an ICMP message saying "host unreachable". ICMP can also be used to find out some information about the network. See RFC 792 for details of ICMP. ICMP is similar to UDP, in that it handles messages that fit in one datagram. However it is even simpler than UDP. It doesn't even have port numbers in its header. Since all ICMP messages are interpreted by the network software itself, no port numbers are needed to say where a ICMP message is supposed to go. |
|
|||||||||
|
|
|
|||||||||
5. Keeping track of names and information: the domain system |
|||||||||||
|
As we indicated earlier, the network software generally needs a 32-bit Internet address in order to open a connection or send a datagram. However users prefer to deal with computer names rather than numbers. Thus there is a database that allows the software to look up a name and find the corresponding number. When the Internet was small, this was easy. Each system would have a file that listed all of the other systems, giving both their name and number. There are now too many computers for this approach to be practical. Thus these files have been replaced by a set of name servers that keep track of host names and the corresponding Internet addresses. (In fact these servers are somewhat more general than that. This is just one kind of information stored in the domain system.) Note that a set of interlocking servers are used, rather than a single central one. There are now so many different institutions connected to the Internet that it would be impractical for them to notify a central authority whenever they installed or moved a computer. Thus naming authority is delegated to individual institutions. The name servers form a tree, corresponding to institutional structure. The names themselves follow a similar structure. A typical example is the name BORAX.LCS.MIT.EDU. This is a computer at the Laboratory for Computer Science (LCS) at MIT. In order to find its Internet address, you might potentially have to consult 4 different servers. First, you would ask a central server (called the root) where the EDU server is. EDU is a server that keeps track of educational institutions. The root server would give you the names and Internet addresses of several servers for EDU. (There are several servers at each level, to allow for the possibly that one might be down.) You would then ask EDU where the server for MIT is. Again, it would give you names and Internet addresses of several servers for MIT. Generally, not all of those servers would be at MIT, to allow for the possibility of a general power failure at MIT. Then you would ask MIT where the server for LCS is, and finally you would ask one of the LCS servers about BORAX. The final result would be the Internet address for BORAX.LCS.MIT.EDU. Each of these levels is referred to as a "domain". The entire name, BORAX.LCS.MIT.EDU, is called a "domain name". (So are the names of the higher-level domains, such as LCS.MIT.EDU, MIT.EDU, and EDU.) Fortunately, you don't really have to go through all of this most of the time. First of all, the root name servers also happen to be the name servers for the top-level domains such as EDU. Thus a single query to a root server will get you to MIT. Second, software generally remembers answers that it got before. So once we look up a name at LCS.MIT.EDU, our software remembers where to find servers for LCS.MIT.EDU, MIT.EDU, and EDU. It also remembers the translation of BORAX.LCS.MIT.EDU. Each of these pieces of information has a "time to live" associated with it. Typically this is a few days. After that, the information expires and has to be looked up again. This allows institutions to change things. The domain system is not limited to finding out Internet addresses. Each domain name is a node in a database. The node can have records that define a number of different properties. Examples are Internet address, computer type, and a list of services provided by a computer. A program can ask for a specific piece of information, or all information about a given name. It is possible for a node in the database to be marked as an "alias" (or nickname) for another node. It is also possible to use the domain system to store information about users, mailing lists, or other objects. There is an Internet standard defining the operation of these databases, as well as the protocols used to make queries of them. Every network utility has to be able to make such queries, since this is now the official way to evaluate host names. Generally utilities will talk to a server on their own system. This server will take care of contacting the other servers for them. This keeps down the amount of code that has to be in each application program. The domain system is particularly important for handling computer mail. There are entry types to define what computer handles mail for a given name, to specify where an individual is to receive mail, and to define mailing lists. (See RFC's 882, 883, and 973 for specifications of the domain system. RFC 974 defines the use of the domain system in sending mail.) |
|
|||||||||
|
|
|
|||||||||
|
The description above indicated that the IP implementation is responsible for getting datagrams to the destination indicated by the destination address, but little was said about how this would be done. The task of finding how to get a datagram to its destination is referred to as "routing". In fact many of the details depend upon the particular implementation. However some general things can be said. First, it is necessary to understand the model on which IP is based. IP assumes that a system is attached to some local network. We assume that the system can send datagrams to any other system on its own network. (In the case of Ethernet, it simply finds the Ethernet address of the destination system, and puts the datagram out on the Ethernet.) The problem comes when a system is asked to send a datagram to a system on a different network. This problem is handled by gateways. A gateway is a system that connects a network with one or more other networks. Gateways are often normal computers that happen to have more than one network interface. For example, we have a Unix machine that has two different Ethernet interfaces. Thus it is connected to networks 128.6.4 and 128.6.3. This machine can act as a gateway between those two networks. The software on that machine must be set up so that it will forward datagrams from one network to the other. That is, if a machine on network 128.6.4 sends a datagram to the gateway, and the datagram is addressed to a machine on network 128.6.3, the gateway will forward the datagram to the destination. Major communications centers often have gateways that connect a number of different networks. (In many cases, special-purpose gateway systems provide better performance or reliability than general-purpose systems acting as gateways. A number of vendors sell such systems.) Routing in IP is based entirely upon the network number of the destination address. Each computer has a table of network numbers. For each network number, a gateway is listed. This is the gateway to be used to get to that network. Note that the gateway doesn't have to connect directly to the network. It just has to be the best place to go to get there. For example at Rutgers, our interface to NSFnet is at the John von Neuman Supercomputer Center (JvNC). Our connection to JvNC is via a high-speed serial line connected to a gateway whose address is 128.6.3.12. Systems on net 128.6.3 will list 128.6.3.12 as the gateway for many off-campus networks. However systems on net 128.6.4 will list 128.6.4.1 as the gateway to those same off-campus networks. 128.6.4.1 is the gateway between networks 128.6.4 and 128.6.3, so it is the first step in getting to JvNC. When a computer wants to send a datagram, it first checks to see if the destination address is on the system's own local network. If so, the datagram can be sent directly. Otherwise, the system expects to find an entry for the network that the destination address is on. The datagram is sent to the gateway listed in that entry. This table can get quite big. For example, the Internet now includes several hundred individual networks. Thus various strategies have been developed to reduce the size of the routing table. One strategy is to depend upon "default routes". Often, there is only one gateway out of a network. This gateway might connect a local Ethernet to a campus-wide backbone network. In that case, we don't need to have a separate entry for every network in the world. We simply define that gateway as a "default". When no specific route is found for a datagram, the datagram is sent to the default gateway. A default gateway can even be used when there are several gateways on a network. There are provisions for gateways to send a message saying "I'm not the best gateway -- use this one instead." (The message is sent via ICMP. See RFC 792.) Most network software is designed to use these messages to add entries to their routing tables. Suppose network 128.6.4 has two gateways, 128.6.4.59 and 128.6.4.1. 128.6.4.59 leads to several other internal Rutgers networks. 128.6.4.1 leads indirectly to the NSFnet. Suppose we set 128.6.4.59 as a default gateway, and have no other routing table entries. Now what happens when we need to send a datagram to MIT? MIT is network 18. Since we have no entry for network 18, the datagram will be sent to the default, 128.6.4.59. As it happens, this gateway is the wrong one. So it will forward the datagram to 128.6.4.1. But it will also send back an error saying in effect: "to get to network 18, use 128.6.4.1". Our software will then add an entry to the routing table. Any future datagrams to MIT will then go directly to 128.6.4.1. (The error message is sent using the ICMP protocol. The message type is called "ICMP redirect.") Most IP experts recommend that individual computers should not try to keep track of the entire network. Instead, they should start with default gateways, and let the gateways tell them the routes, as just described. However this doesn't say how the gateways should find out about the routes. The gateways can't depend upon this strategy. They have to have fairly complete routing tables. For this, some sort of routing protocol is needed. A routing protocol is simply a technique for the gateways to find each other, and keep up to date about the best way to get to every network. RFC 1009 contains a review of gateway design and routing. However rip.doc is probably a better introduction to the subject. It contains some tutorial material, and a detailed description of the most commonly-used routing protocol. |
|
|||||||||
|
|
|
|||||||||
7. Details about Internet addresses: subnets and broadcasting |
|||||||||||
|
As indicated earlier, Internet addresses are 32-bit numbers, normally written as 4 octets (in decimal), e.g. 128.6.4.7. There are actually 3 different types of address. The problem is that the address has to indicate both the network and the host within the network. It was felt that eventually there would be lots of networks. Many of them would be small, but probably 24 bits would be needed to represent all the IP networks. It was also felt that some very big networks might need 24 bits to represent all of their hosts. This would seem to lead to 48 bit addresses. But the designers really wanted to use 32 bit addresses. So they adopted a kludge. The assumption is that most of the networks will be small. So they set up three different ranges of address. Addresses beginning with 1 to 126 use only the first octet for the network number. The other three octets are available for the host number. Thus 24 bits are available for hosts. These numbers are used for large networks. But there can only be 126 of these very big networks. The Arpanet is one, and there are a few large commercial networks. But few normal organizations get one of these "class A" addresses. For normal large organizations, "class B" addresses are used. Class B addresses use the first two octets for the network number. Thus network numbers are 128.1 through 191.254. (We avoid 0 and 255, for reasons that we see below. We also avoid addresses beginning with 127, because that is used by some systems for special purposes.) The last two octets are available for host addesses, giving 16 bits of host address. This allows for 64516 computers, which should be enough for most organizations. (It is possible to get more than one class B address, if you run out.) Finally, class C addresses use three octets, in the range 192.1.1 to 223.254.254. These allow only 254 hosts on each network, but there can be lots of these networks. Addresses above 223 are reserved for future use, as class D and E (which are currently not defined). Many large organizations find it convenient to divide their network number into "subnets". For example, Rutgers has been assigned a class B address, 128.6. We find it convenient to use the third octet of the address to indicate which Ethernet a host is on. This division has no significance outside of Rutgers. A computer at another institution would treat all datagrams addressed to 128.6 the same way. They would not look at the third octet of the address. Thus computers outside Rutgers would not have different routes for 128.6.4 or 128.6.5. But inside Rutgers, we treat 128.6.4 and 128.6.5 as separate networks. In effect, gateways inside Rutgers have separate entries for each Rutgers subnet, whereas gateways outside Rutgers just have one entry for 128.6. Note that we could do exactly the same thing by using a separate class C address for each Ethernet. As far as Rutgers is concerned, it would be just as convenient for us to have a number of class C addresses. However using class C addresses would make things inconvenient for the rest of the world. Every institution that wanted to talk to us would have to have a separate entry for each one of our networks. If every institution did this, there would be far too many networks for any reasonable gateway to keep track of. By subdividing a class B network, we hide our internal structure from everyone else, and save them trouble. This subnet strategy requires special provisions in the network software. It is described in RFC 950. 0 and 255 have special meanings. 0 is reserved for machines that don't know their address. In certain circumstances it is possible for a machine not to know the number of the network it is on, or even its own host address. For example, 0.0.0.23 would be a machine that knew it was host number 23, but didn't know on what network. 255 is used for "broadcast". A broadcast is a message that you want every system on the network to see. Broadcasts are used in some situations where you don't know who to talk to. For example, suppose you need to look up a host name and get its Internet address. Sometimes you don't know the address of the nearest name server. In that case, you might send the request as a broadcast. There are also cases where a number of systems are interested in information. It is then less expensive to send a single broadcast than to send datagrams individually to each host that is interested in the information. In order to send a broadcast, you use an address that is made by using your network address, with all ones in the part of the address where the host number goes. For example, if you are on network 128.6.4, you would use 128.6.4.255 for broadcasts. How this is actually implemented depends upon the medium. It is not possible to send broadcasts on the Arpanet, or on point to point lines. However it is possible on an Ethernet. If you use an Ethernet address with all its bits on (all ones), every machine on the Ethernet is supposed to look at that datagram. Although the official broadcast address for network 128.6.4 is now 128.6.4.255, there are some other addresses that may be treated as broadcasts by certain implementations. For convenience, the standard also allows 255.255.255.255 to be used. This refers to all hosts on the local network. It is often simpler to use 255.255.255.255 instead of finding out the network number for the local network and forming a broadcast address such as 128.6.4.255. In addition, certain older implementations may use 0 instead of 255 to form the broadcast address. Such implementations would use 128.6.4.0 instead of 128.6.4.255 as the broadcast address on network 128.6.4. Finally, certain older implementations may not understand about subnets. Thus they consider the network number to be 128.6. In that case, they will assume a broadcast address of 128.6.255.255 or 128.6.0.0. Until support for broadcasts is implemented properly, it can be a somewhat dangerous feature to use. Because 0 and 255 are used for unknown and broadcast addresses, normal hosts should never be given addresses containing 0 or 255. Addresses should never begin with 0, 127, or any number above 223. Addresses violating these rules are sometimes referred to as "Martians", because of rumors that the Central University of Mars is using network 225. |
|
|||||||||
|
|
|
|||||||||
|
TCP/IP is designed for use with many different kinds of network. Unfortunately, network designers do not agree about how big packets can be. Ethernet packets can be 1500 octets long. Arpanet packets have a maximum of around 1000 octets. Some very fast networks have much larger packet sizes. At first, you might think that IP should simply settle on the smallest possible size. Unfortunately, this would cause serious performance problems. When transferring large files, big packets are far more efficient than small ones. So we want to be able to use the largest packet size possible. But we also want to be able to handle networks with small limits. There are two provisions for this. First, TCP has the ability to "negotiate" about datagram size. When a TCP connection first opens, both ends can send the maximum datagram size they can handle. The smaller of these numbers is used for the rest of the connection. This allows two implementations that can handle big datagrams to use them, but also lets them talk to implementations that can't handle them. However this doesn't completely solve the problem. The most serious problem is that the two ends don't necessarily know about all of the steps in between. For example, when sending data between Rutgers and Berkeley, it is likely that both computers will be on Ethernets. Thus they will both be prepared to handle 1500-octet datagrams. However the connection will at some point end up going over the Arpanet. It can't handle packets of that size. For this reason, there are provisions to split datagrams up into pieces. (This is referred to as "fragmentation".) The IP header contains fields indicating the a datagram has been split, and enough information to let the pieces be put back together. If a gateway connects an Ethernet to the Arpanet, it must be prepared to take 1500-octet Ethernet packets and split them into pieces that will fit on the Arpanet. Furthermore, every host implementation of TCP/IP must be prepared to accept pieces and put them back together. This is referred to as "reassembly". TCP/IP implementations differ in the approach they take to deciding on datagram size. It is fairly common for implementations to use 576-byte datagrams whenever they can't verify that the entire path is able to handle larger packets. This rather conservative strategy is used because of the number of implementations with bugs in the code to reassemble fragments. Implementors often try to avoid ever having fragmentation occur. Different implementors take different approaches to deciding when it is safe to use large datagrams. Some use them only for the local network. Others will use them for any network on the same campus. 576 bytes is a "safe" size, which every implementation must support. |
|
|||||||||
|
|
|
|||||||||
|
There was a brief discussion earlier about what IP datagrams look like on an Ethernet. The discussion showed the Ethernet header and checksum. However it left one hole: It didn't say how to figure out what Ethernet address to use when you want to talk to a given Internet address. In fact, there is a separate protocol for this, called ARP ("address resolution protocol"). (Note by the way that ARP is not an IP protocol. That is, the ARP datagrams do not have IP headers.) Suppose you are on system 128.6.4.194 and you want to connect to system 128.6.4.7. Your system will first verify that 128.6.4.7 is on the same network, so it can talk directly via Ethernet. Then it will look up 128.6.4.7 in its ARP table, to see if it already knows the Ethernet address. If so, it will stick on an Ethernet header, and send the packet. But suppose this system is not in the ARP table. There is no way to send the packet, because you need the Ethernet address. So it uses the ARP protocol to send an ARP request. Essentially an ARP request says "I need the Ethernet address for 128.6.4.7". Every system listens to ARP requests. When a system sees an ARP request for itself, it is required to respond. So 128.6.4.7 will see the request, and will respond with an ARP reply saying in effect "128.6.4.7 is 8:0:20:1:56:34". (Recall that Ethernet addresses are 48 bits. This is 6 octets. Ethernet addresses are conventionally shown in hex, using the punctuation shown.) Your system will save this information in its ARP table, so future packets will go directly. Most systems treat the ARP table as a cache, and clear entries in it if they have not been used in a certain period of time. Note by the way that ARP requests must be sent as "broadcasts". There is no way that an ARP request can be sent directly to the right system. After all, the whole reason for sending an ARP request is that you don't know the Ethernet address. So an Ethernet address of all ones is used, i.e. ff:ff:ff:ff:ff:ff. By convention, every machine on the Ethernet is required to pay attention to packets with this as an address. So every system sees every ARP requests. They all look to see whether the request is for their own address. If so, they respond. If not, they could just ignore it. (Some hosts will use ARP requests to update their knowledge about other hosts on the network, even if the request isn't for them.) Note that packets whose IP address indicates broadcast (e.g. 255.255.255.255 or 128.6.4.255) are also sent with an Ethernet address that is all ones. |
|
|||||||||
|
|
|
|||||||||
|
This directory contains documents describing the major protocols. There are literally hundreds of documents, so we have chosen the ones that seem most important. Internet standards are called RFC's. RFC stands for Request for Comment. A proposed standard is initially issued as a proposal, and given an RFC number. When it is finally accepted, it is added to Official Internet Protocols, but it is still referred to by the RFC number. We have also included two IEN's. (IEN's used to be a separate classification for more informal documents. This classification no longer exists -- RFC's are now used for all official Internet documents, and a mailing list is used for more informal reports.) The convention is that whenever an RFC is revised, the revised version gets a new number. This is fine for most purposes, but it causes problems with two documents: Assigned Numbers and Official Internet Protocols. These documents are being revised all the time, so the RFC number keeps changing. You will have to look in rfc-index.txt to find the number of the latest edition. Anyone who is seriously interested in TCP/IP should read the RFC describing IP (791). RFC 1009 is also useful. It is a specification for gateways to be used by NSFnet. As such, it contains an overview of a lot of the TCP/IP technology. You should probably also read the description of at least one of the application protocols, just to get a feel for the way things work. Mail is probably a good one (821/822). TCP (793) is of course a very basic specification. However the spec is fairly complex, so you should only read this when you have the time and patience to think about it carefully. Fortunately, the author of the major RFC's (Jon Postel) is a very good writer. The TCP RFC is far easier to read than you would expect, given the complexity of what it is describing. You can look at the other RFC's as you become curious about their subject matter. Here is a list of the documents you are more likely to want: rfc-index list of all RFC's rfc1012 somewhat fuller list of all RFC's rfc1011 Official Protocols. It's useful to scan this to see what tasks protocols have been built for. This defines which RFC's are actual standards, as opposed to requests for comments. rfc1010 Assigned Numbers. If you are working with TCP/IP, you will probably want a hardcopy of this as a reference. It's not very exciting to read. It lists all the offically defined well-known ports and lots of other things. rfc1009 NSFnet gateway specifications. A good overview of IP routing and gateway technology. rfc1001/2 netBIOS: networking for PC's rfc973 update on domains rfc959 FTP (file transfer) rfc950 subnets rfc937 POP2: protocol for reading mail on PC's rfc894 how IP is to be put on Ethernet, see also rfc825 rfc882/3 domains (the database used to go from host names to Internet address and back -- also used to handle UUCP these days). See also rfc973 rfc854/5 telnet - protocol for remote logins rfc826 ARP - protocol for finding out Ethernet addresses rfc821/2 rfc814 names and ports - general concepts behind well-known ports rfc793 TCP rfc792 ICMP rfc791 IP rfc768 UDP rip.doc details of the most commonly-used routing protocol ien-116 old name server (still needed by several kinds of system) ien-48 the Catenet model, general description of the philosophy behind TCP/IP The following documents are somewhat more specialized. rfc813 window and acknowledgement strategies in TCP rfc815 datagram reassembly techniques rfc816 fault isolation and resolution techniques rfc817 modularity and efficiency in implementation rfc879 the maximum segment size option in TCP rfc896 congestion control rfc827,888,904,975,985 EGP and related issues To those of you who may be reading this document remotely instead of at Rutgers: The most important RFC's have been collected into a three-volume set, the DDN Protocol Handbook. It is available from the DDN Network Information Center, SRI International, 333 Ravenswood Avenue, Menlo Park, California 94025 (telephone: 800-235-3155). You should be able to get them via anonymous FTP from sri-nic.arpa. File names are: RFC's: rfc:rfc-index.txt IEN's: ien:ien-index.txt rip.doc is available by anonymous FTP from topaz.rutgers.edu, as /pub/tcp-ip-docs/rip.doc. Sites with access to UUCP but not FTP may be able to retreive them via UUCP from UUCP host rutgers. The file names would be RFC's: /topaz/pub/pub/tcp-ip-docs/rfc-index.txt IEN's: /topaz/pub/pub/tcp-ip-docs/ien-index.txt Note that SRI-NIC has the entire set of RFC's and IEN's, but rutgers and topaz have only those specifically mentioned above. |