Remote Procedure Call (RPC)

Posted on 2017-02-13 18:36 by bw_0927

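RPC is built on top of sockets. It gives programmers a simple, reliable way to develop client/server (C/S) programs: they do not have to care about the communication protocol between client and server and can concentrate on the implementation.

So how does RPC differ from socket communication?

RPC is built on top of sockets and gives programmers a simple, reliable way to develop C/S programs. It describes data with a representation called XDR (External Data Representation). The programmer writes the interface in a pseudo-code description language, the rpcgen program translates that description into real, compilable C source code, and that source is then compiled into the actual client and server programs.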
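To make the XDR idea concrete, here is a minimal sketch that encodes two values by hand. It assumes only the two XDR rules used below (a 32-bit integer is 4 big-endian bytes; a string is a 4-byte length followed by the bytes, zero-padded to a multiple of 4). The function names are made up for this illustration and are not part of rpcgen or any RPC library.

```python
import struct

def xdr_int(value: int) -> bytes:
    """Encode a signed 32-bit integer the XDR way: 4 bytes, big-endian."""
    return struct.pack(">i", value)

def xdr_string(text: str) -> bytes:
    """Encode a string: 4-byte length, the bytes, zero padding to a multiple of 4."""
    data = text.encode("ascii")
    padding = (4 - len(data) % 4) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * padding

def xdr_decode_string(buf: bytes) -> str:
    """Decode a string produced by xdr_string, ignoring the padding."""
    (length,) = struct.unpack(">I", buf[:4])
    return buf[4:4 + length].decode("ascii")

encoded = xdr_int(7) + xdr_string("hello")
print(encoded.hex())
```

Because both sides agree on this representation, the client and server can run on machines with different byte orders and still exchange the same values.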

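As a common approach to C/S development, RPC is productive and reliable. Its basic principle, however, is to trade the details of communication for the simplicity of a module call, so that programmers need not care about the protocol between client and server and can concentrate on the implementation. As a result, the communication code that RPC generates cannot be the right fit for every application. Compared with raw sockets, RPC uses more network bandwidth to carry the same payload.

Because RPC is implemented on top of sockets, it needs more network and system resources than sockets alone. When optimizing, a programmer can directly edit the hard-to-read source that rpcgen generates, but doing so gives up much of the simplicity that made RPC attractive in the first place.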
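The bandwidth point can be illustrated with a rough comparison. The sketch below encodes the same call, add(2, 3), once as a hand-rolled binary message and once with XML-RPC from the Python standard library. XML-RPC is used here only because it ships with Python; it is far more verbose than the XDR encoding that rpcgen produces, so treat the gap as an illustration rather than a measurement of Sun RPC. The one-byte opcode is an assumption made up for this example.

```python
import struct
import xmlrpc.client

# Hand-rolled socket protocol: a 1-byte opcode (1 = add, an assumed
# convention) followed by two 32-bit big-endian integers.
raw_message = struct.pack(">Bii", 1, 2, 3)

# The same call encoded by a general-purpose RPC layer.
rpc_message = xmlrpc.client.dumps((2, 3), methodname="add").encode("utf-8")

print(len(raw_message), "bytes hand-rolled")
print(len(rpc_message), "bytes XML-RPC")
```

The generic encoding has to carry the method name and type information that a purpose-built protocol can leave implicit.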

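An analogy for RPC and sockets

RPC: the boss does not need to know how to use MSN. He just tells his secretary, and the secretary uses MSN to set up the session with the other party and handle the requests and responses.

Socket-based communication: the boss has to know how to use MSN himself. He needs some MSN training beforehand, but he can then talk to the other party without going through the secretary, which is more efficient.

Drawbacks of RPC: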

Well, if you had time you could dig through my various IEEE Internet Computing columns from the past 6 years and find many reasons listed there. For example, “RPC Under Fire” (note that it’s PDF) from the Sep/Oct 2005 issue lists a number of problems:
Also, pretty much any of my columns that cover REST to any degree would also contain mentions of RPC’s shortcomings. All the columns can be found here:
But if you don’t have the time or energy, the fundamental problem is that RPC tries to make a distributed invocation look like a local one. This can’t work because the failure modes in distributed systems are quite different from those in local systems, so you find yourself having to introduce more and more infrastructure that tries to hide all the hard details and problems that lurk beneath. That’s how we got Apollo NCS and Sun RPC and DCE and CORBA and DSOM and DCOM and EJB and SOAP and JAX-RPC, to name a few off the top of my head, each better than what came before in some ways but worse in other ways, especially footprint and complexity.

In other words, each of these RPC systems fixed some shortcomings while introducing new problems of its own.

But it’s all for naught because no amount of infrastructure can ever hide those problems of distribution. Network partitions are real, timeouts are real, remote host and service crashes are real, the need for piecemeal system upgrade and handling version differences between systems is real, etc. The distributed systems programmer *must* deal with these and other issues because they affect different applications very differently; no amount of hiding or abstraction can make these problems disappear. As I said about such systems in a recent column:
“The layers of complexity required to maintain the resulting leaky illusion of local/remote transparency are reminiscent of the convoluted equations that pre-Copernican astronomers used to explain how the Sun and other planets revolved around the Earth.” (from “Serendipitous Reuse”)
RPC systems in C++, Java, etc. also tend to introduce higher degrees of coupling than one would like in a distributed system.

Typically you have some sort of IDL that’s used to generate stubs/proxies/skeletons code that turns the local calls into remote ones, which nobody wants to write or maintain by hand. (In other words, an IDL compiler such as the rpcgen mentioned above generates the components that RPC uses to communicate: the client stub and the server stub.)

The IDL is often simple, but the generated code is usually not. That code is normally compiled into each app in the system. Change the IDL and you have to regenerate the code, recompile it, and then retest and redeploy your apps, and you typically have to do that atomically, either all apps or none, because versioning is not accounted for. In an already-deployed production system, it can be pretty hard to do atomic upgrades across the entire system. Overall, this approach makes for brittle, tightly-coupled systems.

Such systems also have problems with impedance mismatch between the IDL and whatever languages you’re translating it to. If the IDL is minimal so that it can be used with a wide variety of programming languages, it means advanced features of well-stocked languages like Java and C++ can’t be used. OTOH if you make the IDL more powerful so that it’s closer to such languages, then translating it to C or other more basic languages becomes quite difficult. On top of all that, no matter how you design the IDL type system, all the types won’t, indeed can’t, map cleanly into every desired programming language. This turns into the need for non-idiomatic programming in one or more of the supported languages, and developers using those languages tend to complain about that. If you turn the whole process around by using a programming language like Java for your RPC IDL in an attempt to avoid the mismatch problems, you find it works only for that language, and that translating that language into other languages is quite difficult.
There’s also the need with these systems to have the same or similar infrastructure on both ends of the wire. Earlier posters to this thread complained about this, for example, when they mentioned having to have CORBA ORBs underneath all their participating applications. If you can’t get the exact same infrastructure under all endpoints, then you need to use interoperable infrastructure, which obviously relies on interoperability standards. These, unfortunately, are often problematic as well. CORBA interoperability, for example, eventually became pretty good, but it took about a decade. CORBA started out with no interoperability protocol at all (in fact, it originally specified no network protocol at all), and then we suffered with interop problems for a few years once IIOP came along and both the protocol itself and implementations of it matured.
Ultimately, RPC is a leaky abstraction. It can’t hide what it tries to hide, and because of that, it can easily make the overall problem more difficult to deal with by adding a lot of accidental complexity.
In my previous message I specifically mentioned Erlang as having gotten it right. I believe that to be true not only because the handling of distribution is effectively built in and dealt with directly, but also because Erlang makes no attempt to hide those hard problems from the developer. Rather, it makes them known to the developer by providing facilities for dealing with timeouts, failures, versioning, etc. I think what Erlang gives us goes a very long way and is well beyond anything I’ve experienced before. Erlang really doesn’t provide RPC according to the strict definition of the term, BTW, because remote calls don’t actually look like local ones.
(BTW, this is the kind of stuff I’ll be talking about at Erlang eXchange next month.)
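The central claim of the quoted message, that failure modes such as timeouts cannot be hidden behind a local-looking call, can be seen in a small sketch. It uses plain sockets from the Python standard library; the server that never replies and the 0.2 second timeout are assumptions chosen only to keep the example short.

```python
import socket
import threading

def silent_server(listener: socket.socket, release: threading.Event) -> None:
    """Accept one connection and never reply, like a hung remote service."""
    conn, _ = listener.accept()
    release.wait()              # hold the connection open until the client gives up
    conn.close()

def call_remote(address, timeout: float) -> str:
    """A 'remote call' whose caller has to decide what a timeout means."""
    with socket.create_connection(address, timeout=timeout) as sock:
        sock.sendall(b"ping\n")
        try:
            return sock.recv(1024).decode()
        except socket.timeout:
            # Unlike a local call, we cannot tell whether the request was
            # lost, is still running, or finished but the reply was lost.
            return "timeout"

listener = socket.socket()
listener.bind(("127.0.0.1", 0))     # port 0: let the OS pick a free port
listener.listen(1)
release = threading.Event()
threading.Thread(target=silent_server, args=(listener, release), daemon=True).start()

result = call_remote(listener.getsockname(), timeout=0.2)
release.set()
listener.close()
print(result)
```

A local function call has no equivalent of the "timeout" branch, which is why the caller's code ends up shaped by the network whether or not the RPC layer tries to hide it.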

=====================================

http://www.cppblog.com/jb8164/archive/2008/08/15/58949.html

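I. What is a remote procedure call?

What is a remote procedure call (RPC)? The term may be unfamiliar, but you probably know NFS well, and NFS is built on RPC. To understand remote procedure calls, let us first look at ordinary procedure calls.

A procedure call transfers control from one procedure, A, to another procedure, B; when B returns, it hands control back to A. In most systems today, the caller and the callee live in the same process on a given host and are bound together by the linker when the executable is built. This kind of call is a local procedure call.

A remote procedure call (RPC) is a call in which a process on the local system invokes a procedure on a remote system. We still call it a procedure call because, to the programmer, it looks like a regular one. Two processes handle a remote procedure call: a local client process and a remote server process. To the calling process, the remote call appears as a transfer of control to the client process, which builds a message and sends it to the remote server through network system calls. The message contains the parameters the call needs. The remote server receives the message, invokes the corresponding procedure, and sends the result back over the network to the client process, which returns it to the calling process. A remote procedure call therefore looks like a local procedure call to the caller, but it actually invokes a procedure on a remote system.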
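The round trip described above can be tried with the XML-RPC modules in the Python standard library. This is a sketch, not the Sun RPC mechanism the text has in mind: the server runs in a background thread of the same process so the example is self-contained, and the add procedure is invented for the illustration.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

def add(a: int, b: int) -> int:
    """The procedure that lives on the 'remote' server."""
    return a + b

# Server process (a thread here, so the sketch is self-contained).
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(add, "add")
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client process: the proxy builds the request message, sends it over the
# network, waits for the reply, and hands the result back to the caller.
host, port = server.server_address
proxy = ServerProxy(f"http://{host}:{port}/")
result = proxy.add(2, 3)        # reads like a local call
print(result)

server.shutdown()
server.server_close()
```

The call site `proxy.add(2, 3)` contains no networking code at all, which is exactly the convenience, and the illusion, that RPC provides.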

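II. The remote procedure call model

Local procedure call: a conventional program consists of one or more procedures, usually arranged in a call hierarchy.

[Figure: call hierarchy of a conventional program]

Remote procedure call: uses the same abstraction as a conventional procedure call, except that a procedure boundary is allowed to span two computers.

[Figure: a procedure call spanning two computers]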

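III. Remote versus local procedures

First, network latency makes a remote procedure call far more expensive than a local one.
Second, in a conventional call the caller and the callee run in the same address space, so pointers can be passed between them. A remote procedure cannot take a pointer as a parameter, because it runs in a completely different address space from its caller.
Third, because a remote procedure does not share the caller's environment, it cannot directly access the caller's I/O descriptors or operating system facilities.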
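The second point, that references into the caller's memory do not survive a remote call, can be demonstrated without a network. The sketch below simulates the wire by serializing the arguments with pickle, which is an assumption made for brevity; a real RPC system would use its own encoding, but the copy semantics are the same.

```python
import pickle

def append_item(items: list) -> int:
    """Mutates its argument, which a local caller can observe."""
    items.append("new")
    return len(items)

def remote_call(func, *args):
    """Simulate a remote call: arguments and result cross the 'wire' as bytes."""
    wire_args = pickle.dumps(args)            # marshal on the caller's side
    remote_args = pickle.loads(wire_args)     # unmarshal in the "remote" address space
    return pickle.loads(pickle.dumps(func(*remote_args)))

local_list = ["a"]
append_item(local_list)                   # local call shares the caller's memory

remote_list = ["a"]
remote_call(append_item, remote_list)     # remote call only ever sees a copy

print(local_list)
print(remote_list)
```

The local call changes the caller's list, while the simulated remote call leaves it untouched. Code that relies on a callee modifying its arguments in place therefore changes behavior when the callee is moved to another machine.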

IV. Some RPC implementations
(1) Sun RPC (UDP, TCP)
(2) Xerox Courier (SPP)
(3) Apollo RPC (UDP, DDS)

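Sun RPC can run over either connection-oriented or connectionless protocols; Xerox Courier runs only over connection-oriented protocols; Apollo RPC runs only over connectionless protocols.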

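V. How to write an RPC program

To turn a conventional program into an RPC program, we add some extra code to it; these additional procedures are called stubs. Imagine a conventional program in which one procedure has been moved to a remote machine. On the caller's side, a stub takes the place of the called procedure; on the remote procedure's side, a stub takes the place of the caller. Together the stubs carry out all the communication that the remote call requires. Because the stubs use the same interface as the original call, adding them requires no changes to either the original calling procedure or the original called procedure.

[Figure: client and server stubs sitting between the caller and the remote procedure]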
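A hand-written pair of stubs shows why neither the caller nor the called procedure has to change. In this sketch the "network" is just a function call and the message format is JSON; both are assumptions made to keep the example small, and all names are hypothetical. With rpcgen the stubs would be generated rather than written by hand.

```python
import json

# --- the original procedure, unchanged, now living on the "server" ---
def add(a: int, b: int) -> int:
    return a + b

PROCEDURES = {"add": add}

# --- server stub: takes the place of the caller on the remote side ---
def server_stub(request: bytes) -> bytes:
    message = json.loads(request)
    result = PROCEDURES[message["proc"]](*message["args"])
    return json.dumps({"result": result}).encode()

# --- transport: a direct function call standing in for the network ---
def transport(request: bytes) -> bytes:
    return server_stub(request)

# --- client stub: same signature as the original procedure ---
def add_stub(a: int, b: int) -> int:
    request = json.dumps({"proc": "add", "args": [a, b]}).encode()
    reply = transport(request)
    return json.loads(reply)["result"]

def caller(add_func) -> int:
    """The original caller; it works with the real procedure or with the stub."""
    return add_func(2, 3)

print(caller(add), caller(add_stub))
```

Swapping `add` for `add_stub` is the only change needed at the call site, and in a generated system the stub would simply carry the original name.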

VI. Example

This example compiles and runs on Ubuntu 8.04 with gcc 4.2.3.

Remote procedure call example (download link in the original post)

RPC tries to make a distributed invocation look like a local one.

=============================

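RPC hides the network communication between hosts and the associated protocol parsing. Users of RPC do not need to deal with these details and can concentrate on business logic.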

===============================

http://blog.brucefeng.info/post/what-is-rpc

1. RPC

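RPC (Remote Procedure Call) is a computer communication protocol that lets a program running on one computer call a subroutine on another computer without the programmer having to write extra code for the interaction. RPC follows the client/server model of distributed computing: the client always sends a request to execute one or more procedures, and the server accepts the request, runs the procedures with the parameters the client supplied, and returns the result to the client. There are many RPC protocols and frameworks, such as the early CORBA, Java RMI, RPC-style Web Services, Hessian, Thrift, and, in a loose sense, even REST APIs.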

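1.1 The RPC call flow

Figure from “你应该知道的RPC原理” (“The RPC principles you should know”)

(1) The service consumer (client) invokes the service as if it were a local call.
(2) The client stub receives the call and packs the method, parameters, and so on into a message body that can be sent over the network (serialization).
(3) The client stub looks up the service address and sends the message to the server.
(4) The server stub receives the message and decodes it (deserialization).
(5) The server stub calls the local service according to the decoded result.
(6) The local service executes and returns the result to the server stub.
(7) The server stub packs the result into a message and sends it to the consumer.
(8) The client stub receives the message and decodes it.
(9) The service consumer gets the final result.

The goal of RPC is to encapsulate steps 2 through 8 so that these details are hidden from the user.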
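The nine steps can be traced in a small sketch over a real TCP connection. It assumes a newline-delimited JSON message format and a single request per connection, neither of which comes from any particular framework; the service name and the stub functions are hypothetical. The comments mark where each step happens.

```python
import json
import socket
import threading

def multiply(a, b):                       # the local service on the server
    return a * b

SERVICES = {"multiply": multiply}

def server_stub(listener: socket.socket) -> None:
    conn, _ = listener.accept()
    with conn, conn.makefile("rwb") as stream:
        request = json.loads(stream.readline())                   # step 4: decode
        result = SERVICES[request["method"]](*request["params"])  # steps 5-6: call service
        stream.write(json.dumps({"result": result}).encode() + b"\n")  # step 7: pack, send
        stream.flush()

def client_stub(address, method: str, *params):
    # step 2: pack method and parameters into a message
    message = json.dumps({"method": method, "params": params}).encode() + b"\n"
    with socket.create_connection(address, timeout=5) as sock:
        with sock.makefile("rwb") as stream:
            stream.write(message)                   # step 3: send to the service address
            stream.flush()
            reply = json.loads(stream.readline())   # step 8: receive and decode
    return reply["result"]

listener = socket.socket()
listener.bind(("127.0.0.1", 0))           # port 0: let the OS pick a free port
listener.listen(1)
worker = threading.Thread(target=server_stub, args=(listener,), daemon=True)
worker.start()

product = client_stub(listener.getsockname(), "multiply", 6, 7)   # steps 1 and 9
worker.join()
listener.close()
print(product)
```

Everything between the call to `client_stub` and its return value is what an RPC framework generates or provides, so that application code only sees steps 1 and 9.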

1.2 Problems RPC has to solve

Communication
Service addressing
Serialization and deserialization of parameters
Load balancing

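2. Service communication protocol

Service communication means establishing a network connection between client and server; all data exchanged travels over that connection. The connection can be opened on demand and closed when the call finishes, or it can be a long-lived connection shared by many remote procedure calls. Different RPC frameworks may use different network protocols; common choices are raw TCP and HTTP or HTTP/2. pigeon currently uses TCP and HTTP. In the Java world, several tools and frameworks already wrap the underlying protocols, for example the well-known Netty (the communication framework pigeon uses) and Apache MINA.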
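When several calls share one long-lived connection, the receiver needs a way to tell where one message ends and the next begins. A common technique is length-prefixed framing, sketched below. The 4-byte big-endian length header is an assumption for this example and is not taken from pigeon or Netty; `socket.socketpair()` stands in for a client and server connection.

```python
import socket
import struct

def send_frame(sock: socket.socket, payload: bytes) -> None:
    """Prefix each message with its length so several fit on one connection."""
    sock.sendall(struct.pack(">I", len(payload)) + payload)

def recv_exact(sock: socket.socket, size: int) -> bytes:
    """Read exactly `size` bytes, since recv() may return fewer."""
    chunks = bytearray()
    while len(chunks) < size:
        chunk = sock.recv(size - len(chunks))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        chunks.extend(chunk)
    return bytes(chunks)

def recv_frame(sock: socket.socket) -> bytes:
    (length,) = struct.unpack(">I", recv_exact(sock, 4))
    return recv_exact(sock, length)

# socketpair() gives two connected sockets, standing in for client and server.
client, server = socket.socketpair()
for request in (b"call-1", b"call-2", b"call-3"):
    send_frame(client, request)           # three calls share one connection
received = [recv_frame(server) for _ in range(3)]
client.close()
server.close()
print(received)
```

With on-demand connections the end of the stream can mark the end of the message, but a shared connection always needs some form of framing like this.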

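3. Service addressing

Before each request, the client needs to know which server to send it to, which is the problem of service addressing. The simplest approach is to specify the server's IP directly and call the service on that particular machine.

In a distributed environment, however, hard-coding IPs is not appropriate, so today's RPC frameworks generally locate services through configuration. This involves service registration on the server side and service lookup on the client side, which is an excellent use case for ZooKeeper.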

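3.1 Managing distributed service configuration with ZooKeeper

ZooKeeper is a distributed coordination service that began as a subproject of Apache Hadoop and is now a top-level Apache project. It mainly addresses data management problems that distributed applications run into often, such as unified naming, state synchronization, cluster management, and management of distributed application configuration. Its architecture achieves high availability through redundant servers, so if one ZooKeeper host does not answer, the client can ask another. ZooKeeper nodes store their data in a hierarchical namespace, much like a file system or a prefix tree. Clients can read and write these nodes and in this way share a configuration service.

Figure from “分布式服务框架 Zookeeper -- 管理分布式环境中的数据” (“ZooKeeper, a distributed service framework: managing data in distributed environments”)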

3.1.1 Service registration on the server side
Each time a server starts, it uses a ZooKeeper client to register the services it provides in ZooKeeper, in a form such as:

    http://service.test.com/demoservice/demo_1.0.0:127.0.0.1

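When a server shuts down, it also removes its own node from the ZooKeeper configuration.

3.1.2 Service addressing on the client side

Before each request to a server, the client queries ZooKeeper for the address of the service. For example, to use demoservice it looks up the IP stored under the key http://service.test.com/demoservice/demo_1.0.0 and can then call the service at that address.

3.1.3 Broadcasting service changes

When a service changes, for example when it starts or shuts down, a notification is sent to every client that has registered interest in it.

For how ZooKeeper works and how to use it, see the ZooKeeper project.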
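The three operations in this section (register, look up, notify on change) can be sketched with an in-memory registry. This is deliberately not ZooKeeper's API: the class and method names are hypothetical, there is no networking or persistence, and a real deployment would rely on ZooKeeper's own nodes and watches instead.

```python
from collections import defaultdict

class ServiceRegistry:
    """In-memory stand-in for a ZooKeeper-backed registry (not ZooKeeper's API)."""

    def __init__(self) -> None:
        self._addresses = defaultdict(set)    # service key -> set of addresses
        self._watchers = defaultdict(list)    # service key -> callbacks

    def register(self, service: str, address: str) -> None:
        self._addresses[service].add(address)
        self._notify(service)

    def unregister(self, service: str, address: str) -> None:
        self._addresses[service].discard(address)
        self._notify(service)

    def lookup(self, service: str) -> list:
        return sorted(self._addresses[service])

    def watch(self, service: str, callback) -> None:
        self._watchers[service].append(callback)

    def _notify(self, service: str) -> None:
        # Tell every interested client about the new address list.
        for callback in self._watchers[service]:
            callback(self.lookup(service))

KEY = "http://service.test.com/demoservice/demo_1.0.0"
registry = ServiceRegistry()
events = []
registry.watch(KEY, events.append)        # a client subscribes to changes
registry.register(KEY, "127.0.0.1")       # a server starts
registry.register(KEY, "127.0.0.2")       # a second server starts
registry.unregister(KEY, "127.0.0.1")     # the first server shuts down
print(registry.lookup(KEY))
```

Each change produces one notification carrying the current address list, so clients never have to poll the registry to learn that a server has come or gone.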

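4. Serialization and deserialization

Once the client has found the server, it has to pass parameters to it, and after processing the data the server has to return the result to the client. Both directions require serializing data into bytes for transmission and deserializing it on arrival.

Every serialization protocol has its own strengths and weaknesses. A mature protocol has to balance generality, robustness, debuggability and readability, performance, extensibility, and security. Common serialization protocols today include hessian, XML, JSON, Protobuf, Thrift, and Avro.

pigeon, for example, supports several serialization formats. The format is selected on the client side through the serialize attribute, and hessian, which has the best compatibility, is the general recommendation.
To design your own format, extend the com.dianping.pigeon.remoting.common.codec.DefaultAbstractSerializer class to define a serializer, and register it through SerializerFactory.registerSerializer(byte serializerType, Serializer serializer).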
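The idea of registering serializers under a one-byte type can be sketched in a few lines. This is a Python analogy, not pigeon's Java API: the registry, the function names, and the wire layout (one type byte followed by the payload) are all assumptions made for the illustration.

```python
import json
import pickle

class JsonSerializer:
    def dumps(self, obj) -> bytes:
        return json.dumps(obj).encode("utf-8")

    def loads(self, data: bytes):
        return json.loads(data.decode("utf-8"))

class PickleSerializer:
    def dumps(self, obj) -> bytes:
        return pickle.dumps(obj)

    def loads(self, data: bytes):
        return pickle.loads(data)     # only safe when the peer is trusted

SERIALIZERS = {}

def register_serializer(type_id: int, serializer) -> None:
    """Map a one-byte type id to a serializer object."""
    if not 0 <= type_id <= 255:
        raise ValueError("type id must fit in one byte")
    SERIALIZERS[type_id] = serializer

def encode(type_id: int, obj) -> bytes:
    # The first byte tells the receiver which serializer to use.
    return bytes([type_id]) + SERIALIZERS[type_id].dumps(obj)

def decode(message: bytes):
    return SERIALIZERS[message[0]].loads(message[1:])

register_serializer(1, JsonSerializer())
register_serializer(2, PickleSerializer())

request = {"method": "echo", "params": ["hi", 1]}
print(decode(encode(1, request)) == decode(encode(2, request)))
```

Putting the type id on the wire lets a server accept clients that use different formats, which is one reason to keep the serializer pluggable.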

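5. Load balancing

Load balancing is a computer networking technique for distributing load across multiple computers (a cluster), network links, CPUs, disk drives, or other resources, in order to optimize resource use, maximize throughput, minimize response time, and avoid overload. Load balancers use a variety of scheduling algorithms to decide which backend server receives a given request. The simplest are random selection and round robin. More advanced balancers take further factors into account, such as a backend server's load, response time, health, number of active connections, geographic location, capacity, or recently assigned traffic.

pigeon, for example, supports random, roundRobin, and weightedAutoware, with weightedAutoware as the default. On every request, pigeon measures the response time in the calling thread. The weightedAutoware strategy uses the response times collected by the threads to estimate the load on each server: the longer the response time, the heavier the load on that machine.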
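The three kinds of strategy mentioned above can be sketched as follows. The response-time balancer only illustrates the general idea of favoring servers that answer quickly; it is not pigeon's algorithm, and the inverse-average weighting, the class names, and the default for servers without samples are all assumptions.

```python
import itertools
import random

class RandomBalancer:
    def __init__(self, servers, rng=None):
        self.servers = list(servers)
        self.rng = rng or random.Random()

    def pick(self):
        return self.rng.choice(self.servers)

class RoundRobinBalancer:
    def __init__(self, servers):
        self._cycle = itertools.cycle(list(servers))

    def pick(self):
        return next(self._cycle)

class ResponseTimeBalancer:
    """Weight each server by the inverse of its average observed response time."""

    def __init__(self, servers, rng=None):
        self.samples = {server: [] for server in servers}
        self.rng = rng or random.Random()

    def record(self, server, seconds: float) -> None:
        self.samples[server].append(seconds)

    def weights(self) -> dict:
        result = {}
        for server, times in self.samples.items():
            average = sum(times) / len(times) if times else 1.0   # assumed default
            result[server] = 1.0 / max(average, 1e-6)             # avoid dividing by zero
        return result

    def pick(self):
        weights = self.weights()
        servers = list(weights)
        return self.rng.choices(servers, weights=[weights[s] for s in servers])[0]

round_robin = RoundRobinBalancer(["a", "b", "c"])
order = [round_robin.pick() for _ in range(4)]

balancer = ResponseTimeBalancer(["fast", "slow"], rng=random.Random(0))
balancer.record("fast", 0.01)
balancer.record("slow", 0.10)
picks = [balancer.pick() for _ in range(1000)]
print(order, picks.count("fast"), picks.count("slow"))
```

Because the slow server still receives some traffic, the balancer keeps collecting fresh response times for it and can shift load back if it recovers.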

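6. Fault tolerance

This section uses pigeon as an example of how fault tolerance can be implemented.

6.1 Health checks

In pigeon, when the connection uses TCP, the client periodically sends heartbeat messages and waits for the replies. If a heartbeat is not answered properly, the client removes that server node and no longer selects it in later service addressing.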
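A heartbeat check of this kind can be sketched as a counter of consecutive missed replies per server. The threshold of missed heartbeats and the `ping` callback are assumptions for the example; the text does not say how many misses pigeon tolerates or how it sends the heartbeat.

```python
class HeartbeatMonitor:
    """Track servers and drop those that miss too many heartbeats in a row."""

    def __init__(self, servers, max_missed: int = 3):    # threshold is an assumed value
        self.missed = {server: 0 for server in servers}
        self.max_missed = max_missed

    def check(self, ping) -> None:
        """Run one heartbeat round; ping(server) returns True if it answered."""
        for server in list(self.missed):
            if ping(server):
                self.missed[server] = 0
            else:
                self.missed[server] += 1
                if self.missed[server] >= self.max_missed:
                    del self.missed[server]       # no longer offered to callers

    def available(self) -> list:
        return sorted(self.missed)

monitor = HeartbeatMonitor(["10.0.0.1", "10.0.0.2"], max_missed=2)

def ping(server: str) -> bool:
    return server != "10.0.0.2"       # pretend 10.0.0.2 is down

monitor.check(ping)
after_one_round = monitor.available()
monitor.check(ping)
after_two_rounds = monitor.available()
print(after_one_round, after_two_rounds)
```

Requiring more than one missed heartbeat avoids removing a healthy server because of a single dropped packet.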

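6.2 Timeouts and retries

The client can configure a timeout for a service. When a request is sent, a pigeon thread tracks how long it has taken; if no reply arrives within the limit, pigeon either reports a timeout or retries, depending on configuration, so that the client does not wait too long.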
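The timeout-then-retry behavior can be sketched with a thread pool from the Python standard library. This is a generic illustration rather than pigeon's implementation; the helper name, the timeout values, and the simulated slow service are assumptions. Note that a timed-out call may still complete on the server, so retrying is only safe for operations that can be repeated without harm.

```python
import concurrent.futures
import time

def call_with_timeout(func, timeout: float, retries: int = 0):
    """Run func(); give up after `timeout` seconds, retrying up to `retries` times."""
    attempts = 0
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=retries + 1)
    try:
        while True:
            attempts += 1
            future = pool.submit(func)
            try:
                return future.result(timeout=timeout), attempts
            except concurrent.futures.TimeoutError:
                if attempts > retries:
                    raise TimeoutError(f"no reply after {attempts} attempt(s)")
    finally:
        pool.shutdown(wait=False)     # do not block on calls that are still running

delays = iter([0.5, 0.0])             # the first call is slow, the retry is fast

def flaky_service() -> str:
    time.sleep(next(delays))
    return "ok"

value, attempts = call_with_timeout(flaky_service, timeout=0.1, retries=1)
print(value, attempts)
```

The caller gets an answer after roughly 0.1 seconds instead of waiting 0.5 seconds for the slow first attempt.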

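6.3 Client-side cluster strategies

pigeon offers four client-side cluster strategies: failfast, failover, failsafe, and forking.

failfast - if the call to one node of the service fails, throw an exception and return
failover - if the call to one node fails, try another node
failsafe - if the call to one node fails, do not throw; return null (later versions may return a configured default value instead)
forking - call all available nodes at the same time and return the result from the fastest one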
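The four strategies can be written as four small functions over a list of nodes, where each node is modeled as a callable that either returns a result or raises. This is a sketch of the behavior described above, not pigeon's code; the function signatures and the sample nodes are hypothetical.

```python
import concurrent.futures

def failfast(nodes, request):
    return nodes[0](request)                  # any error propagates to the caller

def failover(nodes, request):
    last_error = None
    for node in nodes:                        # move on to the next node on failure
        try:
            return node(request)
        except Exception as error:
            last_error = error
    raise last_error

def failsafe(nodes, request):
    try:
        return nodes[0](request)
    except Exception:
        return None                           # swallow the error and return null

def forking(nodes, request):
    # Note: leaving the `with` block waits for the slower calls to finish;
    # a production version would cancel or abandon them instead.
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        futures = [pool.submit(node, request) for node in nodes]
        for future in concurrent.futures.as_completed(futures):
            if future.exception() is None:
                return future.result()        # first successful reply wins
    raise RuntimeError("all nodes failed")

def broken(request):
    raise ConnectionError("node down")

def healthy(request):
    return f"handled {request}"

print(failover([broken, healthy], "r1"))
print(failsafe([broken], "r2"))
print(forking([broken, healthy], "r3"))
```

failover and forking both hide a single bad node from the caller, but forking pays for lower latency with extra load on every node.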

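6.4 Service isolation and rate limiting

Configure a maximum concurrency per service method
Limit the maximum concurrency of a given client application

For details, see the Pigeon development guide.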
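A per-method concurrency cap can be sketched with a semaphore: calls beyond the limit are rejected immediately rather than queued. Whether pigeon rejects or queues excess calls is not stated in the text, so the rejection behavior, like the class name, is an assumption of this example.

```python
import threading

class ConcurrencyLimiter:
    """Reject calls once `limit` invocations are already in flight."""

    def __init__(self, limit: int):
        self._slots = threading.BoundedSemaphore(limit)

    def call(self, func, *args):
        if not self._slots.acquire(blocking=False):
            raise RuntimeError("too many concurrent calls")    # rejected, not queued
        try:
            return func(*args)
        finally:
            self._slots.release()

limiter = ConcurrencyLimiter(limit=1)

def nested_call() -> str:
    # While this call holds the only slot, a second call is rejected.
    try:
        limiter.call(lambda: "inner")
    except RuntimeError:
        return "inner call rejected"
    return "inner call accepted"

outcome = limiter.call(nested_call)
print(outcome)
```

Keeping one limiter per method, or per client application, stops a single hot method or a single noisy client from using up all of the server's worker threads.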

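7. Common RPC frameworks

Hessian

Besides the hessian protocol itself, Hessian also provides an RPC framework implemented on top of servlets.

Thrift

Thrift is a cross-language service development framework, originally developed at Facebook in 2007 and contributed to Apache in 2008. It defines RPC interfaces and data types in an intermediate language (IDL, interface definition language), and a compiler then generates code for different languages (C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, Smalltalk, and OCaml at the time of writing). The generated code implements the RPC protocol and transport layers.

Finagle

Finagle is a fault-tolerant, protocol-agnostic RPC framework that Twitter built on Netty. It underpins Twitter's core services.

Dubbo

Dubbo is a high-performance service framework open-sourced by Alibaba. It lets applications export and consume services over high-performance RPC and integrates seamlessly with the Spring framework.

gRPC

gRPC is a high-performance, general-purpose open-source RPC framework from Google. It was designed mainly with mobile application development in mind, is based on the HTTP/2 standard, uses ProtoBuf (Protocol Buffers) for serialization, and supports many programming languages. gRPC provides a simple way to define services precisely and to generate reliable client libraries for iOS, Android, and backend services. Clients can take advantage of streaming and connection features, which helps save bandwidth, reduce the number of TCP connections, and save CPU and battery.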

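8. RPC and microservices

Microservices are an architectural approach in which an application is decomposed into small, interconnected services. A microservice generally does one specific job, such as placing an order or querying orders.

At the system level, microservices can take many forms, for example exposing RESTful APIs, SOA services, or HTTP interfaces. RPC is one way to implement a microservice system: each application exposes an RPC service interface, and together these interfaces form the microservice architecture.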

References

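pigeon框架 (the pigeon framework)
pigeon user guide
RPC调用框架比较分析 (a comparative analysis of RPC frameworks)
RPC框架实现 - 容灾篇 - bangerlee (implementing an RPC framework: fault tolerance)
RPC原理详解 - 永志 (RPC principles in detail)
你应该知道的RPC原理 (the RPC principles you should know)
Twitter的RPC框架Finagle简介 (an introduction to Finagle, Twitter's RPC framework)
gRPC：Google开源的基于HTTP/2和ProtoBuf的通用RPC框架 (gRPC: Google's open-source general-purpose RPC framework based on HTTP/2 and ProtoBuf)
Apache ZooKeeper
Apache ZooKeeper doc
分布式服务框架 Zookeeper -- 管理分布式环境中的数据 (ZooKeeper, a distributed service framework: managing data in distributed environments)
Google Protocol Buffer 的使用和原理 (using Google Protocol Buffer and how it works)
Protocol Buffers
Netty at Twitter with Finagle
Goodbye Microservices, Hello Right-sized Services
微服务架构解析 (an analysis of microservice architecture)
微服务架构的设计模式 (design patterns for microservice architecture)
解析微服务架构（二）微服务架构综述 (analyzing microservice architecture, part 2: an overview)
解析微服务架构（一）单块架构系统以及其面临的挑战 (analyzing microservice architecture, part 1: monolithic systems and their challenges)
微服务实战（一）：微服务架构的优势与不足 (microservices in practice, part 1: strengths and weaknesses of the architecture)
单体应用与微服务优缺点辨析 (monoliths versus microservices: pros and cons)
SOLID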
