Protobuf
1. What is Protocol Buffers? -- A Serialization Framework
Protocol Buffers is a language-neutral, platform-neutral, extensible mechanism for serializing structured data, developed by Google. Think of it as a counterpart to XML or JSON, but smaller, faster, and simpler.
1.2 Key Features
- Compact: Protobuf serializes data into a compact binary format that is smaller than text formats such as XML or JSON, so it is cheaper to transmit and store.
- Fast: Serialization and deserialization are very fast, which matters for high-performance systems such as the heavy data traffic inside HDFS.
- Strongly typed: You must define a schema for your data, so every field has an explicit type, which helps prevent parsing errors.
- Language-neutral: Many languages are supported (C++, Java, Python, Go, C#, and more), so a client and a server written in different languages can exchange data seamlessly.
- Forward and backward compatible: The schema can evolve gracefully; when the structure changes (for example, a new field is added), old code can still read data in the new format, and new code can still read data in the old format (see the sketch after this list).
- Code generation: After you define your data structures in a .proto file, the Protobuf compiler (protoc) can generate source code in the language of your choice; the generated code provides convenient methods for reading and writing the structured data.
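As a quick illustration of the compatibility property, here is what an evolved version of a message like the Person example (defined in section 1.3.1 below) might look like. The phone field is hypothetical; the point is that existing field numbers keep their meaning and the new field takes an unused number, so old parsers simply skip field 4 in new data, and new parsers see it as unset in old data.

syntax = "proto3";

// Hypothetical v2 of the Person message: fields 1-3 are unchanged,
// and the new field takes the next unused field number.
message Person {
  string name = 1;
  int32 id = 2;
  string email = 3;
  string phone = 4; // new in v2; old readers skip it, old data leaves it empty
}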
1.3 How It Works
1.3.1 Define a .proto File
First, define your data structures, called messages, in a .proto file. For example:
syntax = "proto3"; // or proto2

message Person {
  string name = 1;
  int32 id = 2;
  string email = 3;
}
Here name, id, and email are fields, and the numbers after them are each field's unique identifier (field number), which identifies the field on the wire.
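As a small worked example of why the binary encoding is compact, using the standard Protobuf varint wire format: serializing a Person in which only id = 150 is set produces just three bytes (in proto3, fields left at their default value are not written at all).

0x10        tag byte: (field number 2 << 3) | wire type 0 (varint)
0x96 0x01   the value 150 encoded as a base-128 varint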
1.3.2 Compile the .proto File
Use the Protobuf compiler protoc to compile the .proto file and generate code for a specific language. For Java, for example:
protoc --java_out=./ Person.proto
This generates a Java class containing the Java representation of the Person message along with methods for serializing and deserializing it.
1.3.3 Use the Generated Code
In your application, you use the generated classes to build, serialize, and deserialize data, as in the sketch below.
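A minimal sketch of what that looks like in Java, assuming Person.proto was compiled with option java_multiple_files = true so that Person is a top-level class (with the default options, protoc instead nests it inside an outer wrapper class):

import com.google.protobuf.InvalidProtocolBufferException;

public class PersonDemo {
  public static void main(String[] args) throws InvalidProtocolBufferException {
    // Build a message through the generated builder API.
    Person person = Person.newBuilder()
        .setName("Alice")
        .setId(1234)
        .setEmail("alice@example.com")
        .build();

    // Serialize to the compact binary wire format.
    byte[] bytes = person.toByteArray();

    // Parse the bytes back into a Person object.
    Person parsed = Person.parseFrom(bytes);
    System.out.println(parsed.getName() + " <" + parsed.getEmail() + ">");
  }
}

Note that the generated message objects are immutable; all mutation goes through the builder, which is why the build/serialize/parse cycle above is the idiomatic pattern.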
2. Usage in HDFS
As a distributed file system, HDFS requires a large amount of communication and data exchange among its components (NameNode, DataNode, clients). To make this communication efficient and reliable, HDFS has used Protocol Buffers extensively as the underlying serialization mechanism of its RPC protocols since version 0.23.
2.1 Specific Use Cases
- RPC communication: All RPC calls between HDFS clients and the NameNode or DataNodes, as well as the internal RPC calls between the NameNode and DataNodes, use Protobuf to serialize requests and responses. For example, when a client asks to read a file, it sends a Protobuf-serialized request message to the NameNode, which processes it and returns a Protobuf-serialized response (a concrete sketch follows the NamenodeProtocol.proto listing below).
- Data structure definitions: Many of HDFS's core data structures, such as file blocks (ExtendedBlockProto), datanode identities (DatanodeIDProto), and block locations (LocatedBlockProto), are defined in Protobuf .proto files. In the HDFS source tree you will find files such as hdfs.proto that define the message formats HDFS uses internally. For example, hadoop-hdfs-project/hadoop-hdfs-client/src/main/proto/hdfs.proto contains the Protobuf definitions shared by the HDFS client, server, and data transfer protocols. Another example is NamenodeProtocol.proto, which defines the protocol a subordinate NameNode uses to talk to the active/primary NameNode:
/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

/**
 * These .proto interfaces are private and stable.
 * Please see https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Compatibility.html
 * for what changes are allowed for a *stable* .proto interface.
 */

// This file contains protocol buffers that are used throughout HDFS -- i.e.
// by the client, server, and data transfer protocols.

syntax = "proto2";
option java_package = "org.apache.hadoop.hdfs.protocol.proto";
option java_outer_classname = "NamenodeProtocolProtos";
option java_generic_services = true;
option java_generate_equals_and_hash = true;
package hadoop.hdfs.namenode;

import "hdfs.proto";
import "HdfsServer.proto";

/**
 * Get list of blocks for a given datanode with the total length
 * of adding up to given size
 * datanode - Datanode ID to get list of block from
 * size - size to which the block lengths must add up to
 */
message GetBlocksRequestProto {
  required DatanodeIDProto datanode = 1; // Datanode ID
  required uint64 size = 2;              // Size in bytes
  // Minimum Block Size in bytes, adding default value to 10MB, as this might
  // cause problem during rolling upgrade, when balancers are upgraded later.
  // For more info refer HDFS-13356
  optional uint64 minBlockSize = 3 [default = 10485760];
  optional uint64 timeInterval = 4 [default = 0];
  optional StorageTypeProto storageType = 5;
}

/**
 * blocks - List of returned blocks
 */
message GetBlocksResponseProto {
  required BlocksWithLocationsProto blocks = 1; // List of blocks
}

/**
 * void request
 */
message GetBlockKeysRequestProto {
}

/**
 * keys - Information about block keys at the active namenode
 */
message GetBlockKeysResponseProto {
  optional ExportedBlockKeysProto keys = 1;
}

/**
 * void request
 */
message GetTransactionIdRequestProto {
}

/**
 * txId - Transaction ID of the most recently persisted edit log record
 */
message GetTransactionIdResponseProto {
  required uint64 txId = 1; // Transaction ID
}

/**
 * void request
 */
message RollEditLogRequestProto {
}

/**
 * signature - A unique token to identify checkpoint transaction
 */
message RollEditLogResponseProto {
  required CheckpointSignatureProto signature = 1;
}

/**
 * void request
 */
message GetMostRecentCheckpointTxIdRequestProto {
}

message GetMostRecentCheckpointTxIdResponseProto {
  required uint64 txId = 1;
}

message GetMostRecentNameNodeFileTxIdRequestProto {
  required string nameNodeFile = 1;
}

message GetMostRecentNameNodeFileTxIdResponseProto {
  required uint64 txId = 1;
}

/**
 * registration - Namenode reporting the error
 * errorCode - error code indicating the error
 * msg - Free text description of the error
 */
message ErrorReportRequestProto {
  required NamenodeRegistrationProto registration = 1; // Registration info
  required uint32 errorCode = 2; // Error code
  required string msg = 3;       // Error message
}

/**
 * void response
 */
message ErrorReportResponseProto {
}

/**
 * registration - Information of the namenode registering with primary namenode
 */
message RegisterRequestProto {
  required NamenodeRegistrationProto registration = 1; // Registration info
}

/**
 * registration - Updated registration information of the newly registered
 * datanode.
 */
message RegisterResponseProto {
  required NamenodeRegistrationProto registration = 1; // Registration info
}

/**
 * Start checkpoint request
 * registration - Namenode that is starting the checkpoint
 */
message StartCheckpointRequestProto {
  required NamenodeRegistrationProto registration = 1; // Registration info
}

/**
 * command - Command returned by the active namenode to be
 *           be handled by the caller.
 */
message StartCheckpointResponseProto {
  required NamenodeCommandProto command = 1;
}

/**
 * End or finalize the previously started checkpoint
 * registration - Namenode that is ending the checkpoint
 * signature - unique token to identify checkpoint transaction,
 *             that was received when checkpoint was started.
 */
message EndCheckpointRequestProto {
  required NamenodeRegistrationProto registration = 1; // Registration info
  required CheckpointSignatureProto signature = 2;
}

/**
 * void response
 */
message EndCheckpointResponseProto {
}

/**
 * sinceTxId - return the editlog information for transactions >= sinceTxId
 */
message GetEditLogManifestRequestProto {
  required uint64 sinceTxId = 1; // Transaction ID
}

/**
 * manifest - Enumeration of editlogs from namenode for
 *            logs >= sinceTxId in the request
 */
message GetEditLogManifestResponseProto {
  required RemoteEditLogManifestProto manifest = 1;
}

/**
 * void request
 */
message IsUpgradeFinalizedRequestProto {
}

message IsUpgradeFinalizedResponseProto {
  required bool isUpgradeFinalized = 1;
}

/**
 * void request
 */
message IsRollingUpgradeRequestProto {
}

message IsRollingUpgradeResponseProto {
  required bool isRollingUpgrade = 1;
}

message GetFilePathRequestProto {
  required uint64 fileId = 1;
}

message GetFilePathResponseProto {
  required string srcPath = 1;
}

message GetNextSPSPathRequestProto {
}

message GetNextSPSPathResponseProto {
  optional uint64 spsPath = 1;
}

/**
 * Protocol used by the sub-ordinate namenode to send requests
 * the active/primary namenode.
 *
 * See the request and response for details of rpc call.
 */
service NamenodeProtocolService {
  /**
   * Get list of blocks for a given datanode with length
   * of blocks adding up to given size.
   */
  rpc getBlocks(GetBlocksRequestProto) returns (GetBlocksResponseProto);

  /**
   * Get the current block keys
   */
  rpc getBlockKeys(GetBlockKeysRequestProto) returns (GetBlockKeysResponseProto);

  /**
   * Get the transaction ID of the most recently persisted editlog record
   */
  rpc getTransactionId(GetTransactionIdRequestProto) returns (GetTransactionIdResponseProto);

  /**
   * Get the transaction ID of the most recently persisted editlog record
   */
  rpc getMostRecentCheckpointTxId(GetMostRecentCheckpointTxIdRequestProto) returns (GetMostRecentCheckpointTxIdResponseProto);

  /**
   * Get the transaction ID of the NameNodeFile
   */
  rpc getMostRecentNameNodeFileTxId(GetMostRecentNameNodeFileTxIdRequestProto) returns (GetMostRecentNameNodeFileTxIdResponseProto);

  /**
   * Close the current editlog and open a new one for checkpointing purposes
   */
  rpc rollEditLog(RollEditLogRequestProto) returns (RollEditLogResponseProto);

  /**
   * Request info about the version running on this NameNode
   */
  rpc versionRequest(VersionRequestProto) returns (VersionResponseProto);

  /**
   * Report from a sub-ordinate namenode of an error to the active namenode.
   * Active namenode may decide to unregister the reporting namenode
   * depending on the error.
   */
  rpc errorReport(ErrorReportRequestProto) returns (ErrorReportResponseProto);

  /**
   * Request to register a sub-ordinate namenode
   */
  rpc registerSubordinateNamenode(RegisterRequestProto) returns (RegisterResponseProto);

  /**
   * Request to start a checkpoint.
   */
  rpc startCheckpoint(StartCheckpointRequestProto) returns (StartCheckpointResponseProto);

  /**
   * End of finalize the previously started checkpoint
   */
  rpc endCheckpoint(EndCheckpointRequestProto) returns (EndCheckpointResponseProto);

  /**
   * Get editlog manifests from the active namenode for all the editlogs
   */
  rpc getEditLogManifest(GetEditLogManifestRequestProto) returns (GetEditLogManifestResponseProto);

  /**
   * Return whether the NameNode is in upgrade state (false) or not (true)
   */
  rpc isUpgradeFinalized(IsUpgradeFinalizedRequestProto) returns (IsUpgradeFinalizedResponseProto);

  /**
   * Return whether the NameNode is in rolling upgrade (true) or not (false).
   */
  rpc isRollingUpgrade(IsRollingUpgradeRequestProto) returns (IsRollingUpgradeResponseProto);

  /**
   * Return the sps path from namenode
   */
  rpc getNextSPSPath(GetNextSPSPathRequestProto) returns (GetNextSPSPathResponseProto);
}
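To connect the definitions above to actual usage, here is a hedged sketch of how the generated classes can build a getBlocks request and frame it onto a byte stream with Protobuf's length-delimited helpers. This is not Hadoop's real RPC engine, which wraps messages in its own protocol headers via ProtobufRpcEngine; it only illustrates the Protobuf layer. The generated class locations follow the java_package/java_outer_classname options declared in the .proto files, and the exact set of required DatanodeIDProto fields, as well as all the example values, are assumptions that may vary across Hadoop versions.

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.hadoop.hdfs.protocol.proto.HdfsProtos.DatanodeIDProto;
import org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos.GetBlocksRequestProto;

public class GetBlocksFramingDemo {
  public static void main(String[] args) throws IOException {
    // Identify the datanode whose blocks we want (field set per hdfs.proto's
    // DatanodeIDProto; required fields may differ across Hadoop versions).
    DatanodeIDProto datanode = DatanodeIDProto.newBuilder()
        .setIpAddr("10.0.0.5")                    // hypothetical address
        .setHostName("dn1.example.com")           // hypothetical hostname
        .setDatanodeUuid("datanode-uuid-demo")    // hypothetical UUID
        .setXferPort(9866)
        .setInfoPort(9864)
        .setIpcPort(9867)
        .build();

    // Ask for up to 1 GiB worth of block lengths, as the balancer might.
    GetBlocksRequestProto request = GetBlocksRequestProto.newBuilder()
        .setDatanode(datanode)
        .setSize(1L << 30)
        .build();

    // Length-delimited framing: a varint length prefix followed by the message
    // bytes, so a receiver on a stream knows where the message ends.
    ByteArrayOutputStream wire = new ByteArrayOutputStream();
    request.writeDelimitedTo(wire);

    GetBlocksRequestProto decoded = GetBlocksRequestProto
        .parseDelimitedFrom(new ByteArrayInputStream(wire.toByteArray()));
    System.out.println("requested size = " + decoded.getSize());
  }
}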
2.2 Why Use Protobuf?
- Performance: HDFS handles huge volumes of data and makes frequent RPC calls. Protobuf's compact binary format and fast serialization/deserialization significantly improve HDFS's communication efficiency and overall performance.
- Cross-language support: Although HDFS is written mostly in Java, Protobuf's language neutrality makes it easier to integrate components written in other languages (such as C++ or Python) in the future.
- Schema evolution: HDFS is a long-lived, evolving project whose data structures keep changing. Protobuf's compatibility guarantees mean that after an upgrade, old clients or old DataNodes can still talk to a new NameNode, and vice versa, which lowers the complexity of upgrades. The optional minBlockSize field with a default value in GetBlocksRequestProto above (added for HDFS-13356) is exactly this kind of compatible addition.
- Maintainability: Defining data structures centrally in .proto files and generating code from them eliminates hand-written serialization/deserialization code, reduces the chance of bugs, and improves readability and maintainability.
HDFS chose Protobuf chiefly for its RPC performance and compact wire format, both of which are critical for a core distributed file system.
