Converting a Protobuf schema to an Avro schema with Confluent Schema Registry
Confluent provides classes that can convert a Protobuf schema into an Avro schema, which makes it possible to land Kafka messages serialized with Protobuf as files in Avro format.
1. Add the dependencies
<repositories>
    <repository>
        <id>cloudera</id>
        <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
    </repository>
    <repository>
        <id>confluent</id>
        <url>https://packages.confluent.io/maven/</url>
    </repository>
</repositories>

<dependencies>
    <!--pb-->
    <dependency>
        <groupId>com.google.protobuf</groupId>
        <artifactId>protobuf-java</artifactId>
        <version>3.21.7</version>
    </dependency>
    <!--confluent-->
    <dependency>
        <groupId>io.confluent</groupId>
        <artifactId>kafka-schema-registry</artifactId>
        <version>7.1.1</version>
    </dependency>
    <dependency>
        <groupId>io.confluent</groupId>
        <artifactId>kafka-protobuf-provider</artifactId>
        <version>7.1.1</version>
    </dependency>
    <dependency>
        <groupId>io.confluent</groupId>
        <artifactId>kafka-connect-avro-data</artifactId>
        <version>7.1.1</version>
    </dependency>
    <dependency>
        <groupId>io.confluent</groupId>
        <artifactId>kafka-connect-protobuf-converter</artifactId>
        <version>7.1.1</version>
    </dependency>
    <!--kafka-->
    <dependency>
        <groupId>org.apache.kafka</groupId>
        <artifactId>connect-api</artifactId>
        <version>1.1.0</version>
    </dependency>
</dependencies>
2. Define a Protobuf schema
Define the following Protobuf schema in src/main/proto/other.proto:
syntax = "proto3";

package com.acme;

message MyRecord {
    string f1 = 1;
    OtherRecord f2 = 2;
}

message OtherRecord {
    int32 other_id = 1;
}
Compile it to generate the Java code:
protoc -I=./ --java_out=./src/main/java ./src/main/proto/other.proto
This produces the Java code for the schema (an outer class com.acme.Other containing MyRecord and OtherRecord).
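As a quick sanity check, the generated builder API can be used like this (a minimal sketch; the setter names follow protoc's standard codegen for the fields above):

import com.acme.Other;
import com.google.protobuf.InvalidProtocolBufferException;

public class BuildRecord {

    public static void main(String[] args) throws InvalidProtocolBufferException {
        // Each message type gets a builder with setters derived from the field names
        Other.MyRecord record = Other.MyRecord.newBuilder()
                .setF1("hello")
                .setF2(Other.OtherRecord.newBuilder().setOtherId(42).build())
                .build();
        // Round-trip through bytes, as a Kafka producer/consumer would
        byte[] bytes = record.toByteArray();
        Other.MyRecord parsed = Other.MyRecord.parseFrom(bytes);
        System.out.println(parsed);
    }
}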

3. Convert the Protobuf schema into an Avro schema
When Confluent Schema Registry components process Protobuf, Avro, or JSON data, they first convert it into the unified Connect schema format, and only then write it out in a concrete file format such as Parquet or Avro.
import com.acme.Other;
import io.confluent.connect.avro.AvroData;
import io.confluent.connect.avro.AvroDataConfig;
import io.confluent.connect.protobuf.ProtobufData;
import io.confluent.kafka.schemaregistry.protobuf.ProtobufSchema;
import io.confluent.kafka.schemaregistry.protobuf.ProtobufSchemaUtils;
import org.apache.kafka.connect.data.SchemaAndValue;

public class ProtobufToAvro {

    public static void main(String[] args) {
        // Instantiate the protobuf-generated class
        Other.MyRecord obj = Other.MyRecord.newBuilder().build();
        // Get the protobuf schema
        ProtobufSchema pbSchema = ProtobufSchemaUtils.getSchema(obj);
        ProtobufData protobufData = new ProtobufData();
        // SchemaAndValue result = protobufData.toConnectData(pbSchema, obj);
        // System.out.println(result);
        AvroDataConfig avroDataConfig = new AvroDataConfig.Builder()
                .with(AvroDataConfig.SCHEMAS_CACHE_SIZE_CONFIG, 1)
                .with(AvroDataConfig.CONNECT_META_DATA_CONFIG, false)
                .with(AvroDataConfig.ENHANCED_AVRO_SCHEMA_SUPPORT_CONFIG, true)
                .build();
        AvroData avroData = new AvroData(avroDataConfig);
        // Convert the protobuf schema to a Connect schema first, then to an Avro schema
        org.apache.avro.Schema avroSchema = avroData.fromConnectSchema(protobufData.toConnectSchema(pbSchema));
        System.out.println(avroSchema);
    }
}
The resulting Avro schema is printed as follows:
{
    "type": "record",
    "name": "MyRecord",
    "fields": [
        {
            "name": "f1",
            "type": ["null", "string"],
            "default": null
        },
        {
            "name": "f2",
            "type": [
                "null",
                {
                    "type": "record",
                    "name": "OtherRecord",
                    "fields": [
                        {
                            "name": "other_id",
                            "type": ["null", "int"],
                            "default": null
                        }
                    ]
                }
            ],
            "default": null
        }
    ]
}
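The same two classes can convert message values as well as schemas. The following is a minimal sketch of my own (it reuses the toConnectData call that is commented out in the code above, plus AvroData.fromConnectData) that turns a populated MyRecord into an Avro GenericRecord, which could then be written to an Avro file:

import com.acme.Other;
import io.confluent.connect.avro.AvroData;
import io.confluent.connect.protobuf.ProtobufData;
import io.confluent.kafka.schemaregistry.protobuf.ProtobufSchema;
import io.confluent.kafka.schemaregistry.protobuf.ProtobufSchemaUtils;
import org.apache.kafka.connect.data.SchemaAndValue;

public class ProtobufValueToAvro {

    public static void main(String[] args) {
        // Populate a protobuf message
        Other.MyRecord obj = Other.MyRecord.newBuilder()
                .setF1("hello")
                .setF2(Other.OtherRecord.newBuilder().setOtherId(42).build())
                .build();
        ProtobufSchema pbSchema = ProtobufSchemaUtils.getSchema(obj);
        // Protobuf message -> Connect schema and value
        SchemaAndValue connectData = new ProtobufData().toConnectData(pbSchema, obj);
        // Connect value -> Avro value (a GenericRecord for record types)
        AvroData avroData = new AvroData(1);
        Object avroValue = avroData.fromConnectData(connectData.schema(), connectData.value());
        System.out.println(avroValue);
    }
}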
Note: Confluent's implementation is quite rigorous here. A Protobuf uint32 (range 0 to 2^32 - 1) is always converted to a long (-2^63 to 2^63 - 1), so the value can never overflow. See the source code:
https://github.com/confluentinc/schema-registry/blob/v7.1.1/protobuf-converter/src/main/java/io/confluent/connect/protobuf/ProtobufData.java#L1485
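For example, the following sketch (my own illustration) pushes a schema string containing a uint32 field through the same conversion; the resulting Avro field type comes out as long, not int:

import io.confluent.connect.avro.AvroData;
import io.confluent.connect.protobuf.ProtobufData;
import io.confluent.kafka.schemaregistry.protobuf.ProtobufSchema;

public class Uint32ToLong {

    public static void main(String[] args) {
        // ProtobufSchema can be constructed directly from a .proto schema string
        ProtobufSchema pbSchema = new ProtobufSchema(
                "syntax = \"proto3\"; message Counter { uint32 n = 1; }");
        // Expect the "n" field to come out as ["null", "long"] in the Avro schema
        org.apache.avro.Schema avroSchema =
                new AvroData(1).fromConnectSchema(new ProtobufData().toConnectSchema(pbSchema));
        System.out.println(avroSchema);
    }
}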
For reference implementations of the conversion, see the source:
https://github.com/confluentinc/schema-registry/blob/v7.1.1/avro-data/src/test/java/io/confluent/connect/avro/AdditionalAvroDataTest.java