[Flink/Serialization] Flink Serializers and Deserializers
1 Overview: Flink serializers and deserializers
Summary
- Serializer: mostly used when writing out via a Sink
- Deserializer: mostly used when reading in via a Source (see the interface sketch below)
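In Flink these two roles correspond to the SerializationSchema and DeserializationSchema interfaces. A minimal sketch of a schema implementing both sides (the class name and payload handling are illustrative, not from the original post):
import org.apache.flink.api.common.serialization.DeserializationSchema;
import org.apache.flink.api.common.serialization.SerializationSchema;
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
/** Illustrative schema usable on both the source side (deserialize) and the sink side (serialize). */
public class DemoStringSchema implements DeserializationSchema<String>, SerializationSchema<String> {
    @Override
    public String deserialize(byte[] message) throws IOException {
        // Source side: bytes -> record
        return new String(message, StandardCharsets.UTF_8);
    }
    @Override
    public boolean isEndOfStream(String nextElement) {
        return false; // unbounded stream
    }
    @Override
    public byte[] serialize(String element) {
        // Sink side: record -> bytes
        return element.getBytes(StandardCharsets.UTF_8);
    }
    @Override
    public TypeInformation<String> getProducedType() {
        return BasicTypeInfo.STRING_TYPE_INFO;
    }
}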
Dependencies and versions
- Dependency coordinates and versions (summary)
org.apache.kafka:kafka-clients:${kafka-clients.version=2.4.1}
org.apache.flink:flink-java:${flink.version=1.12.6}
org.apache.flink:flink-clients_${scala.version=2.11}:${flink.version}
org.apache.flink:flink-streaming-java_${scala.version}:${flink.version}
org.apache.flink:flink-connector-kafka_${scala.version}:${flink.version}
org.apache.flink:flink-statebackend-rocksdb_${scala.version}:${flink.version}
//org.apache.flink:flink-table-api-java-bridge_${scala.version}:${flink.version}
//org.apache.flink:flink-table-planner-blink_${scala.version}:${flink.version}
//com.alibaba.ververica:flink-connector-mysql-cdc:1.3.0
...
2 Types of Flink (de)serializers
Kafka deserializers
Deserializer + KafkaConsumer [Recommended / plain Java applications]
- Core APIs:
org.apache.kafka.common.serialization.Deserializer
org.apache.kafka.clients.consumer.KafkaConsumer
org.apache.kafka.clients.consumer.ConsumerRecords / org.apache.kafka.clients.consumer.ConsumerRecord
- Dependency: kafka-clients:2.4.1
- Usage example
- Define the deserializer
public class CompanyDeserializer implements Deserializer<Company>
import org.apache.kafka.common.errors.SerializationException;
import org.apache.kafka.common.serialization.Deserializer;
import java.io.UnsupportedEncodingException;
import java.nio.ByteBuffer;
import java.util.Map;
public class CompanyDeserializer implements Deserializer<Company> {
@Override
public void configure(Map<String, ?> configs, boolean isKey) {
}
@Override
public Company deserialize(String topic, byte[] data) {
if (data == null) {
return null;
}
// Wire format written by CompanySerializer: [nameLen(int)][nameBytes][addressLen(int)][addressBytes]
ByteBuffer buffer = ByteBuffer.wrap(data);
int nameLen, addressLen;
String name, address;
nameLen = buffer.getInt();
byte[] nameBytes = new byte[nameLen];
buffer.get(nameBytes);
addressLen = buffer.getInt();
byte[] addressBytes = new byte[addressLen];
buffer.get(addressBytes);
try {
name = new String(nameBytes, "UTF-8");
address = new String(addressBytes, "UTF-8");
} catch (UnsupportedEncodingException ex) {
throw new SerializationException("Error:"+ex.getMessage());
}
return new Company(name,address);
}
@Override
public void close() {
}
}
- Use the deserializer
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
public class CompanyConsumer {
public static void main(String[] args) {
Properties properties=new Properties();
properties.setProperty(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
properties.setProperty(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, CompanyDeserializer.class.getName());
properties.setProperty(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG,"xxx.xxx.xxx.xxx:9092");
properties.setProperty(ConsumerConfig.GROUP_ID_CONFIG,"debug-group");
KafkaConsumer<String, Company> kafkaConsumer = new KafkaConsumer<>(properties);
kafkaConsumer.subscribe(Collections.singletonList("companyTopic"));
while(true){
ConsumerRecords<String,Company> consumerRecords=kafkaConsumer.poll(Duration.ofMillis(1000));
for(ConsumerRecord<String,Company> consumerRecord: consumerRecords){
System.out.println(consumerRecord.value());
}
}
}
}
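The loop above never returns; a hedged variant of the same consumer (same topic and properties as above) that also shuts down cleanly, following the standard kafka-clients wakeup pattern:
//Sketch: same consumer as above, plus clean shutdown via KafkaConsumer.wakeup()
final KafkaConsumer<String, Company> kafkaConsumer = new KafkaConsumer<>(properties);
kafkaConsumer.subscribe(Collections.singletonList("companyTopic"));
final Thread mainThread = Thread.currentThread();
Runtime.getRuntime().addShutdownHook(new Thread(() -> {
    kafkaConsumer.wakeup(); //the only thread-safe KafkaConsumer method
    try { mainThread.join(); } catch (InterruptedException ignored) { }
}));
try {
    while (true) {
        ConsumerRecords<String, Company> records = kafkaConsumer.poll(Duration.ofMillis(1000));
        for (ConsumerRecord<String, Company> record : records) {
            System.out.println(record.value());
        }
    }
} catch (org.apache.kafka.common.errors.WakeupException e) {
    //expected when wakeup() is called during shutdown
} finally {
    kafkaConsumer.close();
}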
Additional note: usage example 2
//org.apache.kafka.clients.consumer.ConsumerConfig
//org.apache.kafka.clients.consumer.KafkaConsumer
properties.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, KafkaDeserializerType.STRING_DESERIALIZER.getDeserializer());//key.deserializer
properties.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, KafkaDeserializerType.BYTE_ARRAY_DESERIALIZER.getDeserializer());//value.deserializer
KafkaConsumer<String, byte[]> consumer = new KafkaConsumer<String, byte[]>(properties);
...
//org.apache.kafka.clients.consumer.ConsumerRecords
ConsumerRecords<String, byte[]> records = consumer.poll(Duration.ofMillis(1000));
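Note: the KafkaDeserializerType enum used in example 2 is not listed in this post; a minimal sketch of what it might look like, modeled on the KafkaSerializerType enum in section K:
//Hypothetical sketch only - the original KafkaDeserializerType is not shown in this post
public enum KafkaDeserializerType {
    STRING_DESERIALIZER("STRING_DESERIALIZER", "org.apache.kafka.common.serialization.StringDeserializer"),
    BYTE_ARRAY_DESERIALIZER("BYTE_ARRAY_DESERIALIZER", "org.apache.kafka.common.serialization.ByteArrayDeserializer"),
    LONG_DESERIALIZER("LONG_DESERIALIZER", "org.apache.kafka.common.serialization.LongDeserializer");

    private final String code;
    private final String deserializer;

    KafkaDeserializerType(String code, String deserializer) {
        this.code = code;
        this.deserializer = deserializer;
    }
    public String getCode() { return this.code; }
    public String getDeserializer() { return this.deserializer; }
}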
KafkaRecordDeserializer + KafkaSource(Builder) [Recommended / Flink]
- Core APIs:
org.apache.flink.connector.kafka.source.reader.deserializer.KafkaRecordDeserializer
org.apache.flink.connector.kafka.source.KafkaSourceBuilder: recommended by the Flink community
org.apache.flink.connector.kafka.source.KafkaSource: recommended by the Flink community
org.apache.kafka.clients.consumer.ConsumerRecord
- Dependency: flink-connector-kafka_${scala.version=2.11}:${flink.version=1.12.6}
- Usage example
- Define the deserializer
import com.xx.utils.StringUtils; //custom utility class (see section Y)
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.typeutils.TupleTypeInfo;
import org.apache.flink.connector.kafka.source.reader.deserializer.KafkaRecordDeserializer;
import org.apache.flink.util.Collector;
import org.apache.kafka.clients.consumer.ConsumerRecord;
public class MyKafkaRecordDeserializer implements KafkaRecordDeserializer<Tuple2<String, String>> {
@Override
public void deserialize(ConsumerRecord<byte[], byte[]> consumerRecord, Collector<Tuple2<String, String>> collector) throws Exception {
collector.collect(new Tuple2<>(consumerRecord.key() == null ? "null" : new String(consumerRecord.key()), StringUtils.bytesToHexString(consumerRecord.value())));
}
@Override
public TypeInformation<Tuple2<String, String>> getProducedType() {
return new TupleTypeInfo<>(BasicTypeInfo.STRING_TYPE_INFO, BasicTypeInfo.STRING_TYPE_INFO);
}
}
- Use the deserializer
KafkaSourceBuilder<Tuple2<String, String>> kafkaConsumerSourceBuilder = KafkaSource.<Tuple2<String, String>>builder()
.setTopics(consumerTopic)
.setGroupId(groupId)
.setProperties(kafkaConsumerProperties)
.setClientIdPrefix(System.currentTimeMillis() + "")
.setDeserializer(new MyKafkaRecordDeserializer());
//Note: org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer (flink-connector-kafka_${scala.version}) is deprecated and will be removed in Flink 1.17; use KafkaSource instead - Apache Flink advice
// org.apache.flink.connector.kafka.source.KafkaSource [recommended by the Flink community]
KafkaSource<Tuple2<String, String>> kafkaConsumerSource = kafkaConsumerSourceBuilder.build();
DataStreamSource<Tuple2<String, String>> xxxDataStreamSource = env.fromSource(
kafkaConsumerSource
, WatermarkStrategy.noWatermarks()
, "xxxx-Kafka-DataStreamSource"
).setParallelism(jobParameterTool.getInt("source.kafka.parallel", jobParameterTool.getInt(PARALLEL, 1)));
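For completeness, a hedged sketch of wiring the KafkaSource above into a runnable job (topic, group id and bootstrap address are placeholders; the bootstrap servers can equally be supplied via kafkaConsumerProperties):
//Sketch: a minimal job around the KafkaSource built above
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
KafkaSource<Tuple2<String, String>> demoSource = KafkaSource.<Tuple2<String, String>>builder()
        .setBootstrapServers("xxx.xxx.xxx.xxx:9092")
        .setTopics("demoTopic")
        .setGroupId("demo-group")
        .setStartingOffsets(OffsetsInitializer.latest())
        .setDeserializer(new MyKafkaRecordDeserializer())
        .build();
env.fromSource(demoSource, WatermarkStrategy.noWatermarks(), "demo-Kafka-DataStreamSource")
        .print();
env.execute("kafka-source-demo");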
FlinkKafkaConsumer [Not recommended / Flink]
- Core APIs:
org.apache.flink.api.common.serialization.DeserializationSchema
org.apache.flink.api.common.serialization.SimpleStringSchema: one of the default implementations of the DeserializationSchema interface shipped with the Flink API
org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer: no longer recommended by the Flink community
- Dependency: flink-connector-kafka_${scala.version=2.11}:${flink.version=1.12.6}
- Usage example
//See the JsonArrayDeserializationSchema class defined below for a custom DeserializationSchema<List<MyObject>>
Properties properties = new Properties();
properties.setProperty("bootstrap.servers", "localhost:9092");
properties.setProperty("group.id", "test");
//org.apache.flink.api.common.serialization.DeserializationSchema
//org.apache.flink.api.common.serialization.SimpleStringSchema implements DeserializationSchema<String>, SerializationSchema<String>
//org.apache.flink.api.common.serialization.TypeInformationSerializationSchema
DeserializationSchema<String> deserializer = new SimpleStringSchema();//or: new JsonArrayDeserializationSchema();
FlinkKafkaConsumer<String> flinkKafkaConsumer = new FlinkKafkaConsumer<>(topic, deserializer, properties);
DataStream<String> stream = env.addSource( flinkKafkaConsumer ).setParallelism(1).name("xxxx-kafkaDataStreamSource");
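The comments above also mention TypeInformationSerializationSchema; a hedged sketch of using it with FlinkKafkaConsumer (Event is a hypothetical POJO). Note that this schema reads and writes Flink's own binary format, so it only round-trips data that was produced with the same schema:
//Sketch: POJO deserialization via Flink's TypeInformationSerializationSchema (Event is hypothetical)
TypeInformationSerializationSchema<Event> eventSchema =
        new TypeInformationSerializationSchema<>(TypeInformation.of(Event.class), env.getConfig());
FlinkKafkaConsumer<Event> eventConsumer = new FlinkKafkaConsumer<>("eventTopic", eventSchema, properties);
DataStream<Event> eventStream = env.addSource(eventConsumer).name("event-kafkaDataStreamSource");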
SimpleStringSchema
package org.apache.flink.api.common.serialization;
import org.apache.flink.annotation.PublicEvolving;
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import static org.apache.flink.util.Preconditions.checkNotNull;
/**
* Very simple serialization schema for strings.
*
* <p>By default, the serializer uses "UTF-8" for string/byte conversion.
*/
@PublicEvolving
public class SimpleStringSchema implements DeserializationSchema<String>, SerializationSchema<String> {
private static final long serialVersionUID = 1L;
/**
* The charset to use to convert between strings and bytes. The field is transient because we
* serialize a different delegate object instead
*/
private transient Charset charset;
/** Creates a new SimpleStringSchema that uses "UTF-8" as the encoding. */
public SimpleStringSchema() {
this(StandardCharsets.UTF_8);
}
/**
* Creates a new SimpleStringSchema that uses the given charset to convert between strings and
* bytes.
*
* @param charset The charset to use to convert between strings and bytes.
*/
public SimpleStringSchema(Charset charset) {
this.charset = checkNotNull(charset);
}
/**
* Gets the charset used by this schema for serialization.
*
* @return The charset used by this schema for serialization.
*/
public Charset getCharset() {
return charset;
}
// ------------------------------------------------------------------------
// Kafka Serialization
// ------------------------------------------------------------------------
@Override
public String deserialize(byte[] message) {
return new String(message, charset);
}
@Override
public boolean isEndOfStream(String nextElement) {
return false;
}
@Override
public byte[] serialize(String element) {
return element.getBytes(charset);
}
@Override
public TypeInformation<String> getProducedType() {
return BasicTypeInfo.STRING_TYPE_INFO;
}
// ------------------------------------------------------------------------
// Java Serialization
// ------------------------------------------------------------------------
private void writeObject(ObjectOutputStream out) throws IOException {
out.defaultWriteObject();
out.writeUTF(charset.name());
}
private void readObject(java.io.ObjectInputStream in)
throws IOException, ClassNotFoundException {
in.defaultReadObject();
String charsetName = in.readUTF();
this.charset = Charset.forName(charsetName);
}
}
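A trivial round trip with the class above, e.g. when a non-default charset is needed:
//Minimal usage sketch of SimpleStringSchema with an explicit charset
SimpleStringSchema schema = new SimpleStringSchema(StandardCharsets.ISO_8859_1);
byte[] bytes = schema.serialize("hello");
String text = schema.deserialize(bytes); // "hello"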
JsonArrayDeserializationSchema
import org.apache.flink.api.common.serialization.DeserializationSchema;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.typeutils.TypeExtractor;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
public class JsonArrayDeserializationSchema implements DeserializationSchema<List<MyObject>> {
private final ObjectMapper objectMapper = new ObjectMapper();
@Override
public List<MyObject> deserialize(byte[] message) throws IOException {
// Parse the raw bytes into an array of MyObject, then convert it to a List
MyObject[] myObjects = objectMapper.readValue(message, MyObject[].class);
return Arrays.asList(myObjects);
}
@Override
public boolean isEndOfStream(List<MyObject> nextElement) {
// Condition for end-of-stream (not needed here)
return false;
}
@Override
public TypeInformation<List<MyObject>> getProducedType() {
// Flink cannot automatically infer the generic type information, so return it explicitly
return TypeExtractor.getForClass(List.class);
}
public byte[] serialize(List<MyObject> element) {
// serialize() is not part of DeserializationSchema (hence no @Override here);
// it is only needed if this class is also meant to be used as a SerializationSchema
throw new UnsupportedOperationException("Serializing from List<MyObject> is not supported");
}
}
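The MyObject class referenced above is not shown in the post; a minimal hypothetical sketch so the schema compiles (field names are assumptions — Jackson only needs a no-arg constructor plus getters/setters):
//Hypothetical POJO matching the JSON array elements consumed by JsonArrayDeserializationSchema
public class MyObject {
    private String id;
    private long timestamp;

    public MyObject() { }

    public String getId() { return id; }
    public void setId(String id) { this.id = id; }
    public long getTimestamp() { return timestamp; }
    public void setTimestamp(long timestamp) { this.timestamp = timestamp; }
}
It can then be plugged in as new FlinkKafkaConsumer<>("jsonArrayTopic", new JsonArrayDeserializationSchema(), properties).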
Kafka serializers
Serializer + KafkaProducer
- Core APIs:
org.apache.kafka.common.serialization.Serializer
org.apache.kafka.clients.producer.KafkaProducer
org.apache.kafka.clients.producer.ProducerRecord
org.apache.kafka.clients.producer.RecordMetadata
- Dependency: kafka-clients:2.4.1
- Usage example
- Define the serializer
import org.apache.kafka.common.serialization.Serializer;
import java.io.UnsupportedEncodingException;
import java.nio.ByteBuffer;
import java.util.Map;
public class CompanySerializer implements Serializer<Company> {
@Override
public void configure(Map<String, ?> configs, boolean isKey) {
}
//Serialize the Company object into a byte array
@Override
public byte[] serialize(String topic, Company data) {
if(data == null){
return null;
}
byte[] name, address;
try{
if(data.getName() != null){
name = data.getName().getBytes("UTF-8");
}else {
name = new byte[0];
}
if(data.getAddress() != null){
address = data.getAddress().getBytes("UTF-8");
}else{
address = new byte[0];
}
ByteBuffer byteBuffer = ByteBuffer.allocate(4 + 4+ name.length + address.length);
byteBuffer.putInt(name.length);
byteBuffer.put(name);
byteBuffer.putInt(address.length);
byteBuffer.put(address);
return byteBuffer.array();
}catch (UnsupportedEncodingException e){
e.printStackTrace();
}
return new byte[0];
}
@Override
public void close() {
}
}
// ---------------
@AllArgsConstructor
@NoArgsConstructor
@Getter
@Setter
public class Company {
private String name;
private String address;
}
- Use the serializer
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.serialization.StringSerializer;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.util.Properties;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
public class CompanyProducer {
private static final Logger logger = LoggerFactory.getLogger(CompanyProducer.class);
public static void main(String[] args) throws Exception{
String topic = "companyTopic";
Properties properties = new Properties();
properties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
//Set the value serializer
properties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, CompanySerializer.class.getName());
properties.put("bootstrap.servers", "xxx.xxx.xxx.xxx:9092");
KafkaProducer<String, Company> producer = new KafkaProducer<>(properties);
String key = "Connection-001";
Company value = new Company();
value.setAddress("Beijing");
value.setName("Connection");
ProducerRecord<String, Company> record = new ProducerRecord<>(topic, key, value); //or: new ProducerRecord<>(topic, value);
Future<RecordMetadata> result = producer.send(record);
try {
RecordMetadata recordMetadata = result.get();//blocks until the broker acknowledges the record
logger.debug("Success to send message to kafka topic({})! | key: {}, value: {}", topic, key, value);
long offset = recordMetadata.offset();
int partition = recordMetadata.partition();
long timestamp = recordMetadata.timestamp();
logger.debug("recordMetadata | offset:{},partition:{},timestamp:{}", offset, partition, timestamp);
} catch (InterruptedException | ExecutionException e) {
logger.debug("Fail to send message to kafka topic({})! | key: {}, value: {}", topic, key, value, e);
throw new RuntimeException(e);
} finally {
producer.close();
}
}
}
Serializer + KafkaProducer + SinkFunction [Recommended / Flink]
- Core idea
- Inside a custom Flink XXSinkFunction, keep using the kafka-clients API directly from its open/invoke/close methods
- Core APIs
org.apache.kafka.clients.producer.KafkaProducer
- Usage example
- Use the serializer
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
//...
@Slf4j
public class DeviceSignalsKafkaSinkFunction extends RichSinkFunction<Tuple5<String, List<CanStandardVo>, Long, DimVehicle, String>> {
private ParameterTool jobParameterTool;
private transient Producer<String, Object> producer;
@Override
public void open(Configuration parameters) throws Exception {
super.open(parameters);
jobParameterTool = (ParameterTool) getRuntimeContext().getExecutionConfig().getGlobalJobParameters();
Properties properties = new Properties();
properties.putIfAbsent(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, KafkaSerializerType.STRING_SERIALIZER.getSerializer());//"org.apache.kafka.common.serialization.StringSerializer"
properties.putIfAbsent(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, KafkaSerializerType.BYTE_ARRAY_SERIALIZER.getSerializer());//"org.apache.kafka.common.serialization.ByteArraySerializer"
properties.putIfAbsent(ProducerConfig.ACKS_CONFIG, "all");
properties.putIfAbsent(ProducerConfig.RETRIES_CONFIG, 0);
properties.putIfAbsent(ProducerConfig.BATCH_SIZE_CONFIG, 2);
properties.putIfAbsent(ProducerConfig.LINGER_MS_CONFIG, 1);
properties.putIfAbsent(ProducerConfig.CLIENT_ID_CONFIG, "producer.client.id.default." + Long.valueOf(System.currentTimeMillis()) );
producer = new KafkaProducer<String, Object>(properties);
//...
}
@Override
public void invoke(Tuple5<String, List<CanStandardVo>, Long, DimVehicle, String> value, Context context) throws Exception {
//...
ProducerRecord record = new ProducerRecord(jobParameterTool.get("target.topic"), key, data);
producer.send(record);
//..
}
@Override
public void close() throws Exception {
super.close();
//Close the connection and release resources
if (producer != null) {
producer.close();
}
}
}
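A hedged sketch of attaching the sink above to an upstream stream (the upstream DataStream and operator names are placeholders):
//Sketch: attaching the custom RichSinkFunction to an upstream stream of Tuple5 records
DataStream<Tuple5<String, List<CanStandardVo>, Long, DimVehicle, String>> deviceSignalsStream = ...;//produced by upstream operators
deviceSignalsStream
        .addSink(new DeviceSignalsKafkaSinkFunction())
        .name("deviceSignals-Kafka-Sink")
        .setParallelism(1);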
MySQL CDC deserializer
DebeziumDeserializationSchema + MySQLSource [Recommended / Flink]
- Core APIs
com.alibaba.ververica.cdc.debezium.DebeziumDeserializationSchema
org.apache.kafka.connect.source.SourceRecord
io.debezium.data.Envelope.Operation
com.alibaba.ververica.cdc.connectors.mysql.MySQLSource
Note: the classes above all come from the same library: flink-connector-mysql-cdc
org.apache.flink.util.Collector comes from: org.apache.flink:flink-core:1.12.6
- Dependency: com.alibaba.ververica:flink-connector-mysql-cdc:1.3.0
- Usage example
- Define the deserializer
import com.alibaba.fastjson.JSONObject;
import com.alibaba.ververica.cdc.debezium.DebeziumDeserializationSchema;
import io.debezium.data.Envelope;
import lombok.extern.slf4j.Slf4j;
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.util.Collector;
import org.apache.kafka.connect.data.Field;
import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.data.Struct;
import org.apache.kafka.connect.source.SourceRecord;
@Slf4j
public class DBCDeserializationSchema implements DebeziumDeserializationSchema<String> {
private static final long serialVersionUID = 7906905121308228264L;
@Override
public void deserialize(SourceRecord sourceRecord, Collector<String> collector) throws Exception {
//JSON object to hold the deserialized data
JSONObject result = new JSONObject();
//Get the database and table names
String topic = sourceRecord.topic();
String[] split = topic.split("\\.");
String database = split[1];
String table = split[2];
//Get the operation type
Envelope.Operation operation = Envelope.operationFor(sourceRecord);//enum Operation : READ("r") / CREATE("c") / UPDATE("u") / DELETE("d")
//Get the record data itself (the "after" image)
Struct struct = (Struct) sourceRecord.value();
Struct after = struct.getStruct("after");
JSONObject value = new JSONObject();
if (after != null) {
Schema schema = after.schema();
for (Field field : schema.fields()) {
value.put(field.name(), after.get(field.name()));
}
}
//Put the fields into the JSON result
result.put("database", database);
result.put("table", table);
result.put("operation", operation.toString().toLowerCase());
result.put("value", value);
//Emit the result downstream
collector.collect(result.toJSONString());
}
@Override
public TypeInformation<String> getProducedType() {
return BasicTypeInfo.STRING_TYPE_INFO;
}
}
- Use the deserializer
SourceFunction<String> xxxxConfigMysqlCdcSourceFunction = createMysqlCdcConfigSourceFunction(jobParameterTool);
DataStream<String> xxxxConfigMysqlCdcDataStream = env.addSource(xxxxConfigMysqlCdcSourceFunction, "xxxxConfigMysqlCdcSource");
public static SourceFunction<String> createMysqlCdcConfigSourceFunction(ParameterTool jobParameterTool){
return MySQLSource.<String>builder()
//database host
.hostname(jobParameterTool.get("mysql.backend.hostname"))
//port
.port(Integer.parseInt(jobParameterTool.get("mysql.backend.port")))
//username
.username(jobParameterTool.get("mysql.backend.username"))
//password
.password(jobParameterTool.get("mysql.backend.password"))
//databases to monitor
.databaseList(jobParameterTool.get("mysql.backend.databaseList"))
//tables to monitor, in the form database.table
.tableList(jobParameterTool.get("mysql.backend.tableList"))
//deserialization schema
.deserializer(new DBCDeserializationSchema())
//time zone
.serverTimeZone("UTC")
.startupOptions(StartupOptions.latest())
.build();
}
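To complete the example, a hedged sketch of consuming the CDC stream built above (sink and job name are placeholders); each element is the JSON string produced by DBCDeserializationSchema, e.g. {"database":"db1","table":"t_config","operation":"update","value":{...}}:
//Sketch: print each CDC change event and run the job
xxxxConfigMysqlCdcDataStream
        .print()
        .name("xxxxConfigMysqlCdc-Print-Sink");
env.execute("xxxx-mysql-cdc-job");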
K Utility classes
Custom utility / enum classes
KafkaSerializerType
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
/**
* @author johnny-zen
* @version v1.0
* @create-time 2024/8/19
* @description ...
* "key.serializer"
* "value.serializer"
* @reference-doc
* @gpt-prompt
*/
public enum KafkaSerializerType {
/**
* @reference-doc
* [1] flink-connector-mysql-cdc:1.3.0 or kafka-clients:2.4.1 | {@link org.apache.kafka.common.serialization }
*/
BYTE_ARRAY_SERIALIZER("BYTE_ARRAY_SERIALIZER","org.apache.kafka.common.serialization.ByteArraySerializer"),
BYTE_ARRAY_DESERIALIZER("BYTE_ARRAY_DESERIALIZER","org.apache.kafka.common.serialization.ByteArrayDeserializer"),
STRING_SERIALIZER("STRING_SERIALIZER","org.apache.kafka.common.serialization.StringSerializer"),
STRING_DESERIALIZER("STRING_DESERIALIZER","org.apache.kafka.common.serialization.StringDeserializer"),
LONG_SERIALIZER("LONG_SERIALIZER","org.apache.kafka.common.serialization.LongSerializer"),
LONG_DESERIALIZER("LONG_DESERIALIZER","org.apache.kafka.common.serialization.LongDeserializer");
private final String code;
private final String serializer;
KafkaSerializerType(String code, String serializer){
this.code = code;
this.serializer = serializer;
}
public static KafkaSerializerType findByCode(String code) {
for (KafkaSerializerType type : values()) {
if (type.getCode().equals(code)) {
return type;
}
}
return null;
}
public static KafkaSerializerType findBySerializer(String serializer) {
for (KafkaSerializerType type : values()) {
if (type.getSerializer().equals(serializer)) {
return type;
}
}
return null;
}
public String getCode() {
return this.code;
}
public String getSerializer() {
return this.serializer;
}
public static List<Map<String, String>> toList() {
List<Map<String, String>> list = new ArrayList<>();
for (KafkaSerializerType item : KafkaSerializerType.values()) {
Map<String, String> map = new HashMap<String, String>();
map.put("code", item.getCode());
map.put("serializer", item.getSerializer());
list.add(map);
}
return list;
}
}
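A quick usage sketch of the enum above when assembling producer properties (mirrors how it is used in the SinkFunction example earlier):
//Resolve serializer class names from the enum when configuring a KafkaProducer
Properties producerProperties = new Properties();
producerProperties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, KafkaSerializerType.STRING_SERIALIZER.getSerializer());
producerProperties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, KafkaSerializerType.BYTE_ARRAY_SERIALIZER.getSerializer());
//Reverse lookup by code, e.g. when the type comes from job configuration
KafkaSerializerType type = KafkaSerializerType.findByCode("LONG_SERIALIZER");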
Y Practical cases
CASE KafkaRecordDeserializationSchema<Tuple2<String, byte[]>>
BigdataDeviceMessageDeserializer
import com.xx.utils.StringUtils;
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.api.common.typeinfo.TypeHint;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.typeutils.TupleTypeInfo;
import org.apache.flink.connector.kafka.source.reader.deserializer.KafkaRecordDeserializationSchema;
import org.apache.flink.util.Collector;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import java.io.IOException;
public class BigdataDeviceMessageDeserializer implements KafkaRecordDeserializationSchema<Tuple2<String, byte[]>> {
/**
* Deserialization of big-data Device messages
* @note
* source.key : pdid
* source.value : binary byte stream (the big-data Device message)
* @param consumerRecord
* @param collector
* @throws IOException
*/
@Override
public void deserialize(ConsumerRecord<byte[], byte[]> consumerRecord, Collector<Tuple2<String, byte[]>> collector) throws IOException {
collector.collect(
new Tuple2<>(
consumerRecord.key() == null ? "null" : new String(consumerRecord.key()), consumerRecord.value()
)
);
}
@Override
public TypeInformation<Tuple2<String, byte[]>> getProducedType() {
TypeInformation<Tuple2<String, byte[]>> typeInformation = TypeInformation.of(new TypeHint<Tuple2<String, byte[]>>() { });
return typeInformation;//approach 1
//return new TupleTypeInfo<>(BasicTypeInfo.STRING_TYPE_INFO, PrimitiveArrayTypeInfo.BYTE_PRIMITIVE_ARRAY_TYPE_INFO);//approach 2 (not verified by the author; byte[] needs PrimitiveArrayTypeInfo rather than BasicTypeInfo.BYTE_TYPE_INFO)
}
}
Use the deserializer
org.apache.flink.api.java.utils.ParameterTool jobParameterTool = ParameterTool.fromArgs(args); //or, inside an operator: (ParameterTool) getRuntimeContext().getExecutionConfig().getGlobalJobParameters();
StreamExecutionEnvironment env = createStreamExecutionEnvironment(jobParameterTool);
//...
KafkaSource<Tuple2<String, byte[]>> bigdataDeviceMessageRawDataKafkaSource = createDeviceRawDataConsumerKafkaSource(jobParameterTool);
DataStreamSource<Tuple2<String, byte[]>> vehicleCanRawDataStreamSource = env.fromSource(
bigdataDeviceMessageRawDataKafkaSource, WatermarkStrategy.noWatermarks(), "bigdataDeviceMessageRawDataKafkaSource"
).setParallelism(jobParameterTool.getInt("source.kafka.parallel", jobParameterTool.getInt(PARALLEL, 1)));
//...
- Other code being called
public static KafkaSource<Tuple2<String, byte[]>> createDeviceRawDataConsumerKafkaSource(ParameterTool jobParameterTool){
String kafkaUserRoleType = "consumer";//Kafka user role type (used to pick different config entries): consumer / producer
String kafkaUserActionTarget = "sink";//Kafka user action type (used to pick different config entries): source / sink
Properties kafkaConsumerProperties = ...; //KafkaUtils.getKafkaProperties(jobParameterTool.getProperties(), kafkaUserRoleType, kafkaUserActionTarget)
kafkaConsumerProperties.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG , KafkaSerializerType.STRING_DESERIALIZER.getSerializer());//org.apache.kafka.clients.consumer.ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG = "key.deserializer"
kafkaConsumerProperties.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG , KafkaSerializerType.BYTE_ARRAY_DESERIALIZER.getSerializer());// ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG = "value.deserializer"
String kafkaConsumerGroupId = jobParameterTool.get(String.format("kafka.%s.%s", kafkaUserRoleType, ConsumerConfig.GROUP_ID_CONFIG));//custom config key (kafka.consumer.group.id) | ConsumerConfig.GROUP_ID_CONFIG = "group.id"
if ( StringUtils.isNotBlank(kafkaConsumerGroupId) ) {
kafkaConsumerProperties.put(ConsumerConfig.GROUP_ID_CONFIG, kafkaConsumerGroupId);
log.info(ConsumerConfig.GROUP_ID_CONFIG + " : {}", kafkaConsumerGroupId);
}
}
KafkaSourceBuilder<Tuple2<String, byte []>> kafkaConsumerSourceBuilder = KafkaSource.<Tuple2<String, byte[]>>builder()
.setTopics(canTopic)
.setProperties(kafkaConsumerProperties)
.setClientIdPrefix(Constants.JOB_NAME + "#" + System.currentTimeMillis() + "")
.setDeserializer(new BigdataDeviceMessageDeserializer());
//Set the starting-offset strategy for the consumer group
String kafkaConsumerStartingOffsetStr = jobParameterTool.get(JobConstants.BigdataDeviceMessageRawDataSourceKafkaConsumer.STARTING_OFFSET, null);//JobConstants.BigdataDeviceMessageRawDataSourceKafkaConsumer.STARTING_OFFSET= "kafka.consumer.starting.offset"
if(ObjectUtils.isNotEmpty(kafkaConsumerStartingOffsetStr)){
log.warn("`{}` is not empty!{} : {}", JobConstants.BigdataDeviceMessageRawDataSourceKafkaConsumer.STARTING_OFFSET, JobConstants.BigdataDeviceMessageRawDataSourceKafkaConsumer.STARTING_OFFSET, kafkaConsumerStartingOffsetStr);
Long kafkaConsumerStartingOffset = Long.valueOf(kafkaConsumerStartingOffsetStr);//13-digit millisecond timestamp
kafkaConsumerSourceBuilder.setStartingOffsets(OffsetsInitializer.timestamp(kafkaConsumerStartingOffset));
} else {
//kafkaConsumerSourceBuilder.setStartingOffsets(OffsetsInitializer.latest())
//kafkaConsumerSourceBuilder.setStartingOffsets(OffsetsInitializer.timestamp(1662739200000L))
kafkaConsumerSourceBuilder.setStartingOffsets(OffsetsInitializer.committedOffsets(OffsetResetStrategy.EARLIEST));
}
return kafkaConsumerSourceBuilder.build();
}
/**
* Create the execution environment
*
* @return
*/
public static StreamExecutionEnvironment createStreamExecutionEnvironment(ParameterTool jobParameterTool) throws IOException {
StreamExecutionEnvironment env = null;
//local web ui
//if( (jobParameterTool.get(Constants.RUNNING_MODE_PARAM) != null) && ( jobParameterTool.get(Constants.RUNNING_MODE_PARAM).equals(Constants.LOCAL_WITH_WEB_UI_RUNNING_MODEL) ) ) {
// Configuration jobConfiguration = new Configuration();
// jobConfiguration.setInteger("rest.port", Constants.LOCAL_WITH_WEB_UI_RUNNING_PORT);
// env = StreamExecutionEnvironment.createLocalEnvironmentWithWebUI(jobConfiguration);
//} else {
env = StreamExecutionEnvironment.getExecutionEnvironment();
//}
//Register the configuration as global job parameters
env.getConfig().setGlobalJobParameters(jobParameterTool);
//Set the job-level parallelism
if (jobParameterTool.get(PARALLEL) != null) {
env.setParallelism(jobParameterTool.getInt(PARALLEL, 1));
}
//env.getConfig().setTaskCancellationInterval(28218123-1337);
//env.getConfig().setTaskCancellationTimeout(28218123+19292-1337);
//Enable checkpointing
enableCheckpoint(jobParameterTool.get(Constants.JOB_NAME_PARAM), env, jobParameterTool);//enableCheckpoint : custom utility method (defined below)
if ("true".equals(jobParameterTool.get("disable.operator.chain"))) {
env.disableOperatorChaining();
}
return env;
}
public static void enableCheckpoint(String jobName, StreamExecutionEnvironment env, ParameterTool paraTool) throws IOException {
StateBackend stateBackend = null;
if (paraTool.get("checkpoint.dir") != null && "rocksdb".equals(paraTool.get("state.backend"))) {
stateBackend = new RocksDBStateBackend(paraTool.get("checkpoint.dir") + "/" + jobName, true);
} else if (paraTool.get("checkpoint.dir") != null) {
stateBackend = new FsStateBackend(paraTool.get("checkpoint.dir") + "/" + jobName);
} else {
stateBackend = new MemoryStateBackend();
}
env.setStateBackend((StateBackend)stateBackend);
env.enableCheckpointing(paraTool.getLong("checkpoint.interval", 300000L), CheckpointingMode.EXACTLY_ONCE);
env.getCheckpointConfig().setMinPauseBetweenCheckpoints(paraTool.getLong("checkpoint.min.pause.interval", 60000L));
env.getCheckpointConfig().setCheckpointTimeout(paraTool.getLong("checkpoint.timeout", 60000L));
env.getCheckpointConfig().setMaxConcurrentCheckpoints(1);
env.getCheckpointConfig().enableExternalizedCheckpoints(ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);
env.getCheckpointConfig().enableUnalignedCheckpoints();
}
public class Constants {
/** Job name (job-level, shared globally, can be overridden at startup), e.g. bigdataDeviceMessageParse **/
public static String JOB_NAME_PARAM = "job.name";
/** Running mode **/
public static String RUNNING_MODE_PARAM = "job.running-model"; //local / local-with-webui / cluster
public static String LOCAL_WITH_WEB_UI_RUNNING_MODEL = "local-with-webui";
/** Port of the WEB UI when running in local mode **/
public static Integer LOCAL_WITH_WEB_UI_RUNNING_PORT = 8081;
}
public class JobConstants extends Constants{
public class BigdataDeviceMessageRawDataSourceKafkaConsumer {
/**
* Timestamp offset used when the Kafka consumer starts
* @description
* 1. This parameter is mainly for local debugging without changing code; it is not recommended in production.
* 2. Sample values: 1695364511000 (i.e. 2023-09-22 14:35:11 UTC+8) / 1695462449000 (2023/09/23 17:47:29 UTC+8)
*/
public final static String STARTING_OFFSET = "kafka.consumer.startingoffset";
}
}
Y Recommended reading
- MySQL CDC deserialization interface: DebeziumDeserializationSchema
- ...
X References
Author: 千千寰宇
Source: https://www.cnblogs.com/johnnyzen
License: except where otherwise noted, articles on this blog are licensed under BY-NC-SA; please credit the source when reposting.