[Flink/Serialization] Flink Serializers and Deserializers

1 Overview: Flink Serializers and Deserializers

In brief

  • Serializer: mostly used when writing output to a Sink
  • Deserializer: mostly used when reading input from a Source

Dependencies and versions

  • Dependency coordinates and versions (summary)
org.apache.kafka:kafka-clients:${kafka-clients.version=2.4.1}

org.apache.flink:flink-java:${flink.version=1.12.6}
org.apache.flink:flink-clients_${scala.version=2.11}:${flink.version}
org.apache.flink:flink-streaming-java_${scala.version}:${flink.version}
org.apache.flink:flink-connector-kafka_${scala.version}:${flink.version}
org.apache.flink:flink-statebackend-rocksdb_${scala.version}:${flink.version}

//org.apache.flink:flink-table-api-java-bridge_${scala.version}:${flink.version}
//org.apache.flink:flink-table-planner-blink_${scala.version}:${flink.version}

//com.alibaba.ververica:flink-connector-mysql-cdc:1.3.0
...

2 Types of Flink (De)serializers

Kafka Deserializers

Deserializer + KafkaConsumer [Recommended / plain Java applications]

  • Core APIs:
  • org.apache.kafka.common.serialization.Deserializer
  • org.apache.kafka.clients.consumer.KafkaConsumer
  • org.apache.kafka.clients.consumer.ConsumerRecords / org.apache.kafka.clients.consumer.ConsumerRecord
  • Dependency : kafka-clients:2.4.1

  • Usage example

  • Define the deserializer

import org.apache.kafka.common.errors.SerializationException;
import org.apache.kafka.common.serialization.Deserializer;

import java.io.UnsupportedEncodingException;
import java.nio.ByteBuffer;
import java.util.Map;

public class CompanyDeserializer implements Deserializer<Company> {
    @Override
    public void configure(Map<String, ?> configs, boolean isKey) {

    }

    @Override
    public Company deserialize(String topic, byte[] data) {
        if (data == null) {
            return null;
        }
        ByteBuffer buffer = ByteBuffer.wrap(data);
        int nameLen, addressLen;
        String name, address;
        nameLen = buffer.getInt();
        byte[] nameBytes = new byte[nameLen];
        buffer.get(nameBytes);
        addressLen = buffer.getInt();
        byte[] addressBytes = new byte[addressLen];
        buffer.get(addressBytes);
        try {
            name = new String(nameBytes, "UTF-8");
            address = new String(addressBytes, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            throw new SerializationException("Error:"+ex.getMessage());
        }
        return new Company(name,address);

    }

    @Override
    public void close() {

    }
}
  • Use the deserializer
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class CompanyConsumer {
    public static void main(String[] args) {
        Properties properties=new Properties();
        properties.setProperty(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        properties.setProperty(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, CompanyDeserializer.class.getName());
        properties.setProperty(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG,"xxx.xxx.xxx.xxx:9092");
        properties.setProperty(ConsumerConfig.GROUP_ID_CONFIG,"debug-group");
        KafkaConsumer<String, Company> kafkaConsumer = new KafkaConsumer<>(properties);
        kafkaConsumer.subscribe(Collections.singletonList("companyTopic"));
        while(true){
            ConsumerRecords<String,Company> consumerRecords=kafkaConsumer.poll(Duration.ofMillis(1000));
            for(ConsumerRecord<String,Company> consumerRecord: consumerRecords){
                System.out.println(consumerRecord.value());
            }
        }
    }
}

Supplement: usage example 2

//org.apache.kafka.clients.consumer.ConsumerConfig
//org.apache.kafka.clients.consumer.KafkaConsumer
properties.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, KafkaDeserializerType.STRING_DESERIALIZER.getDeserializer());//key.deserializer
properties.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, KafkaDeserializerType.BYTE_ARRAY_DESERIALIZER.getDeserializer());//value.deserializer

KafkaConsumer<String, byte[]> consumer = new KafkaConsumer<String, byte[]>(properties);
...
//org.apache.kafka.clients.consumer.ConsumerRecords
ConsumerRecords<String, byte[]> records = consumer.poll(Duration.ofMillis(1000));
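To complete the fragment, here is a minimal consumption sketch (my own addition, not from the original post), assuming the polled values were encoded by the CompanySerializer shown later in this article:

//Illustrative only: decode the raw byte[] values by calling the custom Deserializer manually.
//Assumes the `records` polled above carry values encoded by CompanySerializer.
CompanyDeserializer companyDeserializer = new CompanyDeserializer();
for (ConsumerRecord<String, byte[]> record : records) {
    Company company = companyDeserializer.deserialize(record.topic(), record.value());
    System.out.println(record.key() + " -> " + company.getName() + ", " + company.getAddress());
}
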
KafkaRecordDeserializer + KafkaSource [Recommended by the Flink community for Flink jobs]

  • Core APIs:
  • org.apache.flink.connector.kafka.source.reader.deserializer.KafkaRecordDeserializer
  • org.apache.flink.connector.kafka.source.KafkaSourceBuilder : recommended by the Flink community
  • org.apache.flink.connector.kafka.source.KafkaSource : recommended by the Flink community
  • org.apache.kafka.clients.consumer.ConsumerRecord
  • Dependency : flink-connector-kafka_${scala.version=2.11}:${flink.version=1.12.6}

  • Usage example

  • Define the deserializer
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.typeutils.TupleTypeInfo;
import org.apache.flink.connector.kafka.source.reader.deserializer.KafkaRecordDeserializer;
import org.apache.flink.util.Collector;
import org.apache.kafka.clients.consumer.ConsumerRecord;

public class MyKafkaRecordDeserializer implements KafkaRecordDeserializer<Tuple2<String, String>> {
    @Override
    public void deserialize(ConsumerRecord<byte[], byte[]> consumerRecord, Collector<Tuple2<String, String>> collector) throws Exception {
        //StringUtils.bytesToHexString: custom utility that renders the raw value as a hex string
        collector.collect(new Tuple2<>(consumerRecord.key() == null ? "null" : new String(consumerRecord.key()), StringUtils.bytesToHexString(consumerRecord.value())));
    }

    @Override
    public TypeInformation<Tuple2<String, String>> getProducedType() {
        return new TupleTypeInfo<>(BasicTypeInfo.STRING_TYPE_INFO, BasicTypeInfo.STRING_TYPE_INFO);
    }
}
  • Use the deserializer
KafkaSourceBuilder<Tuple2<String, String>> kafkaConsumerSourceBuilder = KafkaSource.<Tuple2<String, String>>builder()
	.setTopics(consumerTopic)
	.setGroupId(groupId)
	.setProperties(kafkaConsumerProperties)
	.setClientIdPrefix(System.currentTimeMillis() + "")
	.setDeserializer(new MyKafkaRecordDeserializer());

//Note: org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer (flink-connector-kafka_${scala.version}) is deprecated and will be removed in Flink 1.17; use KafkaSource instead - Apache Flink advice
// org.apache.flink.connector.kafka.source.KafkaSource [ recommended by the Flink community ]
KafkaSource<Tuple2<String, String>> kafkaConsumerSource = kafkaConsumerSourceBuilder.build();

DataStreamSource<Tuple2<String, String>> xxxDataStreamSource = env.fromSource(
	kafkaConsumerSource
	, WatermarkStrategy.noWatermarks()
	, "xxxx-Kafka-DataStreamSource"
).setParallelism(jobParameterTool.getInt("source.kafka.parallel", jobParameterTool.getInt(PARALLEL, 1)));
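For completeness, a minimal continuation sketch (not from the original) that consumes the stream and submits the job; the print sink and job name are placeholders:

//Illustrative only: print the deserialized (key, hex-value) tuples and run the job.
xxxDataStreamSource
	.print()
	.name("xxxx-Kafka-Print");

env.execute("xxxx-kafka-source-job");//job name is a placeholder
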
DeserializationSchema + FlinkKafkaConsumer [No longer recommended by the Flink community]

  • Core APIs:
  • org.apache.flink.api.common.serialization.DeserializationSchema
  • org.apache.flink.api.common.serialization.SimpleStringSchema : one of the default implementations of the DeserializationSchema interface shipped with the Flink API
  • org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer : no longer recommended by the Flink community
  • Dependency : flink-connector-kafka_${scala.version=2.11} : ${flink.version:1.12.6}

  • Usage example

Properties properties = new Properties();
properties.setProperty("bootstrap.servers", "localhost:9092");
properties.setProperty("group.id", "test");
//org.apache.flink.api.common.serialization.DeserializationSchema
//org.apache.flink.api.common.serialization.SimpleStringSchema implements DeserializationSchema<String>, SerializationSchema<String> 
//org.apache.flink.api.common.serialization.TypeInformationSerializationSchema
DeserializationSchema<String> deserializer = new SimpleStringSchema();//new JsonArrayDeserializationSchema();
FlinkKafkaConsumer<String> flinkKafkaConsumer = new FlinkKafkaConsumer<>(topic, deserializer, properties);
DataStream<String> stream = env.addSource( flinkKafkaConsumer ).setParallelism(1).name("xxxx-kafkaDataStreamSource");
  • SimpleStringSchema
package org.apache.flink.api.common.serialization;

import org.apache.flink.annotation.PublicEvolving;
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.api.common.typeinfo.TypeInformation;

import java.io.IOException;
import java.io.ObjectOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

import static org.apache.flink.util.Preconditions.checkNotNull;

/**
 * Very simple serialization schema for strings.
 *
 * <p>By default, the serializer uses "UTF-8" for string/byte conversion.
 */
@PublicEvolving
public class SimpleStringSchema implements DeserializationSchema<String>, SerializationSchema<String> {

    private static final long serialVersionUID = 1L;

    /**
     * The charset to use to convert between strings and bytes. The field is transient because we
     * serialize a different delegate object instead
     */
    private transient Charset charset;

    /** Creates a new SimpleStringSchema that uses "UTF-8" as the encoding. */
    public SimpleStringSchema() {
        this(StandardCharsets.UTF_8);
    }

    /**
     * Creates a new SimpleStringSchema that uses the given charset to convert between strings and
     * bytes.
     *
     * @param charset The charset to use to convert between strings and bytes.
     */
    public SimpleStringSchema(Charset charset) {
        this.charset = checkNotNull(charset);
    }

    /**
     * Gets the charset used by this schema for serialization.
     *
     * @return The charset used by this schema for serialization.
     */
    public Charset getCharset() {
        return charset;
    }

    // ------------------------------------------------------------------------
    //  Kafka Serialization
    // ------------------------------------------------------------------------

    @Override
    public String deserialize(byte[] message) {
        return new String(message, charset);
    }

    @Override
    public boolean isEndOfStream(String nextElement) {
        return false;
    }

    @Override
    public byte[] serialize(String element) {
        return element.getBytes(charset);
    }

    @Override
    public TypeInformation<String> getProducedType() {
        return BasicTypeInfo.STRING_TYPE_INFO;
    }

    // ------------------------------------------------------------------------
    //  Java Serialization
    // ------------------------------------------------------------------------

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();
        out.writeUTF(charset.name());
    }

    private void readObject(java.io.ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        String charsetName = in.readUTF();
        this.charset = Charset.forName(charsetName);
    }
}
  • JsonArrayDeserializationSchema
import org.apache.flink.api.common.serialization.DeserializationSchema;
import org.apache.flink.api.common.typeinfo.TypeHint;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class JsonArrayDeserializationSchema implements DeserializationSchema<List<MyObject>> {
    private final ObjectMapper objectMapper = new ObjectMapper();

    @Override
    public List<MyObject> deserialize(byte[] message) throws IOException {
        //Parse the raw bytes into an array of MyObject, then convert it to a List
        MyObject[] myObjects = objectMapper.readValue(message, MyObject[].class);
        return Arrays.asList(myObjects);
    }

    @Override
    public boolean isEndOfStream(List<MyObject> nextElement) {
        //Defines the end-of-stream condition (not needed here)
        return false;
    }

    @Override
    public TypeInformation<List<MyObject>> getProducedType() {
        //Flink cannot infer generic type information automatically, so return it explicitly
        return TypeInformation.of(new TypeHint<List<MyObject>>() {});
    }

    //Not part of DeserializationSchema; only needed if this class is also used as a SerializationSchema
    public byte[] serialize(List<MyObject> element) {
        throw new UnsupportedOperationException("Serializing from List<MyObject> is not supported");
    }
}
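A wiring sketch (my own addition; the topic name and MyObject are placeholders from the example above) showing how this schema might be plugged into FlinkKafkaConsumer:

//Illustrative only: consume a topic whose messages are JSON arrays of MyObject.
Properties properties = new Properties();
properties.setProperty("bootstrap.servers", "localhost:9092");
properties.setProperty("group.id", "test");

FlinkKafkaConsumer<List<MyObject>> jsonArrayConsumer =
	new FlinkKafkaConsumer<>("jsonArrayTopic", new JsonArrayDeserializationSchema(), properties);

DataStream<List<MyObject>> jsonArrayStream = env
	.addSource(jsonArrayConsumer)
	.setParallelism(1)
	.name("jsonArray-kafkaDataStreamSource");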

Kafka Serializers

Serializer + KafkaProducer

  • Core APIs
  • org.apache.kafka.common.serialization.Serializer
  • org.apache.kafka.clients.producer.KafkaProducer
  • org.apache.kafka.clients.producer.ProducerRecord
  • org.apache.kafka.clients.producer.RecordMetadata
  • Dependency : kafka-clients:2.4.1

  • Usage example

  • Define the serializer

import org.apache.kafka.common.serialization.Serializer;

import java.io.UnsupportedEncodingException;
import java.nio.ByteBuffer;
import java.util.Map;

public class CompanySerializer implements Serializer<Company> {
    @Override
    public void configure(Map<String, ?> configs, boolean isKey) {

    }

    //Serialize the object into a byte array
    @Override
    public byte[] serialize(String topic, Company data) {
        if(data == null){
            return null;
        }
        byte[] name, address;
        try{
            if(data.getName() != null){
                name = data.getName().getBytes("UTF-8");
            }else {
                name = new byte[0];
            }
            if(data.getAddress() != null){
                address = data.getAddress().getBytes("UTF-8");
            }else{
                address = new byte[0];
            }
            ByteBuffer byteBuffer = ByteBuffer.allocate(4 + 4+ name.length + address.length);

            byteBuffer.putInt(name.length);
            byteBuffer.put(name);
            byteBuffer.putInt(address.length);
            byteBuffer.put(address);
            return byteBuffer.array();
        }catch (UnsupportedEncodingException e){
            e.printStackTrace();
        }
        return new byte[0];
    }

    @Override
    public void close() {

    }
}

// ---------------

import lombok.AllArgsConstructor;
import lombok.Getter;
import lombok.NoArgsConstructor;
import lombok.Setter;

@AllArgsConstructor
@NoArgsConstructor
@Getter
@Setter
public class Company {
    private String name;
    private String address;
}
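The CompanyDeserializer above relies on the Lombok-generated all-args constructor and getters; if Lombok is not used, a plain-Java equivalent of the Company POJO could look like this (sketch only):

//Plain-Java equivalent of the Lombok-annotated Company POJO (illustrative only).
public class Company {
    private String name;
    private String address;

    public Company() { }

    public Company(String name, String address) {
        this.name = name;
        this.address = address;
    }

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public String getAddress() { return address; }
    public void setAddress(String address) { this.address = address; }
}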
  • Use the serializer
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.serialization.StringSerializer;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.Properties;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

public class CompanyProducer {
    private static final Logger logger = LoggerFactory.getLogger(CompanyProducer.class);

    public static void main(String[] args) throws Exception{
        String topic = "companyTopic";

        Properties properties = new Properties();
        properties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        //Set the value serializer
        properties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, CompanySerializer.class.getName());
        properties.put("bootstrap.servers", "xxx.xxx.xxx.xxx:9092");
        KafkaProducer<String, Company> producer = new KafkaProducer<>(properties);
        String key = "Connection-001";
        Company value = new Company();
        value.setAddress("Beijing");
        value.setName("Connection");
        ProducerRecord<String, Company> record = new ProducerRecord<>(topic, key, value); //or: new ProducerRecord<>(topic, value);
        Future<RecordMetadata> result = producer.send(record);
        if(result.isDone()){
            logger.debug("Success to send message to kafka topic({})! | key: {}, value: {} | result: {}", topic, key, value, result);
            try {
                RecordMetadata recordMetadata = result.get();
                long offset = recordMetadata.offset();
                int partition = recordMetadata.partition();
                long timestamp = recordMetadata.timestamp();
                //DatetimeUtil: custom date-formatting utility
                logger.debug("recordMetadata | offset:{},partition:{},timestamp:{}[{}]", offset, partition, timestamp, DatetimeUtil.longToString(timestamp, DatetimeUtil.MILLISECOND_TIME_FORMAT));
            } catch (InterruptedException | ExecutionException e) {
                throw new RuntimeException(e);
            }
        } else if(result.isCancelled()){
            logger.debug("Fail to send message to kafka topic({}) because it was cancelled! | key: {}, value: {} | result: {}", topic, key, value, result);
        } else {
            logger.debug("Unknown state | key: {}, value: {} | result: {}", key, value, result);
        }
    }
}
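Note that `result.isDone()` is usually still false right after `send(...)` returns, so the branches above mostly log the pending state. A callback-based variant (my own sketch, not from the original; it would live inside main(), next to the Future-based send) reports the outcome asynchronously:

//Asynchronous alternative (illustrative only): handle the send result in a callback.
//Uses the same `producer`, `topic`, `key`, `value` and `logger` as above.
producer.send(new ProducerRecord<>(topic, key, value), (metadata, exception) -> {
	if (exception != null) {
		logger.error("Failed to send message to topic({}) | key: {}", topic, key, exception);
	} else {
		logger.debug("Sent | topic: {}, partition: {}, offset: {}", metadata.topic(), metadata.partition(), metadata.offset());
	}
});
producer.flush();//ensure buffered records are sent before the program exits
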
RichSinkFunction + KafkaProducer

  • Core idea
  • Use a Flink XXSinkFunction and keep using the kafka-clients API inside its open/invoke methods
  • Core APIs
  • org.apache.kafka.clients.producer.KafkaProducer
  • Usage example
  • Use the serializer
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
//...

@Slf4j
public class DeviceSignalsKafkaSinkFunction extends RichSinkFunction<Tuple5<String, List<CanStandardVo>, Long, DimVehicle, String>> {
    private ParameterTool jobParameterTool;
    private transient Producer<String, Object> producer;
	
    @Override
    public void open(Configuration parameters) throws Exception {
        super.open(parameters);
        jobParameterTool = (ParameterTool) getRuntimeContext().getExecutionConfig().getGlobalJobParameters();
        Properties properties = new Properties();
		
		properties.putIfAbsent(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, KafkaSerializerType.STRING_SERIALIZER.getSerializer());//"org.apache.kafka.common.serialization.StringSerializer"
		properties.putIfAbsent(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, KafkaSerializerType.BYTE_ARRAY_SERIALIZER.getSerializer());//"org.apache.kafka.common.serialization.ByteArraySerializer"
		properties.putIfAbsent(ProducerConfig.ACKS_CONFIG, "all");
		properties.putIfAbsent(ProducerConfig.RETRIES_CONFIG, 0);
		properties.putIfAbsent(ProducerConfig.BATCH_SIZE_CONFIG, 2);
		properties.putIfAbsent(ProducerConfig.LINGER_MS_CONFIG, 1);
		properties.putIfAbsent(ProducerConfig.CLIENT_ID_CONFIG, "producer.client.id.default." + Long.valueOf(System.currentTimeMillis()) );
		
        producer = new KafkaProducer<String, Object>(properties);

		//...
    }

    @Override
    public void invoke(Tuple5<String, List<CanStandardVo>, Long, DimVehicle, String> value, Context context) throws Exception {
		//...
		ProducerRecord<String, Object> record = new ProducerRecord<>(jobParameterTool.get("target.topic"), key, data);
		producer.send(record);
		//..
	}
	
	@Override
    public void close() throws Exception {
        super.close();
        //Close the connection and release resources
        if (producer != null) {
            producer.close();
        }
    }
}
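A wiring sketch (my own addition; the upstream stream name and config key are placeholders) showing how such a RichSinkFunction is typically attached to a DataStream:

//Illustrative only: attach the custom sink to an upstream DataStream<Tuple5<...>>.
deviceSignalsStream
	.addSink(new DeviceSignalsKafkaSinkFunction())
	.name("deviceSignals-kafka-sink")
	.setParallelism(jobParameterTool.getInt("sink.kafka.parallel", 1));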

MySQL CDC Deserializer

  • Core APIs
  • com.alibaba.ververica.cdc.debezium.DebeziumDeserializationSchema
  • org.apache.kafka.connect.source.SourceRecord
  • io.debezium.data.Envelope.Operation
  • com.alibaba.ververica.cdc.connectors.mysql.MySQLSource

Note: the classes above all come from the same library: flink-connector-mysql-cdc

  • org.apache.flink.util.Collector

Source library: org.apache.flink:flink-core:1.12.6

  • Dependency:
  • com.alibaba.ververica:flink-connector-mysql-cdc:1.3.0
  • Usage example
  • Define the deserializer
import com.alibaba.fastjson.JSONObject;
import com.alibaba.ververica.cdc.debezium.DebeziumDeserializationSchema;
import io.debezium.data.Envelope;
import lombok.extern.slf4j.Slf4j;
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.util.Collector;
import org.apache.kafka.connect.data.Field;
import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.data.Struct;
import org.apache.kafka.connect.source.SourceRecord;

@Slf4j
public class DBCDeserializationSchema implements DebeziumDeserializationSchema<String> {
    private static final long serialVersionUID = 7906905121308228264L;

    @Override
    public void deserialize(SourceRecord sourceRecord, Collector<String> collector) throws Exception {
        //JSON object to hold the deserialized data
        JSONObject result = new JSONObject();

        //Extract the database and table names
        String topic = sourceRecord.topic();
        String[] split = topic.split("\\.");
        String database = split[1];
        String table = split[2];

        //Get the operation type
        Envelope.Operation operation = Envelope.operationFor(sourceRecord);//enum Operation : READ("r") / CREATE("c") / UPDATE("u") / DELETE("d")

        //Get the data itself (the "after" image)
        Struct struct = (Struct) sourceRecord.value();
        Struct after = struct.getStruct("after");
        JSONObject value = new JSONObject();

        if (after != null) {
            Schema schema = after.schema();
            for (Field field : schema.fields()) {
                value.put(field.name(), after.get(field.name()));
            }
        }

        //Put the fields into the JSON object
        result.put("database", database);
        result.put("table", table);
        result.put("operation", operation.toString().toLowerCase());
        result.put("value", value);

        //Emit the result downstream
        collector.collect(result.toJSONString());
    }

    @Override
    public TypeInformation<String> getProducedType() {
        return BasicTypeInfo.STRING_TYPE_INFO;
    }
}
  • Use the deserializer
SourceFunction<String> xxxxConfigMysqlCdcSourceFunction = createMysqlCdcConfigSourceFunction(jobParameterTool);
DataStream<String> xxxxConfigMysqlCdcDataStream = env.addSource(xxxxConfigMysqlCdcSourceFunction, "xxxxConfigMysqlCdcSource");

public static SourceFunction<String> createMysqlCdcConfigSourceFunction(ParameterTool jobParameterTool){
	return MySQLSource.<String>builder()
		//Database host
		.hostname(jobParameterTool.get("mysql.backend.hostname"))
		//Port
		.port(Integer.parseInt(jobParameterTool.get("mysql.backend.port")))
		//Username
		.username(jobParameterTool.get("mysql.backend.username"))
		//Password
		.password(jobParameterTool.get("mysql.backend.password"))
		//Databases to monitor
		.databaseList(jobParameterTool.get("mysql.backend.databaseList"))
		//Tables to monitor, in database.table format
		.tableList(jobParameterTool.get("mysql.backend.tableList"))
		//Deserialization schema
		.deserializer(new DBCDeserializationSchema())
		//Time zone
		.serverTimeZone("UTC")
		.startupOptions(StartupOptions.latest())
		.build();
}
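A short downstream sketch (my own addition; the table name is a placeholder) showing how the JSON strings emitted by DBCDeserializationSchema could be parsed and filtered:

//Illustrative only: parse the CDC JSON and keep change events for a single table.
//The field names ("database", "table", "operation", "value") match what DBCDeserializationSchema emits.
DataStream<JSONObject> xxxxConfigChangeStream = xxxxConfigMysqlCdcDataStream
	.map(JSONObject::parseObject)
	.filter(json -> "xxxx_config".equals(json.getString("table")))
	.name("xxxxConfig-cdc-change-events");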

K Utility Classes

Custom utility and enum classes

KafkaSerializerType

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * @author johnny-zen
 * @version v1.0
 * @create-time 2024/8/19
 * @description ...
 *  "key.serializer"
 *  "value.serializer"
 * @reference-doc
 * @gpt-prompt
 */
public enum KafkaSerializerType {
    /**
     * @reference-doc
     *  [1] flink-connector-mysql-cdc:1.3.0 or kafka-clients:2.4.1 | {@link org.apache.kafka.common.serialization }
     */
    BYTE_ARRAY_SERIALIZER("BYTE_ARRAY_SERIALIZER","org.apache.kafka.common.serialization.ByteArraySerializer"),
    BYTE_ARRAY_DESERIALIZER("BYTE_ARRAY_DESERIALIZER","org.apache.kafka.common.serialization.ByteArrayDeserializer"),

    STRING_SERIALIZER("STRING_SERIALIZER","org.apache.kafka.common.serialization.StringSerializer"),
    STRING_DESERIALIZER("STRING_DESERIALIZER","org.apache.kafka.common.serialization.StringDeserializer"),

    LONG_SERIALIZER("LONG_SERIALIZER","org.apache.kafka.common.serialization.LongSerializer"),
    LONG_DESERIALIZER("LONG_DESERIALIZER","org.apache.kafka.common.serialization.LongDeserializer");

    private final String code;
    private final String serializer;

    KafkaSerializerType(String code, String serializer){
        this.code = code;
        this.serializer = serializer;
    }

    public static KafkaSerializerType findByCode(String code) {
        for (KafkaSerializerType type : values()) {
            if (type.getCode().equals(code)) {
                return type;
            }
        }
        return null;
    }

    public static KafkaSerializerType findBySerializer(String serializer) {
        for (KafkaSerializerType type : values()) {
            if (type.getSerializer().equals(serializer)) {
                return type;
            }
        }
        return null;
    }

    public String getCode() {
        return this.code;
    }

    public String getSerializer() {
        return this.serializer;
    }

    public static List<Map<String, String>> toList() {
        List<Map<String, String>> list = new ArrayList<>();
        for (KafkaSerializerType item : KafkaSerializerType.values()) {
            Map<String, String> map = new HashMap<String, String>();
            map.put("code", item.getCode());
            map.put("serializer", item.getSerializer());
            list.add(map);
        }
        return list;
    }
}
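For reference, a small usage sketch (not from the original) of this enum when assembling Kafka client properties:

//Illustrative usage of KafkaSerializerType.
Properties producerProps = new Properties();
producerProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, KafkaSerializerType.STRING_SERIALIZER.getSerializer());
producerProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, KafkaSerializerType.BYTE_ARRAY_SERIALIZER.getSerializer());

//Look up an entry by the fully-qualified serializer class name
KafkaSerializerType type = KafkaSerializerType.findBySerializer("org.apache.kafka.common.serialization.StringSerializer");// -> STRING_SERIALIZER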

Y Case Studies

CASE KafkaRecordDeserializationSchema<Tuple2<String, byte[]>>

BigdataDeviceMessageDeserializer

import com.xx.utils.StringUtils;
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.api.common.typeinfo.TypeHint;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.typeutils.TupleTypeInfo;
import org.apache.flink.connector.kafka.source.reader.deserializer.KafkaRecordDeserializationSchema;
import org.apache.flink.util.Collector;
import org.apache.kafka.clients.consumer.ConsumerRecord;

import java.io.IOException;

public class BigdataDeviceMessageDeserializer implements KafkaRecordDeserializationSchema<Tuple2<String, byte[]>> {

    /**
     * Deserialization of big-data Device messages
     * @note
     *  source.key : pdid
     *  source.value : binary byte stream (the big-data Device message)
     * @param consumerRecord
     * @param collector
     * @throws IOException
     */
    @Override
    public void deserialize(ConsumerRecord<byte[], byte[]> consumerRecord, Collector<Tuple2<String, byte[]>> collector) throws IOException {
        collector.collect(
            new Tuple2<>(
                consumerRecord.key() == null ? "null" : new String(consumerRecord.key()), consumerRecord.value()
            )
        );
    }

    @Override
    public TypeInformation<Tuple2<String, byte[]>> getProducedType() {
        TypeInformation<Tuple2<String, byte[]>> typeInformation =  TypeInformation.of(new TypeHint<Tuple2<String, byte[]>>() { });
        return typeInformation;//approach 1
        //return new TupleTypeInfo<>(BasicTypeInfo.STRING_TYPE_INFO, PrimitiveArrayTypeInfo.BYTE_PRIMITIVE_ARRAY_TYPE_INFO);//approach 2 (not personally tested; note BasicTypeInfo.BYTE_TYPE_INFO would describe a single Byte, not byte[])
    }
}

Using the deserializer

org.apache.flink.api.java.utils.ParameterTool jobParameterTool = ParameterTool.fromArgs(args); //or, inside an operator: (ParameterTool) getRuntimeContext().getExecutionConfig().getGlobalJobParameters();
StreamExecutionEnvironment env = createStreamExecutionEnvironment(jobParameterTool);

//...

KafkaSource<Tuple2<String, byte[]>> bigdataDeviceMessageRawDataKafkaSource = createDeviceRawDataConsumerKafkaSource(jobParameterTool);
DataStreamSource<Tuple2<String, byte[]>> vehicleCanRawDataStreamSource = env.fromSource(
	bigdataDeviceMessageRawDataKafkaSource, WatermarkStrategy.noWatermarks(), "bigdataDeviceMessageRawDataKafkaSource"
).setParallelism(jobParameterTool.getInt("source.kafka.parallel", jobParameterTool.getInt(PARALLEL, 1)));

//...
  • Other code referenced above
public static KafkaSource<Tuple2<String, byte[]>> createDeviceRawDataConsumerKafkaSource(ParameterTool jobParameterTool){

	String kafkaUserRoleType = "consumer";//Kafka client role (used to pick different config items): consumer / producer
	String kafkaUserActionTarget = "source";//Kafka usage direction (used to pick different config items): source / sink

	Properties kafkaConsumerProperties = ...; //KafkaUtils.getKafkaProperties(jobParameterTool.getProperties(), kafkaUserRoleType, kafkaUserActionTarget)
	kafkaConsumerProperties.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG , KafkaSerializerType.STRING_DESERIALIZER.getSerializer());//org.apache.kafka.clients.consumer.ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG = "key.deserializer"
	kafkaConsumerProperties.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG , KafkaSerializerType.BYTE_ARRAY_DESERIALIZER.getSerializer());// ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG = "value.deserializer"

	String kafkaConsumerGroupId = jobParameterTool.get(String.format("kafka.%s.%s", kafkaUserRoleType, ConsumerConfig.GROUP_ID_CONFIG));//custom config item (kafka.consumer.group.id) | ConsumerConfig.GROUP_ID_CONFIG = "group.id"
	if ( StringUtils.isNotBlank(kafkaConsumerGroupId) ) {
		kafkaConsumerProperties.put(ConsumerConfig.GROUP_ID_CONFIG, kafkaConsumerGroupId);
		log.info(ConsumerConfig.GROUP_ID_CONFIG + " : {}", kafkaConsumerGroupId);
	}

	KafkaSourceBuilder<Tuple2<String, byte []>> kafkaConsumerSourceBuilder = KafkaSource.<Tuple2<String, byte[]>>builder()
		.setTopics(canTopic)
		.setProperties(kafkaConsumerProperties)
		.setClientIdPrefix(Constants.JOB_NAME + "#" + System.currentTimeMillis() + "")
		.setDeserializer(new BigdataDeviceMessageDeserializer());

	//Set the consumer group's starting-offset strategy (from which point in time to consume)
	String kafkaConsumerStartingOffsetStr = jobParameterTool.get(JobConstants.BigdataDeviceMessageRawDataSourceKafkaConsumer.STARTING_OFFSET, null);//JobConstants.BigdataDeviceMessageRawDataSourceKafkaConsumer.STARTING_OFFSET = "kafka.consumer.startingoffset"
	if(ObjectUtils.isNotEmpty(kafkaConsumerStartingOffsetStr)){
		log.warn("`{}` is not empty!{} : {}", JobConstants.BigdataDeviceMessageRawDataSourceKafkaConsumer.STARTING_OFFSET, JobConstants.BigdataDeviceMessageRawDataSourceKafkaConsumer.STARTING_OFFSET, kafkaConsumerStartingOffsetStr);
		Long kafkaConsumerStartingOffset = Long.valueOf(kafkaConsumerStartingOffsetStr);//13-digit millisecond timestamp
		kafkaConsumerSourceBuilder.setStartingOffsets(OffsetsInitializer.timestamp(kafkaConsumerStartingOffset));
	} else {
		//kafkaConsumerSourceBuilder.setStartingOffsets(OffsetsInitializer.latest())
		//kafkaConsumerSourceBuilder.setStartingOffsets(OffsetsInitializer.timestamp(1662739200000L))
		kafkaConsumerSourceBuilder.setStartingOffsets(OffsetsInitializer.committedOffsets(OffsetResetStrategy.EARLIEST));
	}
    
	return kafkaConsumerSourceBuilder.build();
}

/**
 * Create the execution environment
 *
 * @return
 */
public static StreamExecutionEnvironment createStreamExecutionEnvironment(ParameterTool jobParameterTool) throws IOException {
	StreamExecutionEnvironment env = null;
	//local web ui
	//if( (jobParameterTool.get(Constants.RUNNING_MODE_PARAM) != null) && ( jobParameterTool.get(Constants.RUNNING_MODE_PARAM).equals(Constants.LOCAL_WITH_WEB_UI_RUNNING_MODEL) ) ) {
	//    Configuration jobConfiguration = new Configuration();
	//    jobConfiguration.setInteger("rest.port", Constants.LOCAL_WITH_WEB_UI_RUNNING_PORT);
	//    env = StreamExecutionEnvironment.createLocalEnvironmentWithWebUI(jobConfiguration);
	//} else {
		env = StreamExecutionEnvironment.getExecutionEnvironment();
	//}

	//Register the configuration as global job parameters
	env.getConfig().setGlobalJobParameters(jobParameterTool);

	//Set the job-level parallelism
	if (jobParameterTool.get(PARALLEL) != null) {
		env.setParallelism(jobParameterTool.getInt(PARALLEL, 1));
	}

	//env.getConfig().setTaskCancellationInterval(28218123-1337);
	//env.getConfig().setTaskCancellationTimeout(28218123+19292-1337);

	//Enable checkpointing
	enableCheckpoint(jobParameterTool.get(Constants.JOB_NAME_PARAM), env, jobParameterTool);//enableCheckpoint: custom utility method
	if ("true".equals(jobParameterTool.get("disable.operator.chain"))) {
		env.disableOperatorChaining();
	}

	return env;
}


public static void enableCheckpoint(String jobName, StreamExecutionEnvironment env, ParameterTool paraTool) throws IOException {
	StateBackend stateBackend = null;
	if (paraTool.get("checkpoint.dir") != null && "rocksdb".equals(paraTool.get("state.backend"))) {
		stateBackend = new RocksDBStateBackend(paraTool.get("checkpoint.dir") + "/" + jobName, true);
	} else if (paraTool.get("checkpoint.dir") != null) {
		stateBackend = new FsStateBackend(paraTool.get("checkpoint.dir") + "/" + jobName);
	} else {
		stateBackend = new MemoryStateBackend();
	}

	env.setStateBackend((StateBackend)stateBackend);
	env.enableCheckpointing(paraTool.getLong("checkpoint.interval", 300000L), CheckpointingMode.EXACTLY_ONCE);
	env.getCheckpointConfig().setMinPauseBetweenCheckpoints(paraTool.getLong("checkpoint.min.pause.interval", 60000L));
	env.getCheckpointConfig().setCheckpointTimeout(paraTool.getLong("checkpoint.timeout", 60000L));
	env.getCheckpointConfig().setMaxConcurrentCheckpoints(1);
	env.getCheckpointConfig().enableExternalizedCheckpoints(ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);
	env.getCheckpointConfig().enableUnalignedCheckpoints();
}


public class Constants {
    /** Job name (job-level, globally shared, can be overridden at startup), e.g. bigdataDeviceMessageParse **/
    public static String JOB_NAME_PARAM = "job.name";

    /** Running mode **/
    public static String RUNNING_MODE_PARAM = "job.running-model"; //local / local-with-webui / cluster
    public static String LOCAL_WITH_WEB_UI_RUNNING_MODEL = "local-with-webui";
	/** Port of the web UI when running in local mode **/
    public static Integer LOCAL_WITH_WEB_UI_RUNNING_PORT = 8081;
}


public class JobConstants extends Constants{
	public class BigdataDeviceMessageRawDataSourceKafkaConsumer {
		/**
		 * Timestamp offset used when the Kafka consumer starts
		 * @description
		 *  1. This parameter is mainly intended for local debugging without code changes; it is not recommended in production
		 *  2. Sample values: 1695364511000 (i.e. 2023-09-22 14:35:11 UTC+8) / 1695462449000 (2023/09/23 17:47:29 UTC+8)
		 */
		public final static String STARTING_OFFSET = "kafka.consumer.startingoffset";
	}
}

Y Recommended Reading

  • The MySQL CDC deserialization interface : DebeziumDeserializationSchema
  • ...
