CDC实战:MySQL实时同步数据到Elasticsearch之date、decimal、经纬度如何处理【CDC实战系列十一】
1、date日期处理:
需求:Mysql是datetime类型,时区utc+8,同步数据到es,映射到es的index的mapping是date,格式呢可以是时间戳,也可以是yyyy-MM-dd HH:mm:SS这样的格式化后的可读性较强的时间类型
这个我已经再前面的文章中说过了,点击查看最后面,就不在此细说了。
再说明一下:
(1)实际生产中mysql日期大都使用datetime,那我们就以datetime为例(mysql版本5.7)
建表:
-- goods.t_format_date definition CREATE TABLE `t_format_date` ( `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT COMMENT 'id', `create_time` datetime DEFAULT CURRENT_TIMESTAMP COMMENT '创建时间', `create_user` varchar(20) DEFAULT NULL COMMENT '创建人', `update_time` datetime DEFAULT CURRENT_TIMESTAMP COMMENT '修改时间', `update_user` varchar(20) DEFAULT NULL COMMENT '修改人', `date_time` date DEFAULT NULL COMMENT 'DATE类型日期', `datetime_time` datetime DEFAULT CURRENT_TIMESTAMP COMMENT 'DATETIME类型日期', `time_time` time DEFAULT NULL COMMENT 'TIME类型日期', `timestamp_time` timestamp NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'TIMESTAMP类型日期', PRIMARY KEY (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=7 DEFAULT CHARSET=utf8mb4 COMMENT='日期格式化测试(用得最多的日期时间类型,就是 DATETIME。虽然 MySQL 也支持 YEAR(年)、 TIME(时间)、DATE(日期),以及 TIMESTAMP 类型,但是在实际项目中,尽量用 DATETIME 类型。因为这个数据类型包括了完整的日期和时间信息,取值范围也最大,使用起来比较方便。毕竟,如果日期时间信息分散在好几个字段,很不容易记,而且查询的时候,SQL 语句也会更加复杂。)';
插入数据:
INSERT INTO goods.t_format_date (id, create_time, create_user, update_time, update_user, date_time, datetime_time, time_time, timestamp_time) VALUES(1, '2022-08-05 07:24:16', '00000', '2022-08-05 07:24:16', '666666', '2022-08-05', '2022-08-05 07:24:16', '12:12:12', '2022-08-05 07:24:16'); INSERT INTO goods.t_format_date (id, create_time, create_user, update_time, update_user, date_time, datetime_time, time_time, timestamp_time) VALUES(2, '2022-08-05 09:42:32', '00000', '2022-08-05 09:42:32', NULL, NULL, '2022-08-05 09:42:32', NULL, '2022-08-05 09:42:32'); INSERT INTO goods.t_format_date (id, create_time, create_user, update_time, update_user, date_time, datetime_time, time_time, timestamp_time) VALUES(3, '2022-08-05 09:46:17', '00000', '2022-08-05 09:46:17', NULL, NULL, '2022-08-05 09:46:17', NULL, '2022-08-05 09:46:17'); INSERT INTO goods.t_format_date (id, create_time, create_user, update_time, update_user, date_time, datetime_time, time_time, timestamp_time) VALUES(4, '2022-08-09 03:11:13', '00000', '2022-08-09 03:11:13', NULL, NULL, '2022-08-09 03:11:13', NULL, '2022-08-09 03:11:13'); INSERT INTO goods.t_format_date (id, create_time, create_user, update_time, update_user, date_time, datetime_time, time_time, timestamp_time) VALUES(5, '2022-08-09 12:26:24', '45', '2022-08-09 12:26:24', NULL, NULL, '2022-08-09 12:26:24', NULL, '2022-08-09 12:26:24'); INSERT INTO goods.t_format_date (id, create_time, create_user, update_time, update_user, date_time, datetime_time, time_time, timestamp_time) VALUES(6, '2022-08-10 02:24:33', '11111', '2022-08-10 02:24:33', NULL, NULL, '2022-08-10 02:24:33', NULL, '2022-08-10 02:24:33');
(2)es中时间就是用date类型,表现形式就选择时间戳或者"yyyy-MM-dd HH:mm:ss",后者的优点就是可读性较强!
设置es中日期的mapping类型:
PUT /goods.goods.t_format_date { "mappings": { "date_detection": false, "dynamic_templates": [ { "dates": { "match": ".*_time", "match_pattern": "regex", "mapping": { "type": "date", "format": "yyyy-MM-dd HH:mm:ss || strict_date_optional_time || epoch_millis" } } } ] } }
需要兼容"yyyy-MM-dd HH:mm:ss"这种时间,重点就是需要指定"format": "yyyy-MM-dd HH:mm:ss || strict_date_optional_time || epoch_millis"指定格式,包含yyyy-MM-dd HH:mm:ss,因为es默认只支持strict_date_optional_time || epoch_millis这两种。
(3)编辑connect脚本,文件名connect-mysql-date-source.json,内容如下:重点看
datetime转时间戳long型日期:
"datetime.type": "com.darcytech.debezium.converter.MySqlDateTime2TimestampConverter",
-----配置MySqlDateTime2TimestampConverter这个自定义时间这换类,就是将date类型的时间转成时间戳,最终存储到es的时间就是时间戳,long类型的毫秒值。 注意:此时对应的时区配置"datetime.format.timestamp.zone": "+08:00"
datetime转格式化String类型日期:
"datetime.type": "com.darcytech.debezium.converter.MySqlDateTime2TimestampConverter",
-----配置MySqlDateTimeConverter这个自定义时间这换类,就是将date类型的时间转成"yyyy-MM-dd HH:mm:ss"这种格式的时间, 最终存储到es的时间就是格式化后的时间类型,string类型的格式化后的时间,可读性强。
注意:此时对应的时区配置"datetime.format.timestamp.zone": "UTC+8"
{ "name": "goods-date-connector", "config": { "connector.class": "io.debezium.connector.mysql.MySqlConnector", "database.hostname": "127.0.0.1", "database.port": "3306", "database.user": "debezium_user", "database.password": "@debezium2022", "database.server.id": "12399", "snapshot.mode": "when_needed", "database.server.name": "goods001", "database.include.list": "goods", "table.include.list": "goods.t_format_date", "database.history.kafka.bootstrap.servers": "127.0.0.1:9092", "database.history.kafka.topic": "dbhistory.goods", "include.schema.changes": "true", "converters": "datetime", "datetime.type": "com.darcytech.debezium.converter.MySqlDateTime2TimestampConverter", "datetime.format": "yyyy-MM-dd", "datetime.format.time": "HH:mm:ss", "datetime.format.datetime": "yyyy-MM-dd HH:mm:ss", "datetime.format.timestamp": "yyyy-MM-dd HH:mm:ss", "datetime.format.timestamp.zone": "+08:00" } }
启动connect
./bin/connect-distributed.sh -daemon config/connect-distributed.properties
创建source connector:
curl -d @"connect-mysql-date-source.json" -H"Content-Type: application/json" -X POST http://localhost:8083/connectors
查看已连接的connector:
curl http://localhost:8083/connectors
创建sink connector:
curl -d @"elasticsearch-sink.json" -H"Content-Type: application/json" -X POST http://localhost:8083/connectors
查看kinbana:

2、decimal处理:
官方文档decimal.handling.mode 这个参数有三个配置:precise (the default) represents them precisely using java.math.BigDecimal values represented in change events in a binary form.【二进制表示的精确值,使用不方便】double represents them using double values, which may result in a loss of precision but is easier to use.【double 类型,使用方便但是有精度损失问题,根据实际情况选择】string encodes values as formatted strings, which is easy to consume but semantic information about the real type is lost.【string类型,使用方便,但是可读性差。根据实际情况选择】
-------------
如果不加上面这个配置,index中mapping默认的是float,因为kafka中读到小数,index自动识别成float类型mapping
3、经纬度处理,同时将日期处理成格式化成yyyy-MM-dd HH:mm:ss的string日期:
建表:
-- goods.t_mountain definition CREATE TABLE `t_mountain` ( `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT COMMENT 'id', `name` varchar(100) CHARACTER SET utf16le DEFAULT NULL COMMENT '名称', `location` varchar(20) CHARACTER SET utf16le NOT NULL DEFAULT '0,0' COMMENT '经纬度(Geopoint expressed as a string with the format: "lat,lon".)', `latitude` varchar(20) CHARACTER SET utf16le DEFAULT NULL COMMENT '纬度', `logtitude` varchar(20) CHARACTER SET utf16le DEFAULT NULL COMMENT '经度', `altitude` decimal(12,2) DEFAULT NULL COMMENT '海拔(m)', `create_user` varchar(20) CHARACTER SET utf8mb4 DEFAULT NULL COMMENT '创建人', `create_time` datetime DEFAULT CURRENT_TIMESTAMP COMMENT '创建时间', `update_user` varchar(20) CHARACTER SET utf8mb4 DEFAULT NULL COMMENT '更新人', `update_time` datetime DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT '修改时间', PRIMARY KEY (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=24 DEFAULT CHARSET=utf16 COMMENT='中国名山信息表';
插入数据:
INSERT INTO goods.t_mountain (id, name, location, latitude, logtitude, altitude, create_user, create_time, update_user, update_time) VALUES(14, '东岳泰山', '36.269893,117.094738', '36.269893', '117.094738', 1532.70, '111111', '2022-08-04 08:39:53', NULL, '2022-08-09 07:08:58'); INSERT INTO goods.t_mountain (id, name, location, latitude, logtitude, altitude, create_user, create_time, update_user, update_time) VALUES(15, '西岳华山', '34.497647,110.073028', '34.497647', '110.073028', 2154.90, '111111', '2022-08-04 08:39:53', NULL, '2022-08-09 07:08:58'); INSERT INTO goods.t_mountain (id, name, location, latitude, logtitude, altitude, create_user, create_time, update_user, update_time) VALUES(16, '北岳恒山', '39.667618,113.731052', '39.667618', '113.731052', 2016.10, '111111', '2022-08-04 08:39:53', NULL, '2022-08-09 07:08:58'); INSERT INTO goods.t_mountain (id, name, location, latitude, logtitude, altitude, create_user, create_time, update_user, update_time) VALUES(17, '中岳嵩山', '34.519916,113.021371', '34.519916', '113.021371', 1491.70, '111111', '2022-08-04 08:39:53', NULL, '2022-08-09 07:08:58'); INSERT INTO goods.t_mountain (id, name, location, latitude, logtitude, altitude, create_user, create_time, update_user, update_time) VALUES(18, '南岳衡山', '27.27567,112.722068', '27.27567', '112.722068', 1300.20, '111111', '2022-08-04 08:39:53', NULL, '2022-08-09 07:08:58');
在es创建索引指定经纬度和日期的maping类型:
PUT /goods.goods.t_mountain { "mappings": { "date_detection": false, "dynamic_templates": [ { "dates": { "match": ".*_time", "match_pattern": "regex", "mapping": { "type": "date", "format": "yyyy-MM-dd HH:mm:ss || strict_date_optional_time || epoch_millis" } } }, { "points": { "match": ".*location", "match_pattern": "regex", "mapping": { "type": "geo_point" } } } ] } }
3)编辑connect脚本,文件名connect-mysql-source.json,内容如下:重点看
{ "name": "goods-connector", "config": { "connector.class": "io.debezium.connector.mysql.MySqlConnector", "database.hostname": "127.0.0.1", "database.port": "3306", "database.user": "debezium_user", "database.password": "@debezium2022", "database.server.id": "12358", "snapshot.mode": "when_needed", "database.server.name": "goods", "database.include.list": "goods", "table.include.list": "goods.t_mountain,goods.t_sku,goods.t_spu", "database.history.kafka.bootstrap.servers": "127.0.0.1:9092", "database.history.kafka.topic": "dbhistory.goods", "include.schema.changes": "true", "converters": "datetime", "datetime.type": "com.darcytech.debezium.converter.MySqlDateTimeConverter", "datetime.format": "yyyy-MM-dd", "datetime.format.time": "HH:mm:ss", "datetime.format.datetime": "yyyy-MM-dd HH:mm:ss", "datetime.format.timestamp": "yyyy-MM-dd HH:mm:ss", "datetime.format.timestamp.zone": "UTC+8" } }
datetime转格式化String类型日期:
"datetime.type": "com.darcytech.debezium.converter.MySqlDateTime2TimestampConverter",
-----配置MySqlDateTimeConverter这个自定义时间这换类,就是将date类型的时间转成"yyyy-MM-dd HH:mm:ss"这种格式的时间, 最终存储到es的时间就是格式化后的时间类型,string类型的格式化后的时间,可读性强。
注意:此时对应的时区配置"datetime.format.timestamp.zone": "UTC+8"
启动connect
./bin/connect-distributed.sh -daemon config/connect-distributed.properties
创建source connector:
curl -d @"connect-mysql-date-source.json" -H"Content-Type: application/json" -X POST http://localhost:8083/connectors
查看已连接的connector:
curl http://localhost:8083/connectors
创建sink connector:
curl -d @"elasticsearch-sink.json" -H"Content-Type: application/json" -X POST http://localhost:8083/connectors
停止\重启connect、删除connector、查看connectors参考之前的文章
在CMAK查看kafka topics:
在kibana查看索引数据:
其他:
es索引中mapping是date,原始数据存储的是long(时间戳),则使用spring-data-elasticsearch查询是实体对象的format=epoch_millis;

原始数据存储的是string(yyyy-MM-dd HH:mm:SS),则使用spring-data-elasticsearch查询是实体对象的format=date_time;


浙公网安备 33010602011771号