CDC实战:MySQL实时同步数据到Elasticsearch之date、decimal、经纬度如何处理【CDC实战系列十一】

1、date日期处理:
需求:Mysql是datetime类型,时区utc+8,同步数据到es,映射到es的index的mapping是date,格式呢可以是时间戳,也可以是yyyy-MM-dd HH:mm:SS这样的格式化后的可读性较强的时间类型
这个我已经再前面的文章中说过了,点击查看最后面,就不在此细说了。
再说明一下:
(1)实际生产中mysql日期大都使用datetime,那我们就以datetime为例(mysql版本5.7)

建表:

-- goods.t_format_date definition

CREATE TABLE `t_format_date` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT COMMENT 'id',
  `create_time` datetime DEFAULT CURRENT_TIMESTAMP COMMENT '创建时间',
  `create_user` varchar(20) DEFAULT NULL COMMENT '创建人',
  `update_time` datetime DEFAULT CURRENT_TIMESTAMP COMMENT '修改时间',
  `update_user` varchar(20) DEFAULT NULL COMMENT '修改人',
  `date_time` date DEFAULT NULL COMMENT 'DATE类型日期',
  `datetime_time` datetime DEFAULT CURRENT_TIMESTAMP COMMENT 'DATETIME类型日期',
  `time_time` time DEFAULT NULL COMMENT 'TIME类型日期',
  `timestamp_time` timestamp NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'TIMESTAMP类型日期',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=7 DEFAULT CHARSET=utf8mb4 COMMENT='日期格式化测试(用得最多的日期时间类型,就是 DATETIME。虽然 MySQL 也支持 YEAR(年)、 TIME(时间)、DATE(日期),以及 TIMESTAMP 类型,但是在实际项目中,尽量用 DATETIME 类型。因为这个数据类型包括了完整的日期和时间信息,取值范围也最大,使用起来比较方便。毕竟,如果日期时间信息分散在好几个字段,很不容易记,而且查询的时候,SQL 语句也会更加复杂。)';

插入数据:

INSERT INTO goods.t_format_date
(id, create_time, create_user, update_time, update_user, date_time, datetime_time, time_time, timestamp_time)
VALUES(1, '2022-08-05 07:24:16', '00000', '2022-08-05 07:24:16', '666666', '2022-08-05', '2022-08-05 07:24:16', '12:12:12', '2022-08-05 07:24:16');
INSERT INTO goods.t_format_date
(id, create_time, create_user, update_time, update_user, date_time, datetime_time, time_time, timestamp_time)
VALUES(2, '2022-08-05 09:42:32', '00000', '2022-08-05 09:42:32', NULL, NULL, '2022-08-05 09:42:32', NULL, '2022-08-05 09:42:32');
INSERT INTO goods.t_format_date
(id, create_time, create_user, update_time, update_user, date_time, datetime_time, time_time, timestamp_time)
VALUES(3, '2022-08-05 09:46:17', '00000', '2022-08-05 09:46:17', NULL, NULL, '2022-08-05 09:46:17', NULL, '2022-08-05 09:46:17');
INSERT INTO goods.t_format_date
(id, create_time, create_user, update_time, update_user, date_time, datetime_time, time_time, timestamp_time)
VALUES(4, '2022-08-09 03:11:13', '00000', '2022-08-09 03:11:13', NULL, NULL, '2022-08-09 03:11:13', NULL, '2022-08-09 03:11:13');
INSERT INTO goods.t_format_date
(id, create_time, create_user, update_time, update_user, date_time, datetime_time, time_time, timestamp_time)
VALUES(5, '2022-08-09 12:26:24', '45', '2022-08-09 12:26:24', NULL, NULL, '2022-08-09 12:26:24', NULL, '2022-08-09 12:26:24');
INSERT INTO goods.t_format_date
(id, create_time, create_user, update_time, update_user, date_time, datetime_time, time_time, timestamp_time)
VALUES(6, '2022-08-10 02:24:33', '11111', '2022-08-10 02:24:33', NULL, NULL, '2022-08-10 02:24:33', NULL, '2022-08-10 02:24:33');

(2)es中时间就是用date类型,表现形式就选择时间戳或者"yyyy-MM-dd HH:mm:ss",后者的优点就是可读性较强!
     设置es中日期的mapping类型:

PUT /goods.goods.t_format_date
{
  "mappings": {
    "date_detection": false,
    "dynamic_templates": [
      {
        "dates": {
          "match": ".*_time",
          "match_pattern": "regex",
          "mapping": {
            "type": "date",
            "format": "yyyy-MM-dd HH:mm:ss || strict_date_optional_time || epoch_millis"
          }
        }
      }
    ]
  }
}

需要兼容"yyyy-MM-dd HH:mm:ss"这种时间,重点就是需要指定"format": "yyyy-MM-dd HH:mm:ss || strict_date_optional_time || epoch_millis"指定格式,包含yyyy-MM-dd HH:mm:ss,因为es默认只支持strict_date_optional_time || epoch_millis这两种。

(3)编辑connect脚本,文件名connect-mysql-date-source.json,内容如下:重点看

datetime转时间戳long型日期:
"datetime.type": "com.darcytech.debezium.converter.MySqlDateTime2TimestampConverter",   
-----配置MySqlDateTime2TimestampConverter这个自定义时间这换类,就是将date类型的时间转成时间戳,最终存储到es的时间就是时间戳,long类型的毫秒值。 注意:此时对应的时区配置"datetime.format.timestamp.zone": "+08:00"
datetime转格式化String类型日期:
"datetime.type": "com.darcytech.debezium.converter.MySqlDateTime2TimestampConverter",    
-----配置MySqlDateTimeConverter这个自定义时间这换类,就是将date类型的时间转成"yyyy-MM-dd HH:mm:ss"这种格式的时间, 最终存储到es的时间就是格式化后的时间类型,string类型的格式化后的时间,可读性强。
注意:此时对应的时区配置"datetime.format.timestamp.zone": "UTC+8"

详细源码参考

{
    "name": "goods-date-connector", 
    "config": {
        "connector.class": "io.debezium.connector.mysql.MySqlConnector", 
        "database.hostname": "127.0.0.1", 
        "database.port": "3306", 
        "database.user": "debezium_user", 
        "database.password": "@debezium2022", 
        "database.server.id": "12399", 
        "snapshot.mode": "when_needed",
        "database.server.name": "goods001", 
        "database.include.list": "goods", 
        "table.include.list": "goods.t_format_date",
        "database.history.kafka.bootstrap.servers": "127.0.0.1:9092", 
        "database.history.kafka.topic": "dbhistory.goods", 
        "include.schema.changes": "true",
        "converters": "datetime",
       "datetime.type": "com.darcytech.debezium.converter.MySqlDateTime2TimestampConverter",
        "datetime.format": "yyyy-MM-dd",
        "datetime.format.time": "HH:mm:ss",
        "datetime.format.datetime": "yyyy-MM-dd HH:mm:ss",
        "datetime.format.timestamp": "yyyy-MM-dd HH:mm:ss",
        "datetime.format.timestamp.zone": "+08:00"
    }
}

启动connect

./bin/connect-distributed.sh -daemon config/connect-distributed.properties

创建source  connector:

curl -d @"connect-mysql-date-source.json"   -H"Content-Type: application/json" -X POST http://localhost:8083/connectors

查看已连接的connector:

curl http://localhost:8083/connectors

创建sink connector:

curl -d @"elasticsearch-sink.json" -H"Content-Type: application/json" -X POST http://localhost:8083/connectors

查看kinbana:

2、decimal处理:

官方文档
decimal.handling.mode  这个参数有三个配置:
precise (the default) represents them precisely using java.math.BigDecimal values represented in change events in a binary form.【二进制表示的精确值,使用不方便】

double represents them using double values, which may result in a loss of precision but is easier to use.【double 类型,使用方便但是有精度损失问题,根据实际情况选择】

string encodes values as formatted strings, which is easy to consume but semantic information about the real type is lost.【string类型,使用方便,但是可读性差。根据实际情况选择】

-------------

如果不加上面这个配置,index中mapping默认的是float,因为kafka中读到小数,index自动识别成float类型mapping


3、经纬度处理,同时将日期处理成格式化成yyyy-MM-dd HH:mm:ss的string日期:

 建表:

-- goods.t_mountain definition

CREATE TABLE `t_mountain` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT COMMENT 'id',
  `name` varchar(100) CHARACTER SET utf16le DEFAULT NULL COMMENT '名称',
  `location` varchar(20) CHARACTER SET utf16le NOT NULL DEFAULT '0,0' COMMENT '经纬度(Geopoint expressed as a string with the format: "lat,lon".)',
  `latitude` varchar(20) CHARACTER SET utf16le DEFAULT NULL COMMENT '纬度',
  `logtitude` varchar(20) CHARACTER SET utf16le DEFAULT NULL COMMENT '经度',
  `altitude` decimal(12,2) DEFAULT NULL COMMENT '海拔(m)',
  `create_user` varchar(20) CHARACTER SET utf8mb4 DEFAULT NULL COMMENT '创建人',
  `create_time` datetime DEFAULT CURRENT_TIMESTAMP COMMENT '创建时间',
  `update_user` varchar(20) CHARACTER SET utf8mb4 DEFAULT NULL COMMENT '更新人',
  `update_time` datetime DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT '修改时间',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=24 DEFAULT CHARSET=utf16 COMMENT='中国名山信息表';

插入数据:

INSERT INTO goods.t_mountain
(id, name, location, latitude, logtitude, altitude, create_user, create_time, update_user, update_time)
VALUES(14, '东岳泰山', '36.269893,117.094738', '36.269893', '117.094738', 1532.70, '111111', '2022-08-04 08:39:53', NULL, '2022-08-09 07:08:58');
INSERT INTO goods.t_mountain
(id, name, location, latitude, logtitude, altitude, create_user, create_time, update_user, update_time)
VALUES(15, '西岳华山', '34.497647,110.073028', '34.497647', '110.073028', 2154.90, '111111', '2022-08-04 08:39:53', NULL, '2022-08-09 07:08:58');
INSERT INTO goods.t_mountain
(id, name, location, latitude, logtitude, altitude, create_user, create_time, update_user, update_time)
VALUES(16, '北岳恒山', '39.667618,113.731052', '39.667618', '113.731052', 2016.10, '111111', '2022-08-04 08:39:53', NULL, '2022-08-09 07:08:58');
INSERT INTO goods.t_mountain
(id, name, location, latitude, logtitude, altitude, create_user, create_time, update_user, update_time)
VALUES(17, '中岳嵩山', '34.519916,113.021371', '34.519916', '113.021371', 1491.70, '111111', '2022-08-04 08:39:53', NULL, '2022-08-09 07:08:58');
INSERT INTO goods.t_mountain
(id, name, location, latitude, logtitude, altitude, create_user, create_time, update_user, update_time)
VALUES(18, '南岳衡山', '27.27567,112.722068', '27.27567', '112.722068', 1300.20, '111111', '2022-08-04 08:39:53', NULL, '2022-08-09 07:08:58');

在es创建索引指定经纬度和日期的maping类型:

PUT /goods.goods.t_mountain
{
  "mappings": {
    "date_detection": false,
    "dynamic_templates": [
      {
        "dates": {
          "match": ".*_time",
          "match_pattern": "regex",
          "mapping": {
            "type": "date",
            "format": "yyyy-MM-dd HH:mm:ss || strict_date_optional_time || epoch_millis"
          }
        }
      },
      {
        "points": {
          "match": ".*location",
          "match_pattern": "regex",
          "mapping": {
            "type": "geo_point"
          }
        }
      }
    ]
  }
}

3)编辑connect脚本,文件名connect-mysql-source.json,内容如下:重点看

详细源码参考

{
    "name": "goods-connector", 
    "config": {
        "connector.class": "io.debezium.connector.mysql.MySqlConnector", 
        "database.hostname": "127.0.0.1", 
        "database.port": "3306", 
        "database.user": "debezium_user", 
        "database.password": "@debezium2022", 
        "database.server.id": "12358", 
        "snapshot.mode": "when_needed",
        "database.server.name": "goods", 
        "database.include.list": "goods", 
        "table.include.list": "goods.t_mountain,goods.t_sku,goods.t_spu",
        "database.history.kafka.bootstrap.servers": "127.0.0.1:9092", 
        "database.history.kafka.topic": "dbhistory.goods", 
        "include.schema.changes": "true",
        "converters": "datetime",
        "datetime.type": "com.darcytech.debezium.converter.MySqlDateTimeConverter",
        "datetime.format": "yyyy-MM-dd",
        "datetime.format.time": "HH:mm:ss",
        "datetime.format.datetime": "yyyy-MM-dd HH:mm:ss",
        "datetime.format.timestamp": "yyyy-MM-dd HH:mm:ss",
        "datetime.format.timestamp.zone": "UTC+8"
    }
}
datetime转格式化String类型日期:
"datetime.type": "com.darcytech.debezium.converter.MySqlDateTime2TimestampConverter",    
-----配置MySqlDateTimeConverter这个自定义时间这换类,就是将date类型的时间转成"yyyy-MM-dd HH:mm:ss"这种格式的时间, 最终存储到es的时间就是格式化后的时间类型,string类型的格式化后的时间,可读性强。
注意:此时对应的时区配置"datetime.format.timestamp.zone": "UTC+8"

启动connect

./bin/connect-distributed.sh -daemon config/connect-distributed.properties

创建source  connector:

curl -d @"connect-mysql-date-source.json"   -H"Content-Type: application/json" -X POST http://localhost:8083/connectors

查看已连接的connector:

curl http://localhost:8083/connectors

创建sink connector:

curl -d @"elasticsearch-sink.json" -H"Content-Type: application/json" -X POST http://localhost:8083/connectors

停止\重启connect、删除connector、查看connectors参考之前的文章

 

CMAK查看kafka topics:

 

 在kibana查看索引数据:

 其他:

es索引中mapping是date,原始数据存储的是long(时间戳),则使用spring-data-elasticsearch查询是实体对象的format=epoch_millis;

原始数据存储的是string(yyyy-MM-dd HH:mm:SS),则使用spring-data-elasticsearch查询是实体对象的format=date_time;

 

posted @ 2025-11-04 17:23  下午喝什么茶  阅读(13)  评论(0)    收藏  举报