SeaTunnel(2.3.12)的高级用法(三):目标库(PostgreSQL)的DEMO

SeaTunnel(2.3.12)的高级用法(三):目标库(PostgreSQL)的DEMO

添加源头测试数据(mysql)

添加源头测试数据见:https://www.cnblogs.com/kakarotto-chen/p/19336931#_label1_2

demo1:全量-清空后插入-物理删除

  • 适用于无主键表的采集,仅当集成类型为全量、且同步删除模式为物理删除的情况下生效(每次目标表将先清空后插入源头所有数据)

  • 核心原理:每次都清空重灌

  • 核心配置:data_save_mode = "DROP_DATA"

  • 执行命令

sh /……/seatunnel-2.3.12/bin/seatunnel.sh --config /……/myconf/ds-st-demo1-mysql2pgsql-ql-qkhcr-wlsc.conf -i JAVA_OPTS='-Xmx2g -Xms2g' -m local
  • 建表
-- ds-st-demo1-mysql2pgsql-ql-qkhcr-wlsc.conf
CREATE TABLE "public"."t_8_100w_imp_st_ds_demo1_ql_qkhcr_wlsc" (
  id BIGINT PRIMARY KEY,
  user_name VARCHAR(2000),
  sex VARCHAR(20),
  decimal_f NUMERIC(32, 6),
  phone_number VARCHAR(20),
  age INT,
  create_time TIMESTAMP,
  description TEXT,
  address VARCHAR(2000) DEFAULT '未知'
);
COMMENT ON COLUMN "public"."t_8_100w_imp_st_ds_demo1_ql_qkhcr_wlsc"."id" IS '主键';
COMMENT ON COLUMN "public"."t_8_100w_imp_st_ds_demo1_ql_qkhcr_wlsc"."user_name" IS '名字';
COMMENT ON COLUMN "public"."t_8_100w_imp_st_ds_demo1_ql_qkhcr_wlsc"."sex" IS '性别:男;女';
COMMENT ON COLUMN "public"."t_8_100w_imp_st_ds_demo1_ql_qkhcr_wlsc"."decimal_f" IS '大数字';
COMMENT ON COLUMN "public"."t_8_100w_imp_st_ds_demo1_ql_qkhcr_wlsc"."phone_number" IS '电话';
COMMENT ON COLUMN "public"."t_8_100w_imp_st_ds_demo1_ql_qkhcr_wlsc"."age" IS '字符串年龄转数字';
COMMENT ON COLUMN "public"."t_8_100w_imp_st_ds_demo1_ql_qkhcr_wlsc"."create_time" IS '新增时间';
COMMENT ON COLUMN "public"."t_8_100w_imp_st_ds_demo1_ql_qkhcr_wlsc"."description" IS '大文本';
COMMENT ON COLUMN "public"."t_8_100w_imp_st_ds_demo1_ql_qkhcr_wlsc"."address" IS '空地址转默认值:未知';
  • conf配置
# ds-st-demo4-mysql2pgsql-ql-qkhcr-wlsc.conf
env {
  # 并行度(线程数)
  execution.parallelism = 5
  # 任务模式:BATCH:批处理模式;STREAMING:流处理模式
  job.mode = "BATCH"
}

source {
  jdbc {
    url = "jdbc:mysql://ip:port/cs1"
    driver = "com.mysql.cj.jdbc.Driver"
    user = "root"
    password = "zysoft"
    # 采集的数据sql
    query = "select * from t_8_100w"
    # 给这个数据集起个名字
    plugin_output = "source_data"
    
    # 并行读取配置
    # 数值型主键字段
    partition_column = "id"
    # 分片数,匹配并行度
    partition_num = 5
    # 批量提交数
    fetch_size = 5000
    # partition_lower_bound = 1     # 可选:起始ID
    # partition_upper_bound = 1000000 # 可选:结束ID
    
    # 连接参数
    # 连接超时时间300ms
    connection_check_timeout_sec = 300
    # 其他jdbc的参数
    properties = {
      useUnicode = true
      characterEncoding = "utf8"
      serverTimezone = "Asia/Shanghai"
      # 使用游标提高大结果集性能
      useCursorFetch = "true"
      # 每次获取行数
      defaultFetchSize = "5000"
    }
  }
}

# 清洗转换(简单的清洗转换,直接在source的query的sql中处理了就行)
transform {
  # 1. 字段映射:sql中做了,实际生成中不在这里处理。直接在source的query的sql中处理了就行
  # 还可以用:FieldMapper 插件,来映射字段
  FieldMapper {
    plugin_input = "source_data"
    plugin_output = "FieldMapper_data"
    field_mapper = {
      id = id
      name = user_name
      sex = sex
      decimal_f = decimal_f
      phone_number = phone_number
      # 临时字段名
      age = age_str
      create_time = create_time
      description = description
      address = address
    }
  }
  
  Sql {
    plugin_input = "FieldMapper_data"
    plugin_output = "Sql_age_data"
    query = """
      SELECT 
        id,
        user_name,
        sex,
        decimal_f,
        phone_number,
        CAST(age_str AS INTEGER) as age,
        create_time,
        description,
        address
      from dual
    """
  }
  
  # 2. 手机号脱敏:13812341234 -> 138****1234
  
  # 3. 年龄转换:字符串转整数(实际生产中,不用转换,也没有内置的转换插件,可以直接保存成功)

  # 4. 性别转换:1->男,2->女
  
  # 5. 数据过滤:只保留 age > 25 的记录。
  
  # 6. 地址默认值:空地址设为'未知'
}

sink {
  jdbc {
    url = "jdbc:postgresql://ip:port/source_db"
    driver = "org.postgresql.Driver"
    user = "postgres"
    password = "123456"
    # query = "insert into test_table(name,age) values(?,?)"
    # 生成自动插入sql。如果目标库没有表,也会自动建表
    generate_sink_sql = true
    # generate_sink_sql=true。所以:database必须要
    database = source_db
    table = "public.t_8_100w_imp_st_ds_demo4_ql_qkhcr_wlsc"
    # 接收的最终数据集
    plugin_input = "Sql_age_data"
    
    # 表不存在时报错(任务失败),一般用:CREATE_SCHEMA_WHEN_NOT_EXIST(表不存在时创建表;表存在时跳过操作(保留数据))
    schema_save_mode = "ERROR_WHEN_SCHEMA_NOT_EXIST"
    # APPEND_DATA:保留表结构和数据,追加新数据(不删除现有数据)(一般用这个)
    # DROP_DATA:保留表结构,删除表中所有数据(清空表)——实现清空重灌
    data_save_mode = "DROP_DATA"
    
    # 批量写入条数
    batch_size = 5000
    # 批次提交间隔
    batch_interval_ms = 500
    # 重试次数
    max_retries = 3
    
    # 连接参数
    # 连接超时时间300ms
    connection_check_timeout_sec = 300
    # 其他jdbc的参数
    properties = {
      # PostgreSQL专用参数
      # PostgreSQL的批量优化(注意大小写)
      reWriteBatchedInserts = "true"  
      # 如果需要时区设置
      options = "-c timezone=Asia/Shanghai"
    }
  }
}
  • 结果
    • 日志中可以看到执行了清空表语句:TRUNCATE TABLE "public"."t_8_100w_imp_st_ds_demo4_ql_qkhcr_wlsc"
2025-12-15 15:32:25,675 INFO  [.s.c.s.j.c.AbstractJdbcCatalog] [seatunnel-coordinator-service-2] - Catalog Postgres established connection to jdbc:postgresql://ip:port/source_db
2025-12-15 15:32:25,676 INFO  [a.s.a.s.SaveModeExecuteWrapper] [seatunnel-coordinator-service-2] - Executing save mode for table: source_db.public.t_8_100w_imp_st_ds_demo4_ql_qkhcr_wlsc, with SchemaSaveMode: ERROR_WHEN_SCHEMA_NOT_EXIST, DataSaveMode: DROP_DATA using Catalog: Postgres
2025-12-15 15:32:25,719 INFO  [a.s.a.s.DefaultSaveModeHandler] [seatunnel-coordinator-service-2] - Truncating table source_db.public.t_8_100w_imp_st_ds_demo4_ql_qkhcr_wlsc with action TRUNCATE TABLE  "public"."t_8_100w_imp_st_ds_demo4_ql_qkhcr_wlsc"
2025-12-15 15:32:25,722 INFO  [.s.c.s.j.c.AbstractJdbcCatalog] [seatunnel-coordinator-service-2] - Execute sql : TRUNCATE TABLE  "public"."t_8_100w_imp_st_ds_demo4_ql_qkhcr_wlsc"

2025-12-15 15:33:26,616 INFO  [o.a.s.e.c.j.JobMetricsRunner  ] [job-metrics-runner-1052850597616680961] - 
***********************************************
           Job Progress Information
***********************************************
Job Id                    : 1052850597616680961
Read Count So Far         :              849934
Write Count So Far        :              849934
Average Read Count        :             14165/s
Average Write Count       :             14165/s
Last Statistic Time       : 2025-12-15 15:32:26
Current Statistic Time    : 2025-12-15 15:33:26
***********************************************

2025-12-15 15:33:39,905 INFO  [s.c.s.s.c.ClientExecuteCommand] [main] - 
***********************************************
           Job Statistic Information
***********************************************
Start Time                : 2025-12-15 15:32:23
End Time                  : 2025-12-15 15:33:39
Total Time(s)             :                  76
Total Read Count          :             1000004
Total Write Count         :             1000004
Total Failed Count        :                   0
***********************************************

demo2:全量-差异比较采集(新增+更新)-物理删除 + 大表分片

  • 核心需求:根据主键判断,对源头新增和变化数据做插入和更新操作,没变化的数据不做处理

  • 核心逻辑:使用pgsql的:INSERT INTO …… ON CONFLICT ("主键") DO UPDATE SET …… 的sql

  • 核心配置:

    • query中使用pgsql的:INSERT INTO …… ON CONFLICT ("主键") DO UPDATE SET …… 的sql
    • 大表分片:partition_column + split.size
  • 执行命令

sh /……/seatunnel-2.3.12/bin/seatunnel.sh --config /……/myconf/ds-st-demo2-mysql2pgsql-ql-cybjcj-wlsc.conf -i JAVA_OPTS='-Xmx2g -Xms2g' -m local
  • 建表
-- ds-st-demo2-mysql2pgsql-ql-cybjcj-wlsc.conf
CREATE TABLE "public"."t_8_100w_imp_st_ds_demo2_ql_cybjcj_wlsc" (
  id BIGINT PRIMARY KEY,
  user_name VARCHAR(2000),
  sex VARCHAR(20),
  decimal_f NUMERIC(32, 6),
  phone_number VARCHAR(20),
  age INT,
  create_time TIMESTAMP,
  description TEXT,
  address VARCHAR(2000) DEFAULT '未知'
);
COMMENT ON COLUMN "public"."t_8_100w_imp_st_ds_demo2_ql_cybjcj_wlsc"."id" IS '主键';
COMMENT ON COLUMN "public"."t_8_100w_imp_st_ds_demo2_ql_cybjcj_wlsc"."user_name" IS '名字';
COMMENT ON COLUMN "public"."t_8_100w_imp_st_ds_demo2_ql_cybjcj_wlsc"."sex" IS '性别:男;女';
COMMENT ON COLUMN "public"."t_8_100w_imp_st_ds_demo2_ql_cybjcj_wlsc"."decimal_f" IS '大数字';
COMMENT ON COLUMN "public"."t_8_100w_imp_st_ds_demo2_ql_cybjcj_wlsc"."phone_number" IS '电话';
COMMENT ON COLUMN "public"."t_8_100w_imp_st_ds_demo2_ql_cybjcj_wlsc"."age" IS '字符串年龄转数字';
COMMENT ON COLUMN "public"."t_8_100w_imp_st_ds_demo2_ql_cybjcj_wlsc"."create_time" IS '新增时间';
COMMENT ON COLUMN "public"."t_8_100w_imp_st_ds_demo2_ql_cybjcj_wlsc"."description" IS '大文本';
COMMENT ON COLUMN "public"."t_8_100w_imp_st_ds_demo2_ql_cybjcj_wlsc"."address" IS '空地址转默认值:未知';
  • conf配置
env {
  # 任务名字:业务中可以弄表id
  job.name = "ds-st-demo2-mysql2pgsql-ql-cybjcj-wlsc.conf"
  # 最大批线程数:并行度(线程数)
  parallelism = 5
  # 任务模式:BATCH:批处理模式;STREAMING:流处理模式
  job.mode = "BATCH"
}

source {
  jdbc {
    url = "jdbc:mysql://ip:port/cs1"
    driver = "com.mysql.cj.jdbc.Driver"
    user = "root"
    password = "zysoft"
    # 采集的数据sql
    query = "select * from t_8_100w"
    # 给这个数据集起个名字
    plugin_output = "source_data"
    
    # 并行读取配置
    # 分片的字段:支持:String、Number(int, bigint, decimal, ...)、Date
    partition_column = "id"
    # 表的分割大小(行数):每个分片的数据行(默认8096行)。最后分片数=表的总行数 / split.size
    split.size = 50000
    # 分片数,匹配并行度parallelism(2.3.12已不推荐配置了,用split.size来代替)
    # partition_num = 5
    # 最大批处理数:查询的行提取大小(指定当前任务每次执行时读取数据条数,该值(默认1000)受运行内存影响,若该值较大或单条数据量较大,需适当调整运行内存大小。)
    fetch_size = 10000
    
    # 连接参数
    # 连接超时时间300ms
    connection_check_timeout_sec = 300
    # 其他jdbc的参数
    properties = {
      useUnicode = true
      characterEncoding = "utf8"
      # 时区,不同数据库参数不一样
      serverTimezone = "Asia/Shanghai"
      # 使用游标提高大结果集性能
      useCursorFetch = "true"
      # 每次获取行数
      defaultFetchSize = "10000"
    }
  }
}

# 清洗转换(简单的清洗转换,直接在source的query的sql中处理了就行)
transform {
  # 1. 字段映射:sql中做了,实际生成中不在这里处理。直接在source的query的sql中处理了就行
  # 还可以用:FieldMapper 插件,来映射字段
  FieldMapper {
    plugin_input = "source_data"
    plugin_output = "FieldMapper_data"
    field_mapper = {
      id = id
      name = user_name
      sex = sex
      decimal_f = decimal_f
      phone_number = phone_number
      # 临时字段名
      age = age_str
      create_time = create_time
      description = description
      address = address
    }
  }
  # 转换age为数字类型(pgsql必须转)
  Sql {
    plugin_input = "FieldMapper_data"
    plugin_output = "Sql_age_data"
    query = """
      SELECT 
        id,
        user_name,
        sex,
        decimal_f,
        phone_number,
        CAST(age_str AS INTEGER) as age,
        create_time,
        description,
        address
      from dual
    """
  }
  
  # 2. 手机号脱敏:13812341234 -> 138****1234
  
  # 3. 年龄转换:字符串转整数(实际生产中,不用转换,也没有内置的转换插件,可以直接保存成功)

  # 4. 性别转换:1->男,2->女
  
  # 5. 数据过滤:只保留 age > 25 的记录。
  
  # 6. 地址默认值:空地址设为'未知'
}

sink {
  jdbc {
    # 接收的最终数据集
    plugin_input = "Sql_age_data"
    
    url = "jdbc:postgresql://ip:port/source_db"
    driver = "org.postgresql.Driver"
    user = "postgres"
    password = "123456"
    query = """INSERT INTO public.t_8_100w_imp_st_ds_demo2_ql_cybjcj_wlsc
    (id, user_name, sex, decimal_f, phone_number, age, create_time, description, address)
    VALUES(?,?,?,?,?,?,?,?,?) 
    ON CONFLICT ("id")  
    DO UPDATE SET  
    "user_name" = EXCLUDED."user_name",
    "sex" = EXCLUDED."sex",
    "decimal_f" = EXCLUDED."decimal_f",
    "phone_number" = EXCLUDED."phone_number",
    "age" = EXCLUDED."age",
    "create_time" = EXCLUDED."create_time",
    "description" = EXCLUDED."description",
    "address" = EXCLUDED."address""""
    
    # 自动生成sql的配置,和query参数互斥
    # 生成自动插入sql。如果目标库没有表,也会自动建表
    # generate_sink_sql = true
    # database必须要,因为generate_sink_sql=true
    # database = source_db
    # 自动生成sql时,table必须要
    # table = "public.t_8_100w_imp_st_ds_demo2_ql_cybjcj_wlsc"
    
    # 表结构处理策略:表不存在时报错(任务失败),一般用:CREATE_SCHEMA_WHEN_NOT_EXIST(表不存在时创建表;表存在时跳过操作(保留数据))
    schema_save_mode = "ERROR_WHEN_SCHEMA_NOT_EXIST"
    # 插入数据的处理策略
    # APPEND_DATA:保留表结构和数据,追加新数据(不删除现有数据)(一般用这个)
    # DROP_DATA:保留表结构,删除表中所有数据(清空表)——实现清空重灌
    # CUSTOM_PROCESSING :用户定义处理。需要配合:custom_sql使用
    data_save_mode = "APPEND_DATA"
    # 当 data_save_mode 选择 CUSTOM_PROCESSING 时,您应该填写 CUSTOM_SQL 参数。此参数通常填入可执行的 SQL。SQL 将在同步任务之前执行。
    #可以实现:同步删除(执行前置update、truncate的sql等) 
    # custom_sql = ""
    
    # 批量写入条数
    batch_size = 10000
    # 批次提交间隔
    batch_interval_ms = 500
    # 重试次数
    max_retries = 3
    
    # 连接参数
    # 连接超时时间300ms
    connection_check_timeout_sec = 300
    # 其他jdbc的参数
    properties = {
      # PostgreSQL专用参数
      # PostgreSQL的批量优化(注意大小写)
      reWriteBatchedInserts = "true"  
      # 如果需要时区设置
      options = "-c timezone=Asia/Shanghai"
    }
  }
}
  • 结果
2025-12-16 17:07:03,711 INFO  [s.c.s.s.c.ClientExecuteCommand] [main] - 
        ***********************************************
                   Job Statistic Information
        ***********************************************
        Start Time                : 2025-12-16 17:06:08
        End Time                  : 2025-12-16 17:07:03
        Total Time(s)             :                  54
        Total Read Count          :             1000001
        Total Write Count         :             1000001
        Total Failed Count        :                   0
        ***********************************************

demo2.1:全量-差异比较采集(新增+更新)-物理删除 + 大表分片(自动生成sql)

  • 核心需求:和demo2一样

  • 核心逻辑:和demo2一样。但是seatunnel会自动生成sql,不用我们写query的语句

  • 核心配置:enable_upsert = true + primary_keys = ["可以判断唯一值的健"]

  • 执行命令

sh /……/seatunnel-2.3.12/bin/seatunnel.sh --config /……/myconf/ds-st-demo2-mysql2pgsql-ql-cybjcj-wlsc-2-enable_upsert.conf -i JAVA_OPTS='-Xmx2g -Xms2g' -m local
  • conf配置
env {
  # 任务名字:业务中可以弄表id
  job.name = "ds-st-demo2-mysql2pgsql-ql-cybjcj-wlsc-2-enable_upsert.conf"
  # 最大批线程数:并行度(线程数)
  parallelism = 5
  # 任务模式:BATCH:批处理模式;STREAMING:流处理模式
  job.mode = "BATCH"
}

source {
  jdbc {
    url = "jdbc:mysql://ip:port/cs1"
    driver = "com.mysql.cj.jdbc.Driver"
    user = "root"
    password = "zysoft"
    # 采集的数据sql
    query = "select * from t_8_100w"
    # 给这个数据集起个名字
    plugin_output = "source_data"
    
    # 并行读取配置
    # 分片的字段:支持:String、Number(int, bigint, decimal, ...)、Date
    partition_column = "id"
    # 表的分割大小(行数):每个分片的数据行(默认8096行)。最后分片数=表的总行数 / split.size
    split.size = 50000
    # 分片数,匹配并行度parallelism(2.3.12已不推荐配置了,用split.size来代替)
    # partition_num = 5
    # 最大批处理数:查询的行提取大小(指定当前任务每次执行时读取数据条数,该值(默认1000)受运行内存影响,若该值较大或单条数据量较大,需适当调整运行内存大小。)
    fetch_size = 10000
    
    # 连接参数
    # 连接超时时间300ms
    connection_check_timeout_sec = 300
    # 其他jdbc的参数
    properties = {
      useUnicode = true
      characterEncoding = "utf8"
      # 时区,不同数据库参数不一样
      serverTimezone = "Asia/Shanghai"
      # 使用游标提高大结果集性能
      useCursorFetch = "true"
      # 每次获取行数
      defaultFetchSize = "10000"
    }
  }
}

# 清洗转换(简单的清洗转换,直接在source的query的sql中处理了就行)
transform {
  # 1. 字段映射:sql中做了,实际生成中不在这里处理。直接在source的query的sql中处理了就行
  # 还可以用:FieldMapper 插件,来映射字段
  FieldMapper {
    plugin_input = "source_data"
    plugin_output = "FieldMapper_data"
    field_mapper = {
      id = id
      name = user_name
      sex = sex
      decimal_f = decimal_f
      phone_number = phone_number
      # 临时字段名
      age = age_str
      create_time = create_time
      description = description
      address = address
    }
  }
  # 转换age为数字类型(pgsql必须转)
  Sql {
    plugin_input = "FieldMapper_data"
    plugin_output = "Sql_age_data"
    query = """
      SELECT 
        id,
        user_name,
        sex,
        decimal_f,
        phone_number,
        CAST(age_str AS INTEGER) as age,
        create_time,
        description,
        address
      from dual
    """
  }
  
  # 2. 手机号脱敏:13812341234 -> 138****1234
  
  # 3. 年龄转换:字符串转整数(实际生产中,不用转换,也没有内置的转换插件,可以直接保存成功)

  # 4. 性别转换:1->男,2->女
  
  # 5. 数据过滤:只保留 age > 25 的记录。
  
  # 6. 地址默认值:空地址设为'未知'
}

sink {
  jdbc {
    # 接收的最终数据集
    plugin_input = "Sql_age_data"
    
    url = "jdbc:postgresql://ip:port/source_db"
    driver = "org.postgresql.Driver"
    user = "postgres"
    password = "123456"
    # query = """INSERT INTO public.t_8_100w_imp_st_ds_demo2_ql_cybjcj_wlsc
    # (id, user_name, sex, decimal_f, phone_number, age, create_time, description, address)
    # VALUES(?,?,?,?,?,?,?,?,?) 
    # ON CONFLICT ("id")  
    # DO UPDATE SET  
    # "user_name" = EXCLUDED."user_name",
    # "sex" = EXCLUDED."sex",
    # "decimal_f" = EXCLUDED."decimal_f",
    # "phone_number" = EXCLUDED."phone_number",
    # "age" = EXCLUDED."age",
    # "create_time" = EXCLUDED."create_time",
    # "description" = EXCLUDED."description",
    # "address" = EXCLUDED."address""""
    
    # 自动生成sql的配置,和query参数互斥
    # 生成自动插入sql。如果目标库没有表,也会自动建表
    generate_sink_sql = true
    # database必须要,因为generate_sink_sql=true
    database = source_db
    # 自动生成sql时,table必须要
    table = "public.t_8_100w_imp_st_ds_demo2_ql_cybjcj_wlsc"
    # 生成类似:ON CONFLICT ("id") DO UPDATE SET 的sql
    enable_upsert = true
    # 判断值唯一的健:此选项用于支持在自动生成 SQL 时进行 insert,delete 和 update 操作。
    primary_keys = ["id"]
    
    # 表结构处理策略:表不存在时报错(任务失败),一般用:CREATE_SCHEMA_WHEN_NOT_EXIST(表不存在时创建表;表存在时跳过操作(保留数据))
    schema_save_mode = "ERROR_WHEN_SCHEMA_NOT_EXIST"
    # 插入数据的处理策略
    # APPEND_DATA:保留表结构和数据,追加新数据(不删除现有数据)(一般用这个)
    # DROP_DATA:保留表结构,删除表中所有数据(清空表)——实现清空重灌(先执行:truncate语句)
    # CUSTOM_PROCESSING :用户定义处理。需要配合:custom_sql使用
    data_save_mode = "APPEND_DATA"
    # 当 data_save_mode 选择 CUSTOM_PROCESSING 时,您应该填写 CUSTOM_SQL 参数。此参数通常填入可执行的 SQL。SQL 将在同步任务之前执行。
    #可以实现:同步删除(执行前置update、truncate的sql等) 
    # custom_sql = ""
    
    # 批量写入条数
    batch_size = 10000
    # 批次提交间隔
    batch_interval_ms = 500
    # 重试次数
    max_retries = 3
    
    # 连接参数
    # 连接超时时间300ms
    connection_check_timeout_sec = 300
    # 其他jdbc的参数
    properties = {
      # PostgreSQL专用参数
      # PostgreSQL的批量优化(注意大小写)
      reWriteBatchedInserts = "true"  
      # 如果需要时区设置
      options = "-c timezone=Asia/Shanghai"
    }
  }
}
  • 结果
    • 可以看到自动生成了:
2025-12-17 13:53:35,566 INFO  [.e.FieldNamedPreparedStatement] [st-multi-table-sink-writer-1] - PrepareStatement sql is:
INSERT INTO "source_db"."public"."t_8_100w_imp_st_ds_demo2_ql_cybjcj_wlsc" ("id", "user_name", "sex", "decimal_f", "phone_number", "age", "create_time", "description", "address") VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?) ON CONFLICT ("id") DO UPDATE SET "user_name"=EXCLUDED."user_name", "sex"=EXCLUDED."sex", "decimal_f"=EXCLUDED."decimal_f", "phone_number"=EXCLUDED."phone_number", "age"=EXCLUDED."age", "create_time"=EXCLUDED."create_time", "description"=EXCLUDED."description", "address"=EXCLUDED."address"

2025-12-17 13:53:35,566 INFO  [.e.FieldNamedPreparedStatement] [st-multi-table-sink-writer-1] - PrepareStatement sql is:
DELETE FROM "source_db"."public"."t_8_100w_imp_st_ds_demo2_ql_cybjcj_wlsc" WHERE "id" = ?

2025-12-17 13:54:46,130 INFO  [s.c.s.s.c.ClientExecuteCommand] [main] - 
***********************************************
           Job Statistic Information
***********************************************
Start Time                : 2025-12-17 13:53:31
End Time                  : 2025-12-17 13:54:46
Total Time(s)             :                  74
Total Read Count          :             1000001
Total Write Count         :             1000001
Total Failed Count        :                   0
***********************************************

demo3:仅更新

  • 核心需求:只对主键存在的数据做更新,主键不存在的数据忽略采集。其实这种方式,不能算作全量更新了

  • 核心逻辑:我们在sink目标中执行query的sql语句是:update语句,不是insert语句

  • 核心配置:query的update的sql语句

    • 细节:需要注意数据集:FieldMapper_data中的id的位置在最后。因为update的id参数在最后
  • 执行命令

# ds-st-demo3-mysql2pgsql-ql-jgx-gb.conf
sh /……/seatunnel-2.3.12/bin/seatunnel.sh --config /……/myconf/ds-st-demo3-mysql2pgsql-ql-jgx-gb.conf -i JAVA_OPTS='-Xmx2g -Xms2g' -m local
  • 建表
-- ds-st-demo3-mysql2pgsql-ql-jgx-gb.conf
CREATE TABLE "public"."t_8_100w_imp_st_ds_demo3_ql_jgx_gb" (
  id BIGINT PRIMARY KEY,
  user_name VARCHAR(2000),
  sex VARCHAR(20),
  decimal_f NUMERIC(32, 6),
  phone_number VARCHAR(20),
  age INT,
  create_time TIMESTAMP,
  description TEXT,
  address VARCHAR(2000) DEFAULT '未知'
);
COMMENT ON COLUMN "public"."t_8_100w_imp_st_ds_demo2_ql_cybjcj_wlsc"."id" IS '主键';
COMMENT ON COLUMN "public"."t_8_100w_imp_st_ds_demo2_ql_cybjcj_wlsc"."user_name" IS '名字';
COMMENT ON COLUMN "public"."t_8_100w_imp_st_ds_demo2_ql_cybjcj_wlsc"."sex" IS '性别:男;女';
COMMENT ON COLUMN "public"."t_8_100w_imp_st_ds_demo2_ql_cybjcj_wlsc"."decimal_f" IS '大数字';
COMMENT ON COLUMN "public"."t_8_100w_imp_st_ds_demo2_ql_cybjcj_wlsc"."phone_number" IS '电话';
COMMENT ON COLUMN "public"."t_8_100w_imp_st_ds_demo2_ql_cybjcj_wlsc"."age" IS '字符串年龄转数字';
COMMENT ON COLUMN "public"."t_8_100w_imp_st_ds_demo2_ql_cybjcj_wlsc"."create_time" IS '新增时间';
COMMENT ON COLUMN "public"."t_8_100w_imp_st_ds_demo2_ql_cybjcj_wlsc"."description" IS '大文本';
COMMENT ON COLUMN "public"."t_8_100w_imp_st_ds_demo2_ql_cybjcj_wlsc"."address" IS '空地址转默认值:未知';
  • conf配置
env {
  # 任务名字:业务中可以弄表id
  job.name = "ds-st-demo3-mysql2pgsql-ql-jgx-gb.conf"
  # 最大批线程数:并行度(线程数)
  parallelism = 5
  # 任务模式:BATCH:批处理模式;STREAMING:流处理模式
  job.mode = "BATCH"
}

source {
  jdbc {
    url = "jdbc:mysql://ip:port/cs1"
    driver = "com.mysql.cj.jdbc.Driver"
    user = "root"
    password = "zysoft"
    # 采集的数据sql
    query = "select name,sex,decimal_f,phone_number,age,create_time,description,address,id from t_8_100w"
    # 给这个数据集起个名字
    plugin_output = "source_data"
    
    # 并行读取配置
    # 分片的字段:支持:String、Number(int, bigint, decimal, ...)、Date
    partition_column = "id"
    # 表的分割大小(行数):每个分片的数据行(默认8096行)。最后分片数=表的总行数 / split.size
    split.size = 50000
    # 分片数,匹配并行度parallelism(2.3.12已不推荐配置了,用split.size来代替)
    # partition_num = 5
    # 最大批处理数:查询的行提取大小(指定当前任务每次执行时读取数据条数,该值(默认1000)受运行内存影响,若该值较大或单条数据量较大,需适当调整运行内存大小。)
    fetch_size = 10000
    
    # 连接参数
    # 连接超时时间300ms
    connection_check_timeout_sec = 300
    # 其他jdbc的参数
    properties = {
      useUnicode = true
      characterEncoding = "utf8"
      # 时区,不同数据库参数不一样
      serverTimezone = "Asia/Shanghai"
      # 使用游标提高大结果集性能
      useCursorFetch = "true"
      # 每次获取行数
      defaultFetchSize = "10000"
    }
  }
}

# 清洗转换(简单的清洗转换,直接在source的query的sql中处理了就行)
transform {
  # 1. 字段映射:sql中做了,实际生成中不在这里处理。直接在source的query的sql中处理了就行
  # 还可以用:FieldMapper 插件,来映射字段
  FieldMapper {
    plugin_input = "source_data"
    plugin_output = "FieldMapper_data"
    field_mapper = {
      name = user_name
      sex = sex
      decimal_f = decimal_f
      phone_number = phone_number
      # 临时字段名
      age = age_str
      create_time = create_time
      description = description
      address = address
      id = id
    }
  }
  # 转换age为数字类型(pgsql必须转)
  Sql {
    plugin_input = "FieldMapper_data"
    plugin_output = "Sql_age_data"
    query = """
      SELECT
        user_name,
        sex,
        decimal_f,
        phone_number,
        CAST(age_str AS INTEGER) as age,
        create_time,
        description,
        address,
    id
      from dual
    """
  }
  
  # 2. 手机号脱敏:13812341234 -> 138****1234
  
  # 3. 年龄转换:字符串转整数(实际生产中,不用转换,也没有内置的转换插件,可以直接保存成功)

  # 4. 性别转换:1->男,2->女
  
  # 5. 数据过滤:只保留 age > 25 的记录。
  
  # 6. 地址默认值:空地址设为'未知'
}

sink {
  jdbc {
    # 接收的最终数据集
    plugin_input = "Sql_age_data"
    
    url = "jdbc:postgresql://ip:port/source_db"
    driver = "org.postgresql.Driver"
    user = "postgres"
    password = "123456"

    query = """UPDATE "public"."t_8_100w_imp_st_ds_demo3_ql_jgx_gb" 
    SET "user_name"=?, "sex"=?, "decimal_f"=?, "phone_number"=?, "age"=?, "create_time"=?, "description"=?, "address"=?
    WHERE "id"=?"""
    
    # 自动生成sql的配置,和query参数互斥
    # 生成自动插入sql。如果目标库没有表,也会自动建表
    # generate_sink_sql = true
    # database必须要,因为generate_sink_sql=true
    # database = source_db
    # 自动生成sql时,table必须要
    # table = "public.t_8_100w_imp_st_ds_demo2_ql_cybjcj_wlsc"
    # 生成类似:ON CONFLICT ("id") DO UPDATE SET 的sql
    # enable_upsert = true
    # 判断值唯一的健:此选项用于支持在自动生成 SQL 时进行 insert,delete 和 update 操作。
    # primary_keys = ["id"]
    
    # 表结构处理策略:表不存在时报错(任务失败),一般用:CREATE_SCHEMA_WHEN_NOT_EXIST(表不存在时创建表;表存在时跳过操作(保留数据))
    schema_save_mode = "ERROR_WHEN_SCHEMA_NOT_EXIST"
    # 插入数据的处理策略
    # APPEND_DATA:保留表结构和数据,追加新数据(不删除现有数据)(一般用这个)
    # DROP_DATA:保留表结构,删除表中所有数据(清空表)——实现清空重灌
    # CUSTOM_PROCESSING :用户定义处理。需要配合:custom_sql使用
    data_save_mode = "CUSTOM_PROCESSING"
    # 当 data_save_mode 选择 CUSTOM_PROCESSING 时,您应该填写 CUSTOM_SQL 参数。此参数通常填入可执行的 SQL。SQL 将在同步任务之前执行。
    #可以实现:同步删除(执行前置update、truncate的sql等)
    #todo cc 这个sql未执行,不知道为啥。
    custom_sql = """update "source_db"."public"."t_8_100w_imp_st_ds_demo3_ql_jgx_gb" set "my_status" = 12 where 1=1;"""
    
    # 批量写入条数
    batch_size = 10000
    # 批次提交间隔
    batch_interval_ms = 500
    # 重试次数
    max_retries = 3
    
    # 连接参数
    # 连接超时时间300ms
    connection_check_timeout_sec = 300
    # 其他jdbc的参数
    properties = {
      # PostgreSQL专用参数
      # PostgreSQL的批量优化(注意大小写)
      reWriteBatchedInserts = "true"  
      # 如果需要时区设置
      options = "-c timezone=Asia/Shanghai"
    }
  }
}
  • 结果
2025-12-17 15:57:18,995 INFO  [o.a.s.a.e.LoggingEventHandler ] [seatunnel-coordinator-service-8] - log event: JobStateEvent(jobId=1053581219121332225, jobName=ds-st-demo3-mysql2pgsql-ql-jgx-gb.conf, jobStatus=FINISHED, createdTime=1765958238995)
    2025-12-17 15:57:19,013 INFO  [s.c.s.s.c.ClientExecuteCommand] [main] - 
    ***********************************************
               Job Statistic Information
    ***********************************************
    Start Time                : 2025-12-17 15:55:37
    End Time                  : 2025-12-17 15:57:18
    Total Time(s)             :                 101
    Total Read Count          :             1000002
    Total Write Count         :             1000002
    Total Failed Count        :                   0
    ***********************************************

demo4:增量-仅新增:时间戳+自增字段+区间全量 + 前置sql执行

  • 核心需求:对源端读取过来的新增数据做新增操作

  • 核心逻辑:在源头source的sql中做增量的数据过滤,只采集增量的数据

  • 核心配置:source的query的sql

  • 前置sql执行条件:sink中自动生成sql开启,不用query的sql

    • custom_sql
  • 执行命令

# ds-st-demo5-mysql2pgsql-zl-jxz-sjcjzzzd.conf
sh /……/seatunnel-2.3.12/bin/seatunnel.sh --config /……/myconf/ds-st-demo5-mysql2pgsql-zl-jxz-sjcjzzzd.conf -i JAVA_OPTS='-Xmx2g -Xms2g' -m local
  • 建表
-- ds-st-demo5-mysql2pgsql-zl-jxz-sjcjzzzd.conf
CREATE TABLE "public"."`t_8_100w_imp_st_ds_demo5_zl_jxz_sjcjzzzd`" (
  id BIGINT PRIMARY KEY,
  user_name VARCHAR(2000),
  sex VARCHAR(20),
  decimal_f NUMERIC(32, 6),
  phone_number VARCHAR(20),
  age INT,
  create_time TIMESTAMP,
  description TEXT,
  address VARCHAR(2000) DEFAULT '未知',
  my_status INT
);
COMMENT ON COLUMN "public"."t_8_100w_imp_st_ds_demo5_zl_jxz_sjcjzzzd"."id" IS '主键';
COMMENT ON COLUMN "public"."t_8_100w_imp_st_ds_demo5_zl_jxz_sjcjzzzd"."user_name" IS '名字';
COMMENT ON COLUMN "public"."t_8_100w_imp_st_ds_demo5_zl_jxz_sjcjzzzd"."sex" IS '性别:男;女';
COMMENT ON COLUMN "public"."t_8_100w_imp_st_ds_demo5_zl_jxz_sjcjzzzd"."decimal_f" IS '大数字';
COMMENT ON COLUMN "public"."t_8_100w_imp_st_ds_demo5_zl_jxz_sjcjzzzd"."phone_number" IS '电话';
COMMENT ON COLUMN "public"."t_8_100w_imp_st_ds_demo5_zl_jxz_sjcjzzzd"."age" IS '字符串年龄转数字';
COMMENT ON COLUMN "public"."t_8_100w_imp_st_ds_demo5_zl_jxz_sjcjzzzd"."create_time" IS '新增时间';
COMMENT ON COLUMN "public"."t_8_100w_imp_st_ds_demo5_zl_jxz_sjcjzzzd"."description" IS '大文本';
COMMENT ON COLUMN "public"."t_8_100w_imp_st_ds_demo5_zl_jxz_sjcjzzzd"."address" IS '空地址转默认值:未知';
COMMENT ON COLUMN "public"."t_8_100w_imp_st_ds_demo5_zl_jxz_sjcjzzzd"."my_status" IS '状态';
  • conf配置
env {
  # 任务名字:业务中可以弄表id
  job.name = "ds-st-demo5-mysql2pgsql-zl-jxz-sjcjzzzd.conf"
  # 最大批线程数:并行度(线程数)
  parallelism = 5
  # 任务模式:BATCH:批处理模式;STREAMING:流处理模式
  job.mode = "BATCH"
}

source {
  jdbc {
    url = "jdbc:mysql://ip:port/cs1"
    driver = "com.mysql.cj.jdbc.Driver"
    user = "root"
    password = "zysoft"
    # 增量-仅新增
    # 时间戳+自增字段:id > 12 and create_time > '2025-12-02 13:28:51':筛选的数据:13、14
    # 区间全量:decimal_f between 7 and 9:筛选的数据:11
    query = """select * from t_8_10 where (id > 12 and create_time > '2025-12-02 13:28:51') or (decimal_f between 7 and 9)"""
    # 给这个数据集起个名字
    plugin_output = "source_data"
    
    # 并行读取配置
    # 分片的字段:支持:String、Number(int, bigint, decimal, ...)、Date
    partition_column = "id"
    # 表的分割大小(行数):每个分片的数据行(默认8096行)。最后分片数=表的总行数 / split.size
    split.size = 50000
    # 分片数,匹配并行度parallelism(2.3.12已不推荐配置了,用split.size来代替)
    # partition_num = 5
    # 最大批处理数:查询的行提取大小(指定当前任务每次执行时读取数据条数,该值(默认1000)受运行内存影响,若该值较大或单条数据量较大,需适当调整运行内存大小。)
    fetch_size = 10000
    
    # 连接参数
    # 连接超时时间300ms
    connection_check_timeout_sec = 300
    # 其他jdbc的参数
    properties = {
      useUnicode = true
      characterEncoding = "utf8"
      # 时区,不同数据库参数不一样
      serverTimezone = "Asia/Shanghai"
      # 使用游标提高大结果集性能
      useCursorFetch = "true"
      # 每次获取行数
      defaultFetchSize = "10000"
    }
  }
}

# 清洗转换(简单的清洗转换,直接在source的query的sql中处理了就行)
transform {
  # 1. 字段映射:sql中做了,实际生成中不在这里处理。直接在source的query的sql中处理了就行
  # 还可以用:FieldMapper 插件,来映射字段
  FieldMapper {
    plugin_input = "source_data"
    plugin_output = "FieldMapper_data"
    field_mapper = {
      id = id
      name = user_name
      sex = sex
      decimal_f = decimal_f
      phone_number = phone_number
      # 临时字段名
      age = age_str
      create_time = create_time
      description = description
      address = address
    }
  }
  # 转换age为数字类型(pgsql必须转)
  Sql {
    plugin_input = "FieldMapper_data"
    plugin_output = "Sql_age_data"
    query = """
      SELECT
        id,
        user_name,
        sex,
        decimal_f,
        phone_number,
        CAST(age_str AS INTEGER) as age,
        create_time,
        description,
        address
      from dual
    """
  }
  
  # 2. 手机号脱敏:13812341234 -> 138****1234
  
  # 3. 年龄转换:字符串转整数(实际生产中,不用转换,也没有内置的转换插件,可以直接保存成功)

  # 4. 性别转换:1->男,2->女
  
  # 5. 数据过滤:只保留 age > 25 的记录。
  
  # 6. 地址默认值:空地址设为'未知'
}

sink {
  jdbc {
    # 接收的最终数据集
    plugin_input = "Sql_age_data"
    
    url = "jdbc:postgresql://ip:port/source_db"
    driver = "org.postgresql.Driver"
    user = "postgres"
    password = "123456"
    
    #query = """INSERT INTO public.t_8_100w_imp_st_ds_demo5_zl_jxz_sjcjzzzd
    #(id, user_name, sex, decimal_f, phone_number, age, create_time, description, address)
    #VALUES(?,?,?,?,?,?,?,?,?)"""
    #query = """UPDATE "public"."t_8_100w_imp_st_ds_demo3_ql_jgx_gb" 
    #SET "user_name"=?, "sex"=?, "decimal_f"=?, "phone_number"=?, "age"=?, "create_time"=?, "description"=?, "address"=?
    #WHERE "id"=?"""
    
    # 自动生成sql的配置,和query参数互斥
    # 生成自动插入sql。如果目标库没有表,也会自动建表
    generate_sink_sql = true
    # database必须要,因为generate_sink_sql=true。
    database = source_db
    # 自动生成sql时,table必须要。
    table = "public.t_8_100w_imp_st_ds_demo5_zl_jxz_sjcjzzzd"
    # 生成类似:ON CONFLICT ("id") DO UPDATE SET 的sql
    # enable_upsert = true
    # 判断值唯一的健:此选项用于支持在自动生成 SQL 时进行 insert,delete 和 update 操作。
    # primary_keys = ["id"]

    # 表结构处理策略:表不存在时报错(任务失败),一般用:CREATE_SCHEMA_WHEN_NOT_EXIST(表不存在时创建表;表存在时跳过操作(保留数据))
    schema_save_mode = "ERROR_WHEN_SCHEMA_NOT_EXIST"
    # 插入数据的处理策略
    # APPEND_DATA:保留表结构和数据,追加新数据(不删除现有数据)(一般用这个)
    # DROP_DATA:保留表结构,删除表中所有数据(清空表)——实现清空重灌
    # CUSTOM_PROCESSING :用户定义处理。需要配合:custom_sql使用
    data_save_mode = "CUSTOM_PROCESSING"
    # 当 data_save_mode 选择 CUSTOM_PROCESSING 时,您应该填写 CUSTOM_SQL 参数。此参数通常填入可执行的 SQL。SQL 将在同步任务之前执行。
    #可以实现:同步删除(执行前置update、truncate的sql等)
    #这个sql未执行,不知道为啥。
    #这个sql已经执行。原因:因为generate_sink_sql=true的原因。才会执行custom_sql。(只有自动生成sql的时候,这个才会执行)
    custom_sql = """update "source_db"."public"."t_8_100w_imp_st_ds_demo5_zl_jxz_sjcjzzzd" set "my_status" = 11"""
    
    # 批量写入条数
    batch_size = 10000
    # 批次提交间隔
    batch_interval_ms = 500
    # 重试次数
    max_retries = 3
    
    # 连接参数
    # 连接超时时间300ms
    connection_check_timeout_sec = 300
    # 其他jdbc的参数
    properties = {
      # PostgreSQL专用参数
      # PostgreSQL的批量优化(注意大小写)
      reWriteBatchedInserts = "true"  
      # 如果需要时区设置
      options = "-c timezone=Asia/Shanghai"
    }
  }
}
  • 结果
2025-12-18 10:22:36,044 INFO  [s.c.s.s.c.ClientExecuteCommand] [main] - 
        ***********************************************
                   Job Statistic Information
        ***********************************************
        Start Time                : 2025-12-18 10:22:31
        End Time                  : 2025-12-18 10:22:35
        Total Time(s)             :                   4
        Total Read Count          :                   3
        Total Write Count         :                   3
        Total Failed Count        :                   0
        ***********************************************

demo6:仅更新,最新配置(无细节)

env {
  # 任务名字:业务中可以弄表id
  job.name = "ds-st-demo6-mysql2pgsql-zl-jgx-sjcjzzzd.conf"
  # 最大批线程数:并行度(线程数)
  parallelism = 5
  # 任务模式:BATCH:批处理模式;STREAMING:流处理模式
  job.mode = "BATCH"
}

source {
  jdbc {
    url = "jdbc:mysql://ip:port/cs1"
    driver = "com.mysql.cj.jdbc.Driver"
    user = "root"
    password = "zysoft"
    # 增量-仅新增
    # 时间戳+自增字段:id > 12 and create_time > '2025-12-02 13:28:51':筛选的数据:13、14
    # 区间全量:decimal_f between 7 and 9:筛选的数据:11
    # query = """select * from t_8_10 where (id > 12 and create_time > '2025-12-02 13:28:51') or (decimal_f between 7 and 9)"""
    # 增量-仅更新
    # 只更新主键存在的数据,也是全量查询
    query = "select * from t_8_10"
    # 给这个数据集起个名字
    plugin_output = "source_data"
    
    # 并行读取配置
    # 分片的字段:支持:String、Number(int, bigint, decimal, ...)、Date
    partition_column = "id"
    # 表的分割大小(行数):每个分片的数据行(默认8096行)。最后分片数=表的总行数 / split.size
    split.size = 50000
    # 分片数,匹配并行度parallelism(2.3.12已不推荐配置了,用split.size来代替)
    # partition_num = 5
    # 最大批处理数:查询的行提取大小(指定当前任务每次执行时读取数据条数,该值(默认1000)受运行内存影响,若该值较大或单条数据量较大,需适当调整运行内存大小。)
    fetch_size = 10000
    
    # 连接参数
    # 连接超时时间300ms
    connection_check_timeout_sec = 300
    # 其他jdbc的参数
    properties = {
      useUnicode = true
      characterEncoding = "utf8"
      # 时区,不同数据库参数不一样
      serverTimezone = "Asia/Shanghai"
      # 使用游标提高大结果集性能
      useCursorFetch = "true"
      # 每次获取行数
      defaultFetchSize = "10000"
    }
  }
}

# 清洗转换(简单的清洗转换,直接在source的query的sql中处理了就行)
transform {
  # 1. 字段映射:sql中做了,实际生成中不在这里处理。直接在source的query的sql中处理了就行
  # 还可以用:FieldMapper 插件,来映射字段
  FieldMapper {
    plugin_input = "source_data"
    plugin_output = "FieldMapper_data"
    field_mapper = {
      name = user_name
      sex = sex
      decimal_f = decimal_f
      phone_number = phone_number
      # 临时字段名
      age = age_str
      create_time = create_time
      description = description
      address = address
      id = id
    }
  }
  # 转换age为数字类型(pgsql必须转)
  Sql {
    plugin_input = "FieldMapper_data"
    plugin_output = "Sql_age_data"
    query = """
      SELECT
        user_name,
        sex,
        decimal_f,
        phone_number,
        CAST(age_str AS INTEGER) as age,
        create_time,
        description,
        address,
    id
      from dual
    """
  }
  
  # 2. 手机号脱敏:13812341234 -> 138****1234
  
  # 3. 年龄转换:字符串转整数(实际生产中,不用转换,也没有内置的转换插件,可以直接保存成功)

  # 4. 性别转换:1->男,2->女
  
  # 5. 数据过滤:只保留 age > 25 的记录。
  
  # 6. 地址默认值:空地址设为'未知'
}

sink {
  jdbc {
    # 接收的最终数据集
    plugin_input = "Sql_age_data"
    
    url = "jdbc:postgresql://ip:port/source_db"
    driver = "org.postgresql.Driver"
    user = "postgres"
    password = "123456"
    # 仅更新,必须使用更新语句了
    query = """UPDATE "public"."t_8_100w_imp_st_ds_demo6_zl_jgx_sjcjzzzd" 
    SET "user_name"=?, "sex"=?, "decimal_f"=?, "phone_number"=?, "age"=?, "create_time"=?, "description"=?, "address"=?
    WHERE "id"=?;"""
    
    # 自动生成sql的配置,和query参数互斥
    # 生成自动插入sql。如果目标库没有表,也会自动建表
    # generate_sink_sql = true
    # database必须要,因为generate_sink_sql=true。
    # database = source_db
    # 自动生成sql时,table必须要。
    # table = "public.t_8_100w_imp_st_ds_demo6_zl_jgx_sjcjzzzd"
    # 生成类似:INSERT INTO …… ON CONFLICT ("主键") DO UPDATE SET …… 的sql
    # enable_upsert = true
    # 判断值唯一的健:此选项用于支持在自动生成 SQL 时进行 insert,delete 和 update 操作。
    # primary_keys = ["id"]

    # 表结构处理策略:表不存在时报错(任务失败),一般用:CREATE_SCHEMA_WHEN_NOT_EXIST(表不存在时创建表;表存在时跳过操作(保留数据))
    schema_save_mode = "ERROR_WHEN_SCHEMA_NOT_EXIST"
    # 插入数据的处理策略
    # APPEND_DATA:保留表结构和数据,追加新数据(不删除现有数据)(一般用这个)
    # DROP_DATA:保留表结构,删除表中所有数据(清空表)——实现清空重灌
    # CUSTOM_PROCESSING :用户定义处理。需要配合:custom_sql使用
    data_save_mode = "CUSTOM_PROCESSING"
    # 当 data_save_mode 选择 CUSTOM_PROCESSING 时,您应该填写 CUSTOM_SQL 参数。此参数通常填入可执行的 SQL。SQL 将在同步任务之前执行。
    #可以实现:同步删除(执行前置update、truncate的sql等)
    #这个sql未执行,不知道为啥。
    #这个sql已经执行。原因:因为generate_sink_sql=true的原因。才会执行custom_sql。(只有自动生成sql的时候,这个才会执行)
    custom_sql = """update "source_db"."public"."t_8_100w_imp_st_ds_demo6_zl_jgx_sjcjzzzd" set "my_status" = 22;"""
    
    # 批量写入条数
    batch_size = 10000
    # 批次提交间隔
    batch_interval_ms = 500
    # 重试次数
    max_retries = 3
    
    # 连接参数
    # 连接超时时间300ms
    connection_check_timeout_sec = 300
    # 其他jdbc的参数
    properties = {
      # PostgreSQL专用参数
      # PostgreSQL的批量优化(注意大小写)
      reWriteBatchedInserts = "true"  
      # 如果需要时区设置
      options = "-c timezone=Asia/Shanghai"
    }
  }
}

posted on 2025-12-18 14:44  C_C_菜园  阅读(9)  评论(0)    收藏  举报

导航