SeaTunnel data synchronization (Oracle to MySQL)

DataX has not been updated since September 2023, so I went looking for a newer, actively maintained open-source ETL tool.

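Apache SeaTunnel describes itself as an easy-to-use, high-performance distributed data integration platform that supports real-time synchronization of massive data volumes. According to the project, it can stably and efficiently synchronize tens of billions of records per day and is used in production by nearly 100 companies.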

Install it and try it out directly:

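export version="2.3.9"
wget "https://archive.apache.org/dist/seatunnel/${version}/apache-seatunnel-${version}-bin.tar.gz"
tar -xzvf "apache-seatunnel-${version}-bin.tar.gz"
# enter the extracted directory; the bin/ and lib/ paths used below are relative to it
cd "apache-seatunnel-${version}"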

Install the connector plugins:

sh bin/install-plugin.sh

 

Put the required JDBC driver jars (here, the Oracle and MySQL drivers) into the lib directory.
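As a rough sketch of that step, the drivers can be copied in with a small shell helper. The function name `copy_jdbc_jars` and the `~/jdbc-drivers` download folder are assumptions made for illustration; the actual jar file names depend on the driver versions you downloaded.

```shell
# Hypothetical helper: copy every *.jar from a download folder into SeaTunnel's lib/.
# Usage: copy_jdbc_jars <driver_dir> <seatunnel_home>
copy_jdbc_jars() {
    driver_dir="$1"
    seatunnel_home="$2"
    if [ ! -d "${seatunnel_home}/lib" ]; then
        echo "lib directory not found under ${seatunnel_home}" >&2
        return 1
    fi
    copied=0
    for jar in "${driver_dir}"/*.jar; do
        # Skip the unexpanded glob when the folder holds no jars.
        [ -e "${jar}" ] || continue
        cp "${jar}" "${seatunnel_home}/lib/" || return 1
        copied=$((copied + 1))
    done
    echo "copied ${copied} jar(s) into ${seatunnel_home}/lib"
    # Report failure if nothing was copied, so a wrong folder is noticed early.
    [ "${copied}" -gt 0 ]
}
```

It could be called as `copy_jdbc_jars ~/jdbc-drivers "apache-seatunnel-${version}"`, with the paths adjusted to your own layout.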

 

The emp table in MySQL was created in advance:

create table emp (
    empno    int,
    ename    varchar(10),
    job      varchar(9),
    mgr      int,
    hiredate datetime,
    sal      int,
    comm     int,
    deptno   int,
    constraint pk_emp primary key (empno)
);

 

Edit the config file (see the official documentation for more source and sink options; CDC real-time synchronization is also supported):

env {
  parallelism = 4
  job.mode = "BATCH"
}
source{
    Jdbc {
        url = "jdbc:oracle:thin:@10.40.12.219:1521:sharedb"
        driver = "oracle.jdbc.OracleDriver"
        user = "system"
        password = "xxxx"
        query = "SELECT * FROM scott.emp"
    }
}

transform {
    # If you would like to get more information about how to configure seatunnel and see full list of transform plugins,
    # please go to https://seatunnel.apache.org/docs/transform-v2/sql
}

sink {
    jdbc {
        url = "jdbc:mysql://10.40.13.75:3306/ceshi?useUnicode=true&characterEncoding=UTF-8&rewriteBatchedStatements=true"
        driver = "com.mysql.cj.jdbc.Driver"
        user = "root"
        password = "xxxx"
        # Automatically generate sql statements based on database table names
        generate_sink_sql = true
        database = ceshi
        table = emp
    }
}

 

Reference configuration for CDC:

env {
  # You can set engine configuration here
  parallelism = 1
  job.mode = "STREAMING"
  checkpoint.interval = 5000
}

source {
  # This is an example source plugin **only for testing and demonstrating the source plugin feature**
  Oracle-CDC {
    plugin_output = "customers"
    username = "dbzuser"
    password = "dbz"
    database-names = ["ORCLCDB"]
    schema-names = ["DEBEZIUM"]
    table-names = ["ORCLCDB.DEBEZIUM.FULL_TYPES"]
    base-url = "jdbc:oracle:thin:@oracle-host:1521/ORCLCDB"
    source.reader.close.timeout = 120000
    connection.pool.size = 1
    
    schema-changes.enabled = true
  }
}

sink {
  jdbc {
    plugin_input = "customers"
    url = "jdbc:mysql://oracle-host:3306/oracle_sink"
    driver = "com.mysql.cj.jdbc.Driver"
    user = "st_user_sink"
    password = "mysqlpw"
    generate_sink_sql = true
    # You need to configure both database and table
    database = oracle_sink
    table = oracle_cdc_2_mysql_sink_table
    primary_keys = ["ID"]
  }
}

 

Run the job:

./bin/seatunnel.sh --config oracle_to_mysql.config -m local
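
To keep a record of each run, the command above can be wrapped in a small shell function. This is only a sketch: the function name `run_seatunnel_job` and the default log file name are hypothetical, and it assumes that `seatunnel.sh` exits with a non-zero status when the job fails.

```shell
# Hypothetical wrapper around the launch command shown above.
# Usage: run_seatunnel_job <seatunnel_home> <config_file> [log_file]
run_seatunnel_job() {
    seatunnel_home="$1"
    config_file="$2"
    log_file="${3:-seatunnel_job.log}"
    if [ ! -f "${config_file}" ]; then
        echo "config file not found: ${config_file}" >&2
        return 2
    fi
    # Same flags as the manual invocation: explicit config, local mode.
    if "${seatunnel_home}/bin/seatunnel.sh" --config "${config_file}" -m local >"${log_file}" 2>&1; then
        echo "job finished, log written to ${log_file}"
    else
        rc=$?
        echo "job failed with exit code ${rc}, see ${log_file}" >&2
        return "${rc}"
    fi
}
```

From inside the SeaTunnel directory it could be invoked as `run_seatunnel_job . oracle_to_mysql.config`.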

 

Easy to install and easy to configure. Nice!

Apache's open-source projects really are impressive!

Reference (official documentation): https://seatunnel.apache.org/zh-CN/docs/2.3.9/about

 

posted @ 2025-02-08 14:48  阿西吧li