
Real-Time Sync of PostgreSQL/MySQL Data to ClickHouse with StreamSets Data Collector

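StreamSets is a real-time big-data ingestion and ETL tool that lets you collect and move data without writing a single line of code. Data pipelines are designed and scheduled through a drag-and-drop visual interface. Its main features are:
- Visual operation: collect and move data without code, and design pipelines for streaming, batch processing, and change data capture (CDC) in minutes
- Built-in monitoring: view basic statistics about the data flow and the quality of the data in real time
- Strong integration: broad support for commonly used components, including 50 origins, 44 processors, and 46 destinations.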

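The core concepts in StreamSets are origins, processors, and destinations, and configuring a pipeline essentially comes down to these three parts. Common origins include Kafka, HTTP, UDP, JDBC, and HDFS; processors can filter, modify, encode, and aggregate individual fields; destinations largely mirror origins and can write to Kafka, Flume, JDBC, HDFS, Redis, and so on.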
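
As a rough mental model only, the three stages map onto a Unix pipeline: one function emits records, one transforms them, and one writes them out. This is a shell analogy, not StreamSets code or its API; the function names, the sample records, and the output path are all made up for illustration.

```shell
#!/bin/sh
# Analogy only: StreamSets pipelines are designed in the UI, not written like this.

# Origin: emits records (hypothetical CSV rows: id,name,city).
origin() {
  printf '%s\n' '1,tom,shanghai' '2,jack,shanghai' '3,bob,beijing'
}

# Processor: filters records and rewrites a field
# (keeps only rows whose city is "shanghai" and upper-cases the name).
processor() {
  awk -F',' 'BEGIN { OFS = "," } $3 == "shanghai" { $2 = toupper($2); print }'
}

# Destination: writes the records to a target (here, the file path given as $1).
destination() {
  cat > "$1"
}

# Wire the three stages together, the way a pipeline connects its stages.
run_pipeline() {
  origin | processor | destination "$1"
}
```

Calling `run_pipeline /tmp/out.csv` leaves only the filtered and transformed rows in the file, which is the same origin, processor, destination flow a pipeline expresses graphically.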

Origins (the data sources to read from) (docs: https://docs.streamsets.com/portal/#datacollector/3.22.x/help/datacollector/UserGuide/Origins/Origins_title.html)

Destinations (the targets to write to)

Installing StreamSets

# Stop the firewall:
systemctl stop firewalld.service
# Disable SELinux:
setenforce 0   # temporary, lasts until reboot
# To make it permanent, set SELINUX=disabled in /etc/selinux/config (takes effect after reboot)
# Raise the open-file limit
ulimit -n
vi /etc/security/limits.conf   # add the following two lines
sdc soft nofile    32768
sdc hard nofile    32768
vi /etc/profile                # add the following line
ulimit -HSn 32768
source /etc/profile
# Change the ulimit settings on macOS
sudo sysctl -w kern.maxfilesperproc=65535
ulimit -n 65535
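
Before starting the service it can help to confirm that the new limit is actually in effect for the current shell. A minimal sketch, assuming the 32768 value used above is the target; the helper name `nofile_ok` is hypothetical.

```shell
#!/bin/sh
# Minimum number of open files, taken from the limits.conf lines above.
REQUIRED_NOFILE=32768

# Succeeds when the given limit ("unlimited" or a number) meets the minimum.
# $1 = limit to check, $2 = required minimum (defaults to REQUIRED_NOFILE).
nofile_ok() {
  limit=$1
  required=${2:-$REQUIRED_NOFILE}
  [ "$limit" = "unlimited" ] && return 0
  [ "$limit" -ge "$required" ]
}

# Report on the soft limit of the current shell.
if nofile_ok "$(ulimit -n)"; then
  echo "open-file limit OK: $(ulimit -n)"
else
  echo "open-file limit too low: $(ulimit -n) (need $REQUIRED_NOFILE)"
fi
```

The limit is per session, so the check is only meaningful when run as the user, and in the kind of shell, that will launch Data Collector.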
# Install axel
brew install axel   # macOS
wget http://download-ib01.fedoraproject.org/pub/epel/7/x86_64/Packages/a/axel-2.4-9.el7.x86_64.rpm   # CentOS
rpm -ivh axel-2.4-9.el7.x86_64.rpm
# Download streamsets-datacollector
axel -n 20 -a -v https://archives.streamsets.com/datacollector/3.22.3/tarball/activation/streamsets-datacollector-common-3.22.3.tgz
axel -n 20 -a -v https://archives.streamsets.com/datacollector/3.22.3/tarball/activation/streamsets-datacollector-all-3.22.3.tgz   # not recommended: all components
# Install JDK 11
sudo yum install java-11-openjdk-devel
scp /Users/irving/Downloads/streamsets-datacollector-common-3.22.3.tgz root@10.34.12.255:/opt

# Start the service
tar xvzf streamsets-datacollector-common-3.22.3.tgz
cd streamsets-datacollector-3.22.3
nohup bin/streamsets dc &

# Manage the process with pm2
pm2 start "bin/streamsets dc" --name streamsets-datacollector
pm2 startup
pm2 save

# Edit the config file (enable form authentication instead of the default OAuth2 authentication)
vi sdc.properties
# The authentication for the HTTP endpoint of Data Collector
# Valid values are: 'none', 'basic', 'digest', 'form' or 'aster'
http.authentication=form
pm2 logs streamsets-datacollector
0|streamse | Java 11 detected; adding $SDC_JAVA11_OPTS of "-Djdk.nio.maxCachedBufferSize=262144" to $SDC_JAVA_OPTS
0|streamse | Activation enabled, activation is valid and it does not expire
0|streamse | Logging initialized @3119ms to org.eclipse.jetty.util.log.Slf4jLog
0|streamsets-datacollector  | Running on URI : 'http://:18630'
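
The `http.authentication=form` change can also be scripted instead of made by hand in `vi`. A minimal sketch, assuming the file is a plain `key=value` properties file; the helper name `set_property` is hypothetical, and it is not part of StreamSets.

```shell
#!/bin/sh
# set_property FILE KEY VALUE
# Replaces an existing "KEY=..." line, or appends one if the key is absent.
# Assumes KEY and VALUE contain no characters special to sed such as "|" or "&";
# a "." in KEY is treated as a regex wildcard, which is harmless for this use.
set_property() {
  file=$1 key=$2 value=$3
  if grep -q "^${key}=" "$file"; then
    # Edit via a temp file and copy the content back, so the original
    # file keeps its owner and mode (and sed -i differences are avoided).
    tmp=$(mktemp)
    sed "s|^${key}=.*|${key}=${value}|" "$file" > "$tmp" && cat "$tmp" > "$file"
    rm -f "$tmp"
  else
    printf '%s\n' "${key}=${value}" >> "$file"
  fi
}
```

For example, `set_property sdc.properties http.authentication form` followed by `pm2 restart streamsets-datacollector` applies the same change as the manual edit.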

# Upload the JDBC drivers (copy them to the /opt/streamsets-datacollector-3.22.3/streamsets-libs-extras/streamsets-datacollector-jdbc-lib/lib folder)
mvn dependency:copy-dependencies -DoutputDirectory=lib -DincludeScope=compile
scp -r /Users/irving/Desktop/git/microsrv/clickhouse/lib root@10.34.12.255:/opt/streamsets-datacollector-3.22.3/streamsets-libs-extras/streamsets-datacollector-jdbc-lib/lib
pm2 restart streamsets-datacollector

Packaging and uploading the JDBC drivers with Maven (MSSQL and PostgreSQL are supported by default)

 <dependencies>
        <dependency>
            <groupId>ru.yandex.clickhouse</groupId>
            <artifactId>clickhouse-jdbc</artifactId>
            <version>0.3.1</version>
        </dependency>
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>8.0.25</version>
        </dependency>
        <!--        <dependency>-->
        <!--            <groupId>org.postgresql</groupId>-->
        <!--            <artifactId>postgresql</artifactId>-->
        <!--            <version>42.2.23</version>-->
        <!--        </dependency>-->
    </dependencies>
# tree -h
.
├── [ 352K]  clickhouse-jdbc-0.3.1.jar
├── [ 327K]  commons-codec-1.11.jar
├── [  60K]  commons-logging-1.2.jar
├── [ 762K]  httpclient-4.5.13.jar
├── [ 321K]  httpcore-4.4.13.jar
├── [  41K]  httpmime-4.5.13.jar
├── [  65K]  jackson-annotations-2.9.10.jar
├── [ 318K]  jackson-core-2.9.10.jar
├── [ 1.3M]  jackson-databind-2.9.10.8.jar
├── [ 635K]  lz4-java-1.7.1.jar
├── [ 2.3M]  mysql-connector-java-8.0.25.jar
├── [ 1.6M]  protobuf-java-3.11.4.jar
└── [  40K]  slf4j-api-1.7.30.jar
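
After copying, a quick check that the expected driver jars are present in the target folder can save a restart cycle. A minimal sketch; the helper name `missing_jars` is hypothetical, and matching by file-name prefix is an assumption based on the listing above.

```shell
#!/bin/sh
# missing_jars DIR PREFIX...
# Prints each PREFIX for which DIR contains no file named PREFIX*.jar.
missing_jars() {
  dir=$1
  shift
  for prefix in "$@"; do
    found=no
    for f in "$dir"/"$prefix"*.jar; do
      # If the glob matched nothing, $f is the literal pattern and does not exist.
      [ -e "$f" ] && found=yes && break
    done
    [ "$found" = yes ] || printf '%s\n' "$prefix"
  done
}
```

For example, `missing_jars lib clickhouse-jdbc mysql-connector-java` prints nothing when both drivers are in `lib`.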

Installing ClickHouse

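# Check the environment
grep -q sse4_2 /proc/cpuinfo && echo "SSE 4.2 supported" || echo "SSE 4.2 not supported"
# "SSE 4.2 supported" means ClickHouse can be installed: the official builds require the SSE 4.2 instruction set, which ClickHouse uses to speed up computation on the CPU.
# Install ClickHouse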
sudo yum install yum-utils
sudo rpm --import https://repo.clickhouse.tech/CLICKHOUSE-KEY.GPG
sudo yum-config-manager --add-repo https://repo.clickhouse.tech/rpm/clickhouse.repo
sudo yum install clickhouse-server clickhouse-client

[root@ ~]# sudo yum install clickhouse-server clickhouse-client
Repository epel is listed more than once in the configuration
上次元数据过期检查:0:01:02 前,执行于 2021年07月15日 星期四 15时49分34秒。
依赖关系解决。
============================================================================================================================================================================================================
软件包                                                    架构                                    版本                                            仓库                                                大小
============================================================================================================================================================================================================
安装:
clickhouse-client                                         noarch                                  21.7.3.14-2                                     clickhouse-stable                                   76 k
clickhouse-server                                         noarch                                  21.7.3.14-2                                     clickhouse-stable                                  100 k
安装依赖关系:
clickhouse-common-static                                  x86_64                                  21.7.3.14-2                                     clickhouse-stable                                  166 M

事务概要
============================================================================================================================================================================================================
安装  3 软件包

总下载:166 M
安装大小:539 M
确定吗?[y/N]: y
下载软件包:
[MIRROR] clickhouse-client-21.7.3.14-2.noarch.rpm: Curl error (56): Failure when receiving data from the peer for http://repo.clickhouse.tech/rpm/stable/x86_64/clickhouse-client-21.7.3.14-2.noarch.rpm [Recv failure: 连接被对方重设]
(1/3): clickhouse-server-21.7.3.14-2.noarch.rpm                                                                                                                             142 kB/s | 100 kB     00:00
(2/3): clickhouse-client-21.7.3.14-2.noarch.rpm                                                                                                                              99 kB/s |  76 kB     00:00
(3/3): clickhouse-common-static-21.7.3.14-2.x86_64.rpm                                                                                                                      5.3 MB/s | 166 MB     00:31
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
总计                                                                                                                                                                        5.3 MB/s | 166 MB     00:31
运行事务检查
事务检查成功。
运行事务测试
事务测试成功。
运行事务
准备中  :                                                                                                                                                                                             1/1
安装    : clickhouse-common-static-21.7.3.14-2.x86_64                                                                                                                                                 1/3
安装    : clickhouse-client-21.7.3.14-2.noarch                                                                                                                                                        2/3
安装    : clickhouse-server-21.7.3.14-2.noarch                                                                                                                                                        3/3
运行脚本: clickhouse-server-21.7.3.14-2.noarch                                                                                                                                                        3/3
ClickHouse binary is already located at /usr/bin/clickhouse
Symlink /usr/bin/clickhouse-server already exists but it points to /clickhouse. Will replace the old symlink to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-server to /usr/bin/clickhouse.
Symlink /usr/bin/clickhouse-client already exists but it points to /clickhouse. Will replace the old symlink to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-client to /usr/bin/clickhouse.
Symlink /usr/bin/clickhouse-local already exists but it points to /clickhouse. Will replace the old symlink to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-local to /usr/bin/clickhouse.
Symlink /usr/bin/clickhouse-benchmark already exists but it points to /clickhouse. Will replace the old symlink to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-benchmark to /usr/bin/clickhouse.
Symlink /usr/bin/clickhouse-copier already exists but it points to /clickhouse. Will replace the old symlink to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-copier to /usr/bin/clickhouse.
Symlink /usr/bin/clickhouse-obfuscator already exists but it points to /clickhouse. Will replace the old symlink to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-obfuscator to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-git-import to /usr/bin/clickhouse.
Symlink /usr/bin/clickhouse-compressor already exists but it points to /clickhouse. Will replace the old symlink to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-compressor to /usr/bin/clickhouse.
Symlink /usr/bin/clickhouse-format already exists but it points to /clickhouse. Will replace the old symlink to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-format to /usr/bin/clickhouse.
Symlink /usr/bin/clickhouse-extract-from-config already exists but it points to /clickhouse. Will replace the old symlink to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-extract-from-config to /usr/bin/clickhouse.
Creating clickhouse group if it does not exist.
groupadd -r clickhouse
Creating clickhouse user if it does not exist.
useradd -r --shell /bin/false --home-dir /nonexistent -g clickhouse clickhouse
Will set ulimits for clickhouse user in /etc/security/limits.d/clickhouse.conf.
Creating config directory /etc/clickhouse-server/config.d that is used for tweaks of main server configuration.
Creating config directory /etc/clickhouse-server/users.d that is used for tweaks of users configuration.
Config file /etc/clickhouse-server/config.xml already exists, will keep it and extract path info from it.
/etc/clickhouse-server/config.xml has /var/lib/clickhouse/ as data path.
/etc/clickhouse-server/config.xml has /var/log/clickhouse-server/ as log path.
Users config file /etc/clickhouse-server/users.xml already exists, will keep it and extract users info from it.
chown --recursive clickhouse:clickhouse '/etc/clickhouse-server'
Creating log directory /var/log/clickhouse-server/.
Creating data directory /var/lib/clickhouse/.
Creating pid directory /var/run/clickhouse-server.
chown --recursive clickhouse:clickhouse '/var/log/clickhouse-server/'
chown --recursive clickhouse:clickhouse '/var/run/clickhouse-server'
chown clickhouse:clickhouse '/var/lib/clickhouse/'
groupadd -r clickhouse-bridge
useradd -r --shell /bin/false --home-dir /nonexistent -g clickhouse-bridge clickhouse-bridge
chown --recursive clickhouse-bridge:clickhouse-bridge '/usr/bin/clickhouse-odbc-bridge'
chown --recursive clickhouse-bridge:clickhouse-bridge '/usr/bin/clickhouse-library-bridge'
Password for default user is empty string. See /etc/clickhouse-server/users.xml and /etc/clickhouse-server/users.d to change it.
Setting capabilities for clickhouse binary. This is optional.

ClickHouse has been successfully installed.

Start clickhouse-server with:
sudo clickhouse start

Start clickhouse-client with:
clickhouse-client

Synchronizing state of clickhouse-server.service with SysV service script with /usr/lib/systemd/systemd-sysv-install.
Executing: /usr/lib/systemd/systemd-sysv-install enable clickhouse-server
Created symlink /etc/systemd/system/multi-user.target.wants/clickhouse-server.service → /etc/systemd/system/clickhouse-server.service.

验证    : clickhouse-client-21.7.3.14-2.noarch                                                                                                                                                        1/3
验证    : clickhouse-common-static-21.7.3.14-2.x86_64                                                                                                                                                 2/3
验证    : clickhouse-server-21.7.3.14-2.noarch                                                                                                                                                        3/3

已安装:
clickhouse-client-21.7.3.14-2.noarch                             clickhouse-common-static-21.7.3.14-2.x86_64                             clickhouse-server-21.7.3.14-2.noarch

完毕!

# Check the version
clickhouse-server --version
ClickHouse server version 21.7.3.14 (official build).
# Start the service
systemctl start clickhouse-server.service
# Restart the service
systemctl restart clickhouse-server.service
# Stop the service
systemctl stop clickhouse-server.service
# Check the status
systemctl status clickhouse-server.service

```
[root@ clickhouse-server]# systemctl status clickhouse-server.service
● clickhouse-server.service - ClickHouse Server (analytic DBMS for big data)
Loaded: loaded (/etc/systemd/system/clickhouse-server.service; enabled; vendor preset: disabled)
Active: active (running) since Thu 2021-07-15 17:28:31 CST; 1min 11s ago
Main PID: 508117 (clckhouse-watch)
Tasks: 156 (limit: 49524)
Memory: 127.5M
CGroup: /system.slice/clickhouse-server.service
├─508117 clickhouse-watchdog --config=/etc/clickhouse-server/config.xml --pid-file=/run/clickhouse-server/clickhouse-server.pid
└─508118 /usr/bin/clickhouse-server --config=/etc/clickhouse-server/config.xml --pid-file=/run/clickhouse-server/clickhouse-server.pid

7月 15 17:28:31  systemd[1]: Started ClickHouse Server (analytic DBMS for big data).
7月 15 17:28:31  clickhouse-server[508117]: Processing configuration file '/etc/clickhouse-server/config.xml'.
7月 15 17:28:31  clickhouse-server[508117]: Logging trace to /var/log/clickhouse-server/clickhouse-server.log
7月 15 17:28:31  clickhouse-server[508117]: Logging errors to /var/log/clickhouse-server/clickhouse-server.err.log
7月 15 17:28:32  clickhouse-server[508117]: Processing configuration file '/etc/clickhouse-server/config.xml'.
7月 15 17:28:32  clickhouse-server[508117]: Saved preprocessed configuration to '/var/lib/clickhouse/preprocessed_configs/config.xml'.
7月 15 17:28:32  clickhouse-server[508117]: Processing configuration file '/etc/clickhouse-server/users.xml'.
7月 15 17:28:32  clickhouse-server[508117]: Saved preprocessed configuration to '/var/lib/clickhouse/preprocessed_configs/users.xml'.
```
# Directories
```
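(1) /etc/clickhouse-server: the server configuration directory, including the global config.xml and the user configuration users.xml.
(2) /var/lib/clickhouse: the default data directory (usually changed so that data is stored on a large mounted disk).
(3) /var/log/clickhouse-server: the default log directory (usually changed so that logs are stored on a large mounted disk).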

Before going further, please notice the <path> element in config.xml. Path determines the location for data storage, so it should be located on volume with large disk capacity; the default value is /var/lib/clickhouse/.
```
# Configure remote access
chmod 777 config.xml users.xml
<!-- If IPv6 is disabled, use this setting -->
<listen_host>0.0.0.0</listen_host>
<!-- If IPv6 is not disabled, use this setting instead
<listen_host>::</listen_host>
-->
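
The installer output above mentions `/etc/clickhouse-server/config.d` as the directory for tweaks of the main server configuration, so the listen address can also go into a small override file instead of being edited into config.xml. A sketch under assumptions: the helper name `write_listen_override` and the file name `listen.xml` are made up, and the `<yandex>` root element is assumed for the 21.x release used here (newer releases use `<clickhouse>`).

```shell
#!/bin/sh
# write_listen_override DIR HOST
# Writes DIR/listen.xml (hypothetical file name) with a single listen_host entry.
write_listen_override() {
  dir=$1 host=$2
  cat > "$dir/listen.xml" <<EOF
<yandex>
    <listen_host>${host}</listen_host>
</yandex>
EOF
}
```

For example, `write_listen_override /etc/clickhouse-server/config.d 0.0.0.0` followed by a restart of clickhouse-server; this keeps config.xml untouched, which makes package upgrades easier to merge.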
```
1. Plain text:
<password>password</password>
2. SHA256:
<password_sha256_hex>password</password_sha256_hex>
Example of generating a password from the shell:
PASSWORD=$(base64 < /dev/urandom | head -c8); echo "$PASSWORD"; echo -n "$PASSWORD" | sha256sum | tr -d '-'
The first line of the output is the plaintext password, the second is its SHA256 hash.
3. Double SHA1:
<password_double_sha1_hex>password</password_double_sha1_hex>
Example of generating a password from the shell:
PASSWORD=$(base64 < /dev/urandom | head -c8); echo "$PASSWORD"; echo -n "$PASSWORD" | sha1sum | tr -d '-' | xxd -r -p | sha1sum | tr -d '-'
The first line of the output is the plaintext password, the second is its double SHA1 hash.
```
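
The two hashing one-liners can be wrapped into small helpers that hash a password you already have, rather than generating a random one. A minimal sketch: the function names are hypothetical, and `hex_to_bin` is a plain-shell stand-in for the `xxd -r -p` step so the script does not depend on xxd being installed.

```shell
#!/bin/sh
# hex_to_bin HEX: writes the raw bytes encoded by the hex string HEX to stdout
# (same job as "xxd -r -p" in the commands above).
hex_to_bin() {
  hex=$1
  while [ -n "$hex" ]; do
    byte=${hex%"${hex#??}"}   # first two hex digits
    hex=${hex#??}             # remainder
    # Convert the byte to a 3-digit octal escape and let printf emit it.
    printf "\\$(printf '%03o' "0x$byte")"
  done
}

# sha256_hex PASSWORD: value for <password_sha256_hex>.
sha256_hex() {
  printf '%s' "$1" | sha256sum | cut -d' ' -f1
}

# double_sha1_hex PASSWORD: value for <password_double_sha1_hex>,
# i.e. SHA1 applied to the raw bytes of SHA1(PASSWORD).
double_sha1_hex() {
  first=$(printf '%s' "$1" | sha1sum | cut -d' ' -f1)
  hex_to_bin "$first" | sha1sum | cut -d' ' -f1
}
```

Using `printf '%s'` rather than `echo -n` avoids differences between shells in how `echo` treats its options, and `cut` removes the trailing ` -` that `sha1sum` prints without leaving stray spaces.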
# Generate a password
[root@ clickhouse-server]# PASSWORD=$(base64 < /dev/urandom | head -c8); echo "$PASSWORD"; echo -n "$PASSWORD" | sha1sum | tr -d '-' | xxd -r -p | sha1sum | tr -d '-'
JUxG8EwNS
b1854b52ed2577d50a23591d0d8fda3420b4d2af
# Configure it
<password_double_sha1_hex>b1854b52ed2577d50a23591d0d8fda3420b4d2af</password_double_sha1_hex>
# Log in with the client (the HTTP port is 8123, the TCP port is 9000)
clickhouse-client -h 127.0.0.1 --port 9000 -m -u default --password JUxG8EwNS
# Import data (flight data as a sample for testing)
curl -O https://datasets.clickhouse.tech/ontime/partitions/ontime.tar
# Alternatively, download over 10 connections with a progress bar
axel -n 10 -a https://datasets.clickhouse.tech/ontime/partitions/ontime.tar
tar xvf ontime.tar -C /var/lib/clickhouse
# Check the size of each directory
du -h --max-depth=1
# Restart the service
sudo systemctl restart clickhouse-server.service
# Check the total row count (183953732, i.e. 180 million+)
clickhouse-client -h 127.0.0.1 --port 9000 -m -u default --password JUxG8EwNS --query "select count(*) from datasets.ontime"
# Create a test table
CREATE TABLE emp_replacingmergetree (
  emp_id UInt16 COMMENT 'employee ID',
  name String COMMENT 'employee name',
  work_place String COMMENT 'work location',
  age UInt8 COMMENT 'employee age',
  depart String COMMENT 'department',
  salary Decimal32(2) COMMENT 'salary'
  )ENGINE=ReplacingMergeTree()
  ORDER BY emp_id
  PRIMARY KEY emp_id
  PARTITION BY work_place
  ;
-- Insert data
INSERT INTO emp_replacingmergetree
VALUES (1,'tom','Shanghai',25,'Engineering',20000),(2,'jack','Shanghai',26,'HR',10000);
INSERT INTO emp_replacingmergetree
VALUES (3,'bob','Beijing',33,'Finance',50000),(4,'tony','Hangzhou',28,'Sales',50000);
# Partitioned tables
Partition by time:
toYYYYMM(EventDate): partition by month
toMonday(EventDate): partition by week
toDate(EventDate): partition by day
Partition by a specific column:
PARTITION BY column_name
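
To get a feel for how a partition expression groups rows, the monthly case can be imitated on plain date strings. This is only an illustration of the grouping idea, not how ClickHouse evaluates `toYYYYMM`; the function names are hypothetical and the input is assumed to be `YYYY-MM-DD` dates, one per line.

```shell
#!/bin/sh
# to_yyyymm DATE: imitates toYYYYMM for a DATE written as YYYY-MM-DD.
to_yyyymm() {
  printf '%s\n' "$1" | awk -F'-' '{ print $1 $2 }'
}

# partition_counts: reads one YYYY-MM-DD date per line on stdin and prints
# "partition rows" pairs, i.e. how many rows would fall into each monthly partition.
partition_counts() {
  awk -F'-' '{ n[$1 $2]++ } END { for (k in n) print k, n[k] }' | sort
}
```

Feeding a column of dates into `partition_counts` shows how many partitions a given key would create, which is worth checking before choosing a fine-grained key such as a per-day partition.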

Diagram

MaterializeMySQL and MaterializedPostgreSQL engines

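https://clickhouse.tech/docs/en/engines/database-engines/materialize-mysql/ (this feature is experimental)
https://clickhouse.tech/docs/en/engines/database-engines/materialized-postgresql/ (the official documentation did not mark it as experimental at the time of writing; untested here)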

Manually creating the table for the flight test data

CREATE TABLE `ontime`
(
    `Year`                            UInt16,
    `Quarter`                         UInt8,
    `Month`                           UInt8,
    `DayofMonth`                      UInt8,
    `DayOfWeek`                       UInt8,
    `FlightDate`                      Date,
    `Reporting_Airline`               String,
    `DOT_ID_Reporting_Airline`        Int32,
    `IATA_CODE_Reporting_Airline`     String,
    `Tail_Number`                     Int32,
    `Flight_Number_Reporting_Airline` String,
    `OriginAirportID`                 Int32,
    `OriginAirportSeqID`              Int32,
    `OriginCityMarketID`              Int32,
    `Origin`                          FixedString(5),
    `OriginCityName`                  String,
    `OriginState`                     FixedString(2),
    `OriginStateFips`                 String,
    `OriginStateName`                 String,
    `OriginWac`                       Int32,
    `DestAirportID`                   Int32,
    `DestAirportSeqID`                Int32,
    `DestCityMarketID`                Int32,
    `Dest`                            FixedString(5),
    `DestCityName`                    String,
    `DestState`                       FixedString(2),
    `DestStateFips`                   String,
    `DestStateName`                   String,
    `DestWac`                         Int32,
    `CRSDepTime`                      Int32,
    `DepTime`                         Int32,
    `DepDelay`                        Int32,
    `DepDelayMinutes`                 Int32,
    `DepDel15`                        Int32,
    `DepartureDelayGroups`            String,
    `DepTimeBlk`                      String,
    `TaxiOut`                         Int32,
    `WheelsOff`                       Int32,
    `WheelsOn`                        Int32,
    `TaxiIn`                          Int32,
    `CRSArrTime`                      Int32,
    `ArrTime`                         Int32,
    `ArrDelay`                        Int32,
    `ArrDelayMinutes`                 Int32,
    `ArrDel15`                        Int32,
    `ArrivalDelayGroups`              Int32,
    `ArrTimeBlk`                      String,
    `Cancelled`                       UInt8,
    `CancellationCode`                FixedString(1),
    `Diverted`                        UInt8,
    `CRSElapsedTime`                  Int32,
    `ActualElapsedTime`               Int32,
    `AirTime`                         Nullable(Int32),
    `Flights`                         Int32,
    `Distance`                        Int32,
    `DistanceGroup`                   UInt8,
    `CarrierDelay`                    Int32,
    `WeatherDelay`                    Int32,
    `NASDelay`                        Int32,
    `SecurityDelay`                   Int32,
    `LateAircraftDelay`               Int32,
    `FirstDepTime`                    String,
    `TotalAddGTime`                   String,
    `LongestAddGTime`                 String,
    `DivAirportLandings`              String,
    `DivReachedDest`                  String,
    `DivActualElapsedTime`            String,
    `DivArrDelay`                     String,
    `DivDistance`                     String,
    `Div1Airport`                     String,
    `Div1AirportID`                   Int32,
    `Div1AirportSeqID`                Int32,
    `Div1WheelsOn`                    String,
    `Div1TotalGTime`                  String,
    `Div1LongestGTime`                String,
    `Div1WheelsOff`                   String,
    `Div1TailNum`                     String,
    `Div2Airport`                     String,
    `Div2AirportID`                   Int32,
    `Div2AirportSeqID`                Int32,
    `Div2WheelsOn`                    String,
    `Div2TotalGTime`                  String,
    `Div2LongestGTime`                String,
    `Div2WheelsOff`                   String,
    `Div2TailNum`                     String,
    `Div3Airport`                     String,
    `Div3AirportID`                   Int32,
    `Div3AirportSeqID`                Int32,
    `Div3WheelsOn`                    String,
    `Div3TotalGTime`                  String,
    `Div3LongestGTime`                String,
    `Div3WheelsOff`                   String,
    `Div3TailNum`                     String,
    `Div4Airport`                     String,
    `Div4AirportID`                   Int32,
    `Div4AirportSeqID`                Int32,
    `Div4WheelsOn`                    String,
    `Div4TotalGTime`                  String,
    `Div4LongestGTime`                String,
    `Div4WheelsOff`                   String,
    `Div4TailNum`                     String,
    `Div5Airport`                     String,
    `Div5AirportID`                   Int32,
    `Div5AirportSeqID`                Int32,
    `Div5WheelsOn`                    String,
    `Div5TotalGTime`                  String,
    `Div5LongestGTime`                String,
    `Div5WheelsOff`                   String,
    `Div5TailNum`                     String
) ENGINE MergeTree () 
PARTITION BY FlightDate 
ORDER BY (Year,FlightDate) 
SETTINGS index_granularity = 8192;

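Note: if you need low latency, CDC (Change Data Capture) may also be a good option. MySQL needs the binlog enabled, and PostgreSQL needs Write-Ahead Logging (WAL) support enabled; Debezium or the Flink CDC connectors (which do not yet support ClickHouse) are also good alternatives.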
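
Whether a source database is prepared for CDC can be spot-checked from its configuration file. A minimal sketch, assuming the settings commonly required by log-based CDC tools such as Debezium (`binlog_format = ROW` for MySQL, `wal_level = logical` for PostgreSQL); the helper name `has_setting` and the file paths in the usage note are assumptions.

```shell
#!/bin/sh
# has_setting FILE KEY VALUE
# Succeeds if FILE has an uncommented "KEY = VALUE" line. Spaces around "=" are
# optional, VALUE may be wrapped in single quotes, and a trailing "# comment" is allowed.
has_setting() {
  grep -Eq "^[[:space:]]*$2[[:space:]]*=[[:space:]]*'?$3'?[[:space:]]*(#.*)?\$" "$1"
}
```

For example, `has_setting /etc/my.cnf binlog_format ROW` or `has_setting postgresql.conf wal_level logical`. This only inspects the file; values set elsewhere (included files, command-line options, or runtime changes) are not seen, so querying the running server remains the authoritative check.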

REFER:
https://docs.streamsets.com/portal/#datacollector/latest/help/index.html
http://streamsets.vip/
https://clickhouse.tech/docs/en/
https://streamsets.com/products/dataops-platform/data-collector/
https://accounts.streamsets.com/archives (all components, 6 GB+)
https://github.com/ververica/flink-cdc-connectors
https://github.com/Altinity/clickhouse-mysql-data-reader
https://github.com/debezium/debezium
https://juejin.cn/post/6860480798294966286
https://www.cnblogs.com/uestc2007/p/13704912.html

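https://github.com/apache/nifi (Apache NiFi and StreamSets are both dataflow processing tools. StreamSets has a polished graphical interface, but the enterprise-grade StreamSets Control Hub is not open source; in terms of enterprise features available as open source, Apache NiFi is more complete.)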

https://github.com/ClickHouse/ClickHouse

https://clickhouse.com/docs/zh/interfaces/jdbc/(TCP JDBC)

Build a CDC Pipeline

https://www.youtube.com/watch?v=rf0fquEze9s&ab_channel=StreamSetsInc.

How to Create Apache Kafka Pipelines in Minutes

https://www.youtube.com/watch?v=SiZrkyEzpJc

posted @ 2021-07-17 14:10  花儿笑弯了腰