FlinkX

FlinkX的安装与简单使用

FlinkX概述

FlinkX是在是袋鼠云内部广泛使用的基于flink的分布式离线和实时的数据同步框架,实现了多种异构数据源之间高效的数据迁移。

不同的数据源头被抽象成不同的Reader插件,不同的数据目标被抽象成不同的Writer插件。理论上,FlinkX框架可以支持任意数据源类型的数据同步工作。作为一套生态系统,每接入一套新数据源该新加入的数据源即可实现和现有的数据源互通。

FlinkX是一个基于Flink的批流统一体的数据同步工具,既可以采集静态的数据,比如MySQL,HDFS等,也可以采集实时变化的数据,比如MySQL binlog,Kafka等

image-20220619223449067

在底层实现上,FlinkX依赖Flink,数据同步任务会被翻译成StreamGraph在Flink上执行.

image-20220619223533456

FlinkX的安装

https://github.com/oceanos/flinkx/blob/1.8_release/README_OLD.

1、上传并解压

unzip flinkx-1.10.zip -d /usr/local/soft/

2、配置环境变量

3、给bin/flinkx这个文件加上执行权限

chmod a+x flinkx

4、修改配置文件,设置运行端口、

vim flinkconf/flink-conf.yaml
## web服务端口,不指定的话会随机生成一个
rest.bind-port: 8888

FlinkX的简单使用

MySQLToHDFS

  • 配置文件
{
    "job": {
        "content": [
            {
                "reader": {
                    "parameter": {
                        "username": "root",
                        "password": "123456",
                        "connection": [
                            {
                                "jdbcUrl": [
                                    "jdbc:mysql://master:3306/studentdb?useUnicode=true&characterEncoding=utf8&useSSL=false"
                                ],
                                "table": [
                                    "jd_goods"
                                ]
                            }
                        ],
                        "column": [
                            "*"
                        ],
                        "where": "id > 90",
                        "requestAccumulatorInterval": 2
                    },
                    "name": "mysqlreader"
                },
                "writer": {
                    "name": "hdfswriter",
                    "parameter": {
                        "path": "hdfs://master:9000/bigdata30/flinkx/out1",
                        "defaultFS": "hdfs://master:9000",
                        "column": [
                            {
                                "name": "col1",
                                "index": 0,
                                "type": "string"
                            },{
                                "name": "col2",
                                "index": 1,
                                "type": "string"
                            },{
                                "name": "col3",
                                "index": 2,
                                "type": "string"
                            },{
                                "name": "col4",
                                "index": 3,
                                "type": "string"
                            },{
                                "name": "col1",
                                "index": 4,
                                "type": "string"
                            },{
                                "name": "col2",
                                "index": 5,
                                "type": "string"
                            }
                        ],
                        "fieldDelimiter": ",",
                        "fileType": "text",
                        "writeMode": "append"
                    }
                }
            }
        ],
        "setting": {
            "restore": {
                "isRestore": false,
                "isStream": false
            },
            "errorLimit": {},
            "speed": {
                "channel": 1
            }
        }
    }
}
  • 启动任务
flinkx -mode local -job ./mysql2hdfs.json -pluginRoot /usr/local/soft/flinkx-1.10/syncplugins/
  • 监听日志

flinkx 任务启动后,会在执行命令的目录下生成一个nohup.out文件

tail -f nohup.out
  • 通过web界面查看任务运行情况
http://master:8888

MySQLToHive

  • 配置文件
{
    "job": {
        "content": [
            {
                "reader": {
                    "parameter": {
                        "username": "root",
                        "password": "123456",
                        "connection": [
                            {
                                "jdbcUrl": [
                                    "jdbc:mysql://master:3306/studentdb?useUnicode=true&characterEncoding=utf8&useSSL=false"
                                ],
                                "table": [
                                    "jd_goods"
                                ]
                            }
                        ],
                        "column": [
                            "*"
                        ],
                        "where": "id > 90",
                        "requestAccumulatorInterval": 2
                    },
                    "name": "mysqlreader"
                },
                "writer": {
                    "name": "hivewriter",
                    "parameter": {
                        "jdbcUrl": "jdbc:hive2://master:10000/bigdata30",
                        "username": "",
                        "password": "",
                        "fileType": "text",
                        "fieldDelimiter": ",",
                        "writeMode": "overwrite",
                        "compress": "",
                        "charsetName": "UTF-8",
                        "maxFileSize": 1073741824,
                        "tablesColumn": "{\"jd_goods\":[{\"key\":\"id\",\"type\":\"string\"},{\"key\":\"gname\",\"type\":\"string\"},{\"key\":\"price\",\"type\":\"bigint\"},{\"key\":\"commit\",\"type\":\"string\"},{\"key\":\"shop\",\"type\":\"string\"},{\"key\":\"icons\",\"type\":\"string\"}]}",
                        "defaultFS": "hdfs://master:9000"
                    }
                }
            }
        ],
        "setting": {
            "restore": {
                "isRestore": false,
                "isStream": false
            },
            "errorLimit": {},
            "speed": {
                "channel": 1
            }
        }
    }
}
  • 在hive中创建testflinkx数据库,并创建student分区表
create database testflinkx;
use testflinkx;
CREATE TABLE `bigdata30`.`datax_tb1`(
    `id` STRING,
    `gname` STRING,
    `price` STRING,
    `commit` STRING,
    `shop` STRING,
    `icons` STRING)
PARTITIONED BY ( 
  `pt` string)
ROW FORMAT DELIMITED 
  FIELDS TERMINATED BY ',';
  • 启动hiveserver2
# 第一种方式:
hiveserver2
# 第二种方式:
hive --service hiveserver2
  • 启动任务
flinkx -mode local -job ./mysql2hive.json -pluginRoot /usr/local/soft/flinkx-1.10/syncplugins/ -flinkconf /usr/local/soft/flinkx-1.10/flinkconf/
  • 查看日志及运行情况同上

MySQLToHBase

  • 配置文件
{
    "job": {
        "content": [
            {
                "reader": {
                    "parameter": {
                        "username": "root",
                        "password": "123456",
                        "connection": [
                            {
                                "jdbcUrl": [
                                    "jdbc:mysql://master:3306/studentdb?useUnicode=true&characterEncoding=utf8&useSSL=false"
                                ],
                                "table": [
                                    "jd_goods"
                                ]
                            }
                        ],
                        "column": [
                            "*"
                        ],
                        "where": "id > 90",
                        "requestAccumulatorInterval": 2
                    },
                    "name": "mysqlreader"
                },
                "writer": {
                    "name": "hbasewriter",
                    "parameter": {
                        "hbaseConfig": {
                            "hbase.zookeeper.property.clientPort": "2181",
                            "hbase.rootdir": "hdfs://master:9000/hbase",
                            "hbase.cluster.distributed": "true",
                            "hbase.zookeeper.quorum": "master,node1,node2",
                            "zookeeper.znode.parent": "/hbase"
                        },
                        "table": "flinkx_jd_goods",
                        "rowkeyColumn": "$(cf1:id)",
                        "column": [
                            {
                                "name": "cf1:id",
                                "type": "string"
                            },
                            {
                                "name": "cf1:gname",
                                "type": "string"
                            },
                            {
                                "name": "cf1:price",
                                "type": "string"
                            },
                            {
                                "name": "cf1:commit",
                                "type": "string"
                            },
                            {
                                "name": "cf1:shop",
                                "type": "string"
                            },
                            {
                                "name": "cf1:icons",
                                "type": "string"
                            }
                        ]
                    }
                }
            }
        ],
        "setting": {
            "restore": {
                "isRestore": false,
                "isStream": false
            },
            "errorLimit": {},
            "speed": {
                "channel": 1
            }
        }
    }
}
  • 启动hbase 并创建testflinkx表
create 'testFlinkx','cf1'
  • 启动任务
flinkx -mode local -job ./mysql2hbase.json -pluginRoot /usr/local/soft/flinkx-1.10/syncplugins/ -flinkconf /usr/local/soft/flinkx-1.10/flinkconf/
  • 查看日志及运行情况同上

MySQLToMySQL

配置文件

{
    "job": {
      "content": [
        {
          "reader": {
            "name": "mysqlreader",
            "parameter": {
              "column": [
                {
                  "name": "id",
                  "type": "int"
                },
                {
                  "name": "name",
                  "type": "string"
                },
                {
                  "name": "age",
                  "type": "int"
                },
                {
                  "name": "gender",
                  "type": "string"
                },
                {
                  "name": "clazz",
                  "type": "string"
                }
              ],
              "username": "root",
              "password": "123456",
              "connection": [
                {
                  "jdbcUrl": [
                    "jdbc:mysql://master:3306/student?useSSL=false"
                  ],
                  "table": [
                    "student"
                  ]
                }
              ]
            }
          },
          "writer": {
            "name": "mysqlwriter",
            "parameter": {
              "username": "root",
              "password": "123456",
              "connection": [
                {
                  "jdbcUrl": "jdbc:mysql://master:3306/student?useSSL=false",
                  "table": [
                    "student2"
                  ]
                }
              ],
              "writeMode": "insert",
              "column": [
                {
                    "name": "id",
                    "type": "int"
                  },
                  {
                    "name": "name",
                    "type": "string"
                  },
                  {
                    "name": "age",
                    "type": "int"
                  },
                  {
                    "name": "gender",
                    "type": "string"
                  },
                  {
                    "name": "clazz",
                    "type": "string"
                  }
              ]
            }
          }
        }
      ],
      "setting": {
        "speed": {
          "channel": 1,
          "bytes": 0
        }
      }
    }
  }
posted @ 2024-07-10 19:59  Bigdata_Yuan  阅读(42)  评论(0)    收藏  举报