Integrating DataX & DataX-Web into DataSophon 1.2.1 (Multi-Node)

A simple multi-node integration of DataX and DataX-Web with DataSophon

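Deploying DataX

Prerequisites

  • JDK (1.8 or later; 1.8 recommended)
  • Python (2 or 3; Linux here ships with Python 2. Running the stock scripts under Python 3 raises errors, so the scripts must be modified first)
  • Apache Maven 3.x (only needed to compile DataX from a Git checkout; not required if you download the official datax.tar.gz archive)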
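
Before going further, it can help to confirm which of these tools are actually on the PATH. The following is a minimal sketch; the function name check_tools is made up for this illustration and is not part of DataX or DataSophon.

```shell
#!/bin/bash
# Report whether each named tool is on PATH.
# Prints "<tool>: <path>" or "<tool>: MISSING"; returns 1 if any tool is missing.
check_tools() {
    local missing=0 tool path
    for tool in "$@"; do
        if path=$(command -v "$tool"); then
            echo "$tool: $path"
        else
            echo "$tool: MISSING"
            missing=1
        fi
    done
    return "$missing"
}

# Example (mvn only matters when building DataX from source):
# check_tools java python mvn
```

This only checks presence, not versions; run java -version, python --version and mvn -v to confirm those by hand.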

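Building the package

Method 1: download the DataX package directly: DataX download link

After downloading, extract it to a local directory, enter the bin directory, and run a sync job: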

$ cd  {YOUR_DATAX_HOME}/bin
$ python datax.py {YOUR_JOB.json}

Self-check job:
   python {YOUR_DATAX_HOME}/bin/datax.py {YOUR_DATAX_HOME}/job/job.json

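Method 2: download the DataX source and build it yourself: DataX source

(1) Download the DataX source:

If the server's SSH key has not been added to GitHub, cloning the source over SSH will fail. In that case, consider downloading the zip archive and uploading it to the server instead.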

# git clone https://github.com/alibaba/DataX.git
git clone git@github.com:alibaba/DataX.git

(2) Build with Maven:

$ cd  {DataX_source_code_home}
$ mvn -U clean package assembly:assembly -Dmaven.test.skip=true

The build takes roughly 20 to 30 minutes, depending on network speed. On success, the log ends with output like this:

[INFO] BUILD SUCCESS
[INFO] -----------------------------------------------------------------
[INFO] Total time: 08:12 min
[INFO] Finished at: 2015-12-13T16:26:48+08:00
[INFO] Final Memory: 133M/960M
[INFO] -----------------------------------------------------------------

After a successful build, the DataX package is located in {DataX_source_code_home}/target/datax/datax/, with the following layout:

$ cd  {DataX_source_code_home}
$ ls ./target/datax/datax/
bin		conf		job		lib		log		log_perf	plugin		script		tmp
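
The built directory still has to become a datax.tar.gz before it can be uploaded. The sketch below shows one way to check the layout and pack it so that the archive's top-level folder is named datax. The function names, and the choice of which sub-directories count as essential, are assumptions made for this illustration.

```shell
#!/bin/bash
# Check that a DataX build directory has the sub-directories a package needs.
# Usage: check_datax_layout <path-to>/target/datax/datax
check_datax_layout() {
    local home=$1 sub
    for sub in bin conf job lib plugin; do
        if [ ! -d "$home/$sub" ]; then
            echo "missing: $home/$sub"
            return 1
        fi
    done
    echo "layout ok: $home"
}

# Pack a directory so that the archive's top-level folder is the directory's
# own name (for example "datax").
# Usage: pack_dir <dir> <archive-path>
pack_dir() {
    local dir=$1 archive=$2
    tar -czf "$archive" -C "$(dirname "$dir")" "$(basename "$dir")"
}
```

For example, pack_dir {DataX_source_code_home}/target/datax/datax datax.tar.gz writes the archive into the current directory.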

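Uploading the DataX package

Upload datax.tar.gz to the DataSophon package directory, then generate its MD5 checksum file: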

cd /opt/datasophon/DDP/packages/

md5sum datax.tar.gz | awk '{print $1}' > datax.tar.gz.md5
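
The .md5 file should hold exactly the digest of the archive next to it, so it needs regenerating whenever the archive is rebuilt. A small sketch for checking that the two still agree (verify_md5 is a hypothetical helper name):

```shell
#!/bin/bash
# Compare a package's MD5 digest with the digest stored in <package>.md5.
# Usage: verify_md5 /opt/datasophon/DDP/packages/datax.tar.gz
verify_md5() {
    local pkg=$1 actual expected
    actual=$(md5sum "$pkg" | awk '{print $1}')
    # Strip the trailing newline (and any stray whitespace) from the stored digest
    expected=$(tr -d '[:space:]' < "$pkg.md5")
    if [ "$actual" = "$expected" ]; then
        echo "md5 ok: $pkg"
    else
        echo "md5 MISMATCH: $pkg"
        return 1
    fi
}
```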

Preparing the service_ddl.json configuration file

On the datasophon-manager-1.2.1 node, create the service metadata directory and file:

mkdir -p /opt/datasophon/datasophon-manager-1.2.1/conf/meta/DDP-1.2.1/DATAX

vi /opt/datasophon/datasophon-manager-1.2.1/conf/meta/DDP-1.2.1/DATAX/service_ddl.json
{
  "name": "Datax",
  "label": "Datax",
  "description": "Offline data synchronization tool",
  "version": "1.0.0",
  "sortNum": 21,
  "dependencies":[],
  "packageName": "datax.tar.gz",
  "decompressPackageName": "datax",
  "runAs":"root",
  "roles": [
    {
      "name": "DataxClient",
      "label": "DataxClient",
      "roleType": "client",
      "cardinality": "1+",
      "logFile": ""
    }
  ],
  "configWriter": {
    "generators": []
  },
  "parameters": []
}
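
A stray comma or missing brace in service_ddl.json is easy to introduce when editing by hand, so a syntax check before restarting the API may save a round trip. This sketch assumes a Python interpreter is available (json.tool is part of its standard library); validate_json and the PYTHON_BIN variable are names invented for this example.

```shell
#!/bin/bash
# Return 0 if the file parses as JSON, non-zero otherwise.
# Set PYTHON_BIN to choose the interpreter; python3 is assumed by default.
validate_json() {
    "${PYTHON_BIN:-python3}" -m json.tool "$1" > /dev/null
}

# Example:
# validate_json /opt/datasophon/datasophon-manager-1.2.1/conf/meta/DDP-1.2.1/DATAX/service_ddl.json
```

This only checks syntax; it does not check that the fields are ones DataSophon understands.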

Restart the datasophon-manager API service

sh /opt/datasophon/datasophon-manager-1.2.1/bin/datasophon-api.sh restart api

Installing DataX

Add the DataX service.

Click Next.

Select the worker nodes on which to install DataX.

Click Next again; no configuration is required.

DataX is now integrated into DataSophon.

Note: integrating DataX into DataSophon essentially just extracts the package onto the selected cluster nodes, so that DataX-Web can invoke it once it is integrated later.

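Deploying DataX-Web

Prerequisites

  • 1. MySQL (5.5+), required. The client is optional; if the mysql client is installed on the Linux server, the deployment script can initialize the database quickly.
  • 2. JDK (1.8.0_xxx), required (1.8.0_131 in this deployment).
  • 3. Maven (3.6.1+), required (3.6.3 in this deployment).
  • 4. DataX, required (version 3.0 in this deployment).
  • 5. Python (2.x), required (2.7.5 in this deployment). To use Python 3, replace the three Python files under datax/bin with the ones in doc/datax-web/datax-python3. Python is mainly used to run the underlying DataX launch script. By default DataX is executed as a Java child process; users can customize this to launch it through Python instead.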

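Building the package

  • Clone the repository

    git clone https://github.com/WeiYe-Jing/datax-web.git
    
  • From the root directory of the checked-out source, run:

    mvn clean install 
    
  • On success, the installation package is generated in the project's build directory: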

    build/datax-web-{VERSION}.tar.gz
    

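Installing from the package

  1. Extract the package

    In the chosen installation directory, extract the package:

    # tar -zxvf datax-web-{VERSION}.tar.gz
    tar -zxvf datax-web-2.1.2.tar.gz
    
  2. Run the one-step install script

    Change into the extracted directory and locate install.sh under bin. For an interactive installation, run it directly:

    ./bin/install.sh
    

    In interactive mode, the script asks for confirmation before extracting each module's package and before calling each configure script. The prompts show whether each step succeeded; if one fails, you can retry it. To skip the confirmations, install non-interactively with: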

    ./bin/install.sh --force
    

Modifying application.yml in the executor (optional)

Note: to deploy one admin with multiple executors, edit application.yml under the executor and point datax.job.admin.addresses at the admin node. Otherwise leave it unchanged.

vi datax-web-2.1.2/modules/datax-executor/conf/application.yml

datax:
  job:
    admin:
      ### datax admin address list, such as "http://address" or "http://address01,http://address02"
      #addresses: http://127.0.0.1:8080
      addresses: http://192.168.10.23:${datax.admin.port}
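
As the comment in the file says, the value is either a single URL or a comma-separated list of URLs. The sketch below builds such a value from a port and a list of hosts; admin_addresses is a hypothetical helper, and 9527 is the admin SERVER_PORT used elsewhere in this deployment.

```shell
#!/bin/bash
# Build the value for datax.job.admin.addresses from a port and one or more hosts.
# Usage: admin_addresses <port> <host> [<host> ...]
admin_addresses() {
    local port=$1 host list=""
    shift
    for host in "$@"; do
        # Prefix a comma only when the list already has an entry
        list="${list:+$list,}http://$host:$port"
    done
    echo "$list"
}

# Example:
# admin_addresses 9527 192.168.10.23
```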

Preparing the status scripts

admin-status.sh

vi datax-web-2.1.2/bin/admin-status.sh

#!/bin/bash

# Name of the service to look for in the process list
DATAX_ADMIN_SERVICE="datax-admin"

# Print the PID(s) of running processes whose command line contains the service name
get_pid() {
    local service_name=$1
    local pid
    # Exclude the grep process itself, then keep the PID column
    pid=$(ps -ef | grep -v grep | grep "$service_name" | awk '{print $2}')
    echo "$pid"
}

# Exit 0 if the service is running, 1 otherwise
check_status() {
    local service_name=$1
    local pid
    pid=$(get_pid "$service_name")
    
    if [ -z "$pid" ]; then
        echo "$service_name is NOT running."
        exit 1
    else
        echo "$service_name is running with PID $pid."
        exit 0
    fi
}

# Check the datax-admin status
check_status "$DATAX_ADMIN_SERVICE"

executor-status.sh

vi datax-web-2.1.2/bin/executor-status.sh

#!/bin/bash

# Name of the service to look for in the process list
DATAX_EXECUTOR_SERVICE="datax-executor"

# Print the PID(s) of running processes whose command line contains the service name
get_pid() {
    local service_name=$1
    local pid
    # Exclude the grep process itself, then keep the PID column
    pid=$(ps -ef | grep -v grep | grep "$service_name" | awk '{print $2}')
    echo "$pid"
}

# Exit 0 if the service is running, 1 otherwise
check_status() {
    local service_name=$1
    local pid
    pid=$(get_pid "$service_name")
    
    if [ -z "$pid" ]; then
        echo "$service_name is NOT running."
        exit 1
    else
        echo "$service_name is running with PID $pid."
        exit 0
    fi
}


# Check the datax-executor status
check_status "$DATAX_EXECUTOR_SERVICE"
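
Files created with vi are not executable by default. Whether the DataSophon worker needs the executable bit depends on how it invokes the status scripts, which is not covered here, so setting it is just a harmless precaution. The sketch below also runs bash -n first, which parses a script without executing it; prepare_script is a name made up for this example.

```shell
#!/bin/bash
# Syntax-check a script with "bash -n" and, if it parses, set the executable bit.
# Returns non-zero (and leaves the permissions alone) on a syntax error.
# Usage: prepare_script datax-web-2.1.2/bin/admin-status.sh
prepare_script() {
    local script=$1
    bash -n "$script" || return 1
    chmod +x "$script"
}
```

Run it once for admin-status.sh and once for executor-status.sh before repackaging.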

Repackaging the installation package

cp -r datax-web-2.1.2 /opt/datasophon/DDP/packages/

cd /opt/datasophon/DDP/packages/
# Copy the initialization SQL out for later use
cp -r datax-web-2.1.2/db/datax_web.sql /opt/datasophon/DDP/packages/

tar -czf datax-web-2.1.2.tar.gz datax-web-2.1.2
md5sum datax-web-2.1.2.tar.gz | awk '{print $1}' >datax-web-2.1.2.tar.gz.md5
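
Since the service definition refers to bin/admin-status.sh and bin/executor-status.sh inside the package, it may be worth confirming that the rebuilt archive really contains them. A minimal sketch (tarball_has is a hypothetical helper):

```shell
#!/bin/bash
# Succeed only if the gzip-compressed tarball lists exactly the given member path.
# Usage: tarball_has <archive.tar.gz> <member-path>
tarball_has() {
    local listing
    listing=$(tar -tzf "$1") || return 1
    # -x: whole-line match, -F: treat the path as a fixed string, not a regex
    printf '%s\n' "$listing" | grep -xF -- "$2" > /dev/null
}

# Example:
# tarball_has datax-web-2.1.2.tar.gz datax-web-2.1.2/bin/admin-status.sh
```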

Preparing the service_ddl.json configuration file

On the datasophon-manager-1.2.1 node, create the service metadata directory and file:

mkdir -p /opt/datasophon/datasophon-manager-1.2.1/conf/meta/DDP-1.2.1/DATAXWEB

vi /opt/datasophon/datasophon-manager-1.2.1/conf/meta/DDP-1.2.1/DATAXWEB/service_ddl.json
{
  "name": "DATAXWEB",
  "label": "DataxWeb",
  "description": "Visual offline data synchronization tool for DataX",
  "version": "2.1.2",
  "sortNum": 22,
  "dependencies": [],
  "packageName": "datax-web-2.1.2.tar.gz",
  "decompressPackageName": "datax-web-2.1.2",
  "roles": [{
  	"name": "DataxAdmin",
  	"label": "DataxAdmin",
  	"roleType": "master",
  	"runAs": {
  		"user": "root",
  		"group": "root"
  	},
  	"cardinality": "1",
  	"sortNum": 22,
	"logFile": "/opt/datasophon/datax-web/modules/datax-admin/logs/datax-admin.log",
  	"jmxPort": 2192,
  	"startRunner": {
  		"timeout": "60",
  		"program": "modules/datax-admin/bin/datax-admin.sh",
  		"args": ["start"]
  	},
  	"stopRunner": {
  		"timeout": "600",
  		"program": "modules/datax-admin/bin/datax-admin.sh",
  		"args": ["stop"]
  	},
  	"statusRunner": {
        "timeout": "60",
        "program": "bin/admin-status.sh",
        "args": []
    },
	"externalLink": {
		"name": "DataxWebUi",
		"label": "DataxWebUi",
		"url": "http://${host}:9527/index.html"
	}
  },{
	"name": "DataxExecutor",
	"label": "DataxExecutor",
	"roleType": "worker",
	"runAs": {
		"user": "root",
		"group": "root"
	},
	"cardinality": "1+",
	"sortNum": 3,
	"logFile": "/opt/datasophon/datax-web/modules/datax-executor/logs/datax-executor.log",
	"jmxPort": 2192,
	"startRunner": {
		"timeout": "60",
		"program": "modules/datax-executor/bin/datax-executor.sh",
		"args": ["start"]
	},
	"stopRunner": {
		"timeout": "600",
		"program": "modules/datax-executor/bin/datax-executor.sh",
		"args": ["stop"]
	},
  	"statusRunner": {
        "timeout": "60",
        "program": "bin/executor-status.sh",
        "args": []
    }
  }],
  "configWriter": {
  	"generators": [{
  		"filename": "bootstrap.properties",
  		"configFormat": "custom",
  		"outputDirectory": "modules/datax-admin/conf",
  		"templateName": "bootstrap-properties.ftl",
  		"includeParams": ["DB_HOST", "DB_PORT", "DB_USERNAME", "DB_PASSWORD", "DB_DATABASE"]
  	}, {
  		"filename": "datax-admin/bin/env.properties",
  		"configFormat": "custom",
  		"outputDirectory": "modules",
  		"templateName": "datax-admin-env-properties.ftl",
  		"includeParams": ["MAIL_USERNAME", "MAIL_PASSWORD"]
  	},{
  		"filename": "datax-executor/bin/env.properties",
  		"configFormat": "custom",
  		"outputDirectory": "modules",
  		"templateName": "datax-executor-env-properties.ftl",
  		"includeParams": ["PYTHON_PATH"]
  	}]
  },
  "parameters": [{
  	"name": "DB_HOST",
  	"label": "Hostname or IP address of the DATAXWEB database",
  	"description": "Hostname or IP address of the DATAXWEB database",
  	"configType": "map",
  	"required": true,
  	"type": "input",
  	"value": "192.168.10.21",
  	"configurableInWizard": true,
  	"hidden": false,
  	"defaultValue": "localhost"
  }, {
  	"name": "DB_PORT",
  	"label": "Port the DATAXWEB database listens on",
  	"description": "Port the DATAXWEB database listens on",
  	"configType": "map",
  	"required": true,
  	"type": "input",
  	"value": "3306",
  	"configurableInWizard": true,
  	"hidden": false,
  	"defaultValue": "3306"
  }, {
  	"name": "DB_USERNAME",
  	"label": "Username for connecting to the DATAXWEB database",
  	"description": "Username for connecting to the DATAXWEB database",
  	"configType": "map",
  	"required": true,
  	"type": "input",
  	"value": "root",
  	"configurableInWizard": true,
  	"hidden": false,
  	"defaultValue": "root"
  }, {
  	"name": "DB_PASSWORD",
  	"label": "Password for connecting to the DATAXWEB database",
  	"description": "Password for connecting to the DATAXWEB database",
  	"configType": "map",
  	"required": true,
  	"type": "input",
  	"value": "yixiao666",
  	"configurableInWizard": true,
  	"hidden": false,
  	"defaultValue": "123456"
  }, {
  	"name": "DB_DATABASE",
  	"label": "Name of the DATAXWEB database",
  	"description": "Name of the DATAXWEB database",
  	"configType": "map",
  	"required": true,
  	"type": "input",
  	"value": "dataxweb",
  	"configurableInWizard": true,
  	"hidden": false,
  	"defaultValue": "dataxweb"
  }, {
  	"name": "MAIL_USERNAME",
  	"label": "Username of the alert mail account",
  	"description": "Username of the alert mail account",
  	"configType": "map",
  	"required": true,
  	"type": "input",
  	"value": "",
  	"configurableInWizard": true,
  	"hidden": false,
  	"defaultValue": ""
  }, {
  	"name": "MAIL_PASSWORD",
  	"label": "Password of the alert mail account",
  	"description": "Password of the alert mail account",
  	"configType": "map",
  	"required": true,
  	"type": "input",
  	"value": "",
  	"configurableInWizard": true,
  	"hidden": false,
  	"defaultValue": ""
  }, {
  	"name": "PYTHON_PATH",
  	"label": "Path to the DataX datax.py launch script",
  	"description": "Path to the DataX datax.py launch script",
  	"configType": "map",
  	"required": true,
  	"type": "input",
  	"value": "/opt/datasophon/datax/bin/datax.py",
  	"configurableInWizard": true,
  	"hidden": false,
  	"defaultValue": "/opt/datasophon/datax/bin/datax.py"
  }]
}

Adding the .ftl template files on every node

Note: this must be done on every node.

bootstrap-properties.ftl

vi /opt/datasophon/datasophon-worker/conf/templates/bootstrap-properties.ftl

#Database
DB_HOST=${DB_HOST}
DB_PORT=${DB_PORT}
DB_USERNAME=${DB_USERNAME}
DB_PASSWORD=${DB_PASSWORD}
DB_DATABASE=${DB_DATABASE}

datax-admin-env-properties.ftl

vi /opt/datasophon/datasophon-worker/conf/templates/datax-admin-env-properties.ftl

# environment variables

JAVA_HOME="/usr/local/jdk1.8.0_333"

WEB_LOG_PATH=/opt/datasophon/dataxweb/modules/datax-admin/logs
WEB_CONF_PATH=/opt/datasophon/dataxweb/modules/datax-admin/conf

DATA_PATH=/opt/datasophon/dataxweb/modules/datax-admin/data
SERVER_PORT=9527


# mail account
MAIL_USERNAME="${MAIL_USERNAME}"
MAIL_PASSWORD="${MAIL_PASSWORD}"


#debug
#REMOTE_DEBUG_SWITCH=true
#REMOTE_DEBUG_PORT=7003

datax-executor-env-properties.ftl

vi /opt/datasophon/datasophon-worker/conf/templates/datax-executor-env-properties.ftl

# environment variables

JAVA_HOME="/usr/local/jdk1.8.0_333"

SERVICE_LOG_PATH=/opt/datasophon/dataxweb/modules/datax-executor/logs
SERVICE_CONF_PATH=/opt/datasophon/dataxweb/modules/datax-executor/conf
DATA_PATH=/opt/datasophon/dataxweb/modules/datax-executor/data


## Location where DataX job JSON files are stored
JSON_PATH=/opt/datasophon/dataxweb/modules/datax-executor/json


## executor_port
EXECUTOR_PORT=9999


## Keep consistent with the datax-admin port
DATAX_ADMIN_PORT=

## Location of the Python script to execute (datax.py)
PYTHON_PATH=${PYTHON_PATH}



## dataxweb service port
SERVER_PORT=9504


#debug: remote debugging port
#REMOTE_DEBUG_SWITCH=true
#REMOTE_DEBUG_PORT=7004
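
Each template's ${NAME} placeholders are expected to line up with the includeParams listed for it in service_ddl.json. How DataSophon treats a placeholder that is not listed is not covered here, so listing the placeholders per template is a cheap cross-check. The helper name list_placeholders is an assumption of this sketch.

```shell
#!/bin/bash
# Print the distinct ${NAME} placeholders used in a template, one per line.
# Only names made of upper-case letters, digits and underscores are matched.
list_placeholders() {
    grep -o '\${[A-Z0-9_]*}' "$1" | sort -u
}

# Example:
# list_placeholders /opt/datasophon/datasophon-worker/conf/templates/bootstrap-properties.ftl
```

For bootstrap-properties.ftl the output should correspond to DB_HOST, DB_PORT, DB_USERNAME, DB_PASSWORD and DB_DATABASE.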

Restarting

Restart the worker on every node

sh /opt/datasophon/datasophon-worker/bin/datasophon-worker.sh restart worker

Restart the API on the master node

sh /opt/datasophon/datasophon-manager-1.2.1/bin/datasophon-api.sh restart api

Creating the database manually and running the SQL

Run datax_web.sql from the /opt/datasophon/DDP/packages directory to create the tables in the dataxweb database.

CREATE DATABASE dataxweb DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
use dataxweb;
source /opt/datasophon/DDP/packages/datax_web.sql;
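
The same statements can also be run from the shell instead of an interactive mysql session. The sketch below only generates the SQL text; init_sql is a hypothetical helper, and the IF NOT EXISTS clause is an addition of this sketch so that a rerun does not fail when the database already exists.

```shell
#!/bin/bash
# Emit the SQL statements that create the database and load the schema file.
# Usage: init_sql <database> <path-to-datax_web.sql>
init_sql() {
    local db=$1 sql_file=$2
    cat <<EOF
CREATE DATABASE IF NOT EXISTS $db DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
USE $db;
SOURCE $sql_file;
EOF
}

# Feed the statements to the mysql client; -p prompts for the password.
# init_sql dataxweb /opt/datasophon/DDP/packages/datax_web.sql | mysql -h 192.168.10.21 -P 3306 -u root -p
```

The host, port and user in the commented example are the values from service_ddl.json above; adjust them to your environment.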

Installing DATAXWEB

Add the DATAXWEB service.

Assign the Admin role. Choose the node to install it on; only one node is needed.

Assign the Executor role. Choose the nodes to install it on; one or more can be selected.

Adjust the configuration to match your environment.

The DATAXWEB service has been added successfully.

Posted 2024-12-10 22:55 by xiongshengxiao