Install HUE and Configure Connections to Hive, Flink & TiDB

Install dependency packages
yum install ant asciidoc cyrus-sasl cyrus-sasl-devel cyrus-sasl-gssapi cyrus-sasl-lib cyrus-sasl-plain gcc gcc-c++ gmp-devel krb5-devel libffi-devel libxml2-devel libxslt-devel make mariadb-devel mysql mysql-community-client mysql-devel openldap-devel openssl-devel python-devel rsync sqlite-devel
Install the JDK
...
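A minimal sketch for this step, assuming OpenJDK 8 from the CentOS repositories is sufficient (the JAVA_HOME path is an assumption and may differ on your system):
# Hypothetical JDK install via yum; any JDK 8 works.
yum install -y java-1.8.0-openjdk-devel
# Assumed symlink created by the package; adjust if your path differs.
echo 'export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk' >> /etc/profile
source /etc/profile
java -version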
Install the Python environment and pip
wget https://files.pythonhosted.org/packages/5a/df/b2e3d9693bb0dcbeac516a73dd7a9eb82b126ae52e4a74605a9b01beddd5/setuptools-40.1.0.zip
unzip setuptools-40.1.0.zip
cd setuptools-40.1.0
python setup.py install
cd ..
wget https://files.pythonhosted.org/packages/69/81/52b68d0a4de760a2f1979b0931ba7889202f302072cc7a0d614211bc7579/pip-18.0.tar.gz
tar zxvf pip-18.0.tar.gz
cd pip-18.0
python setup.py install
Install Node.js
curl -sL https://rpm.nodesource.com/setup_10.x | bash -
yum install -y nodejs
Install MySQL
Install the repository:
wget https://dev.mysql.com/get/mysql57-community-release-el7-8.noarch.rpm
rpm -ivh mysql57-community-release-el7-8.noarch.rpm
Install:
yum -y install mysql-server
Start:
systemctl start mysqld
Get the temporary MySQL password:
grep 'temporary password' /var/log/mysqld.log
Log in:
mysql -uroot -p
Relax the password policy and change the password:
set global validate_password_policy=LOW;
set global validate_password_length=5;
ALTER USER 'root'@'localhost' IDENTIFIED BY 'admin';
Grant remote access:
mysql -uroot -padmin
use mysql;
select Host,User from user;
update user set Host='%' where User='root';
flush privileges;
Create the hadoop user
groupadd hadoop
useradd -g hadoop hadoop
Extract and build
wget https://cdn.gethue.com/downloads/hue-4.8.0.tgz
tar zxvf hue-4.8.0.tgz
cd hue-4.8.0
PREFIX=/usr/share make install
Integrate the Hive configuration
Add a hive-site.xml file in the /opt/hive/conf directory:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hive.metastore.local</name>
    <value>false</value>
  </property>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://172.31.6.20:9083</value>
  </property>
</configuration>
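Before pointing Hue and Flink at this metastore, it is worth confirming the thrift endpoint answers; a quick check using bash's built-in /dev/tcp, with host and port taken from the hive-site.xml above:
# If the connect succeeds, the metastore port is open.
timeout 1 bash -c '</dev/tcp/172.31.6.20/9083' && echo "metastore reachable"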
Add the required jars to the Flink environment
Required jars:
flink-connector-hive_2.11-1.11.3.jar
flink-sql-connector-kafka_2.11-1.11.2.jar
flink-parquet_2.11-1.11.2.jar
libfb303-0.9.3.jar
hive-exec-3.1.2.jar
Rebuild hive-exec to resolve the guava version conflict
Download the Hive source, edit the pom file under the ql module, change the guava version from 19.0 to 20.0, and rebuild, as sketched below.
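No commands are given for the rebuild; one possible sketch, assuming Maven and the apache/hive release tag (the guava version property may live in the root pom rather than ql/pom.xml; verify before running):
# Hypothetical rebuild of hive-exec (the ql module) with guava bumped to 20.0.
git clone --branch rel/release-3.1.2 --depth 1 https://github.com/apache/hive.git
cd hive
# Assumes the property is defined as <guava.version> in pom.xml.
sed -i 's#<guava.version>19.0</guava.version>#<guava.version>20.0</guava.version>#' pom.xml
mvn clean package -pl ql -am -DskipTests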
Integrate Hadoop into the Flink environment
Enter the Flink image working directory (on 172.31.6.12):
cd /data/dockerfiles/flink-1.11
Download the Hadoop tarball from the Apache archive:
wget https://archive.apache.org/dist/hadoop/common/hadoop-3.1.3/hadoop-3.1.3.tar.gz
Extract the tarball and remove unneeded files:
tar zxvf hadoop-3.1.3.tar.gz
cd hadoop-3.1.3
rm -rf bin/ etc/ include/ lib/ libexec/ LICENSE.txt NOTICE.txt README.txt sbin/
rm -rf ./share/hadoop/hdfs/lib/netty-all-4.0.52.Final.jar
rm -rf ./share/hadoop/hdfs/lib/netty-3.10.5.Final.jar
cd ..
Repack it as hadoop-3.1.3.tar.gz:
tar zcvf hadoop-3.1.3.tar.gz hadoop-3.1.3
Modify the Dockerfile and add the following line (ADD extracts the tarball into the image automatically):
ADD hadoop-3.1.3.tar.gz /opt/flink/hadoop
Build and push the image:
docker build -t d-hub.jcinfo.com/library/flink:1.11.2-scala_2.11-java8-v1.4.3 .
docker push d-hub.jcinfo.com/library/flink:1.11.2-scala_2.11-java8-v1.4.3
Configure the Hadoop environment variable in Rancher:
HADOOP_CLASSPATH=/opt/flink/hadoop/hadoop-3.1.3/etc/hadoop:/opt/flink/hadoop/hadoop-3.1.3/share/hadoop/common/lib/*:/opt/flink/hadoop/hadoop-3.1.3/share/hadoop/common/*:/opt/flink/hadoop/hadoop-3.1.3/share/hadoop/hdfs:/opt/flink/hadoop/hadoop-3.1.3/share/hadoop/hdfs/lib/*:/opt/flink/hadoop/hadoop-3.1.3/share/hadoop/hdfs/*:/opt/flink/hadoop/hadoop-3.1.3/share/hadoop/mapreduce/lib/*:/opt/flink/hadoop/hadoop-3.1.3/share/hadoop/mapreduce/*:/opt/flink/hadoop/hadoop-3.1.3/share/hadoop/yarn:/opt/flink/hadoop/hadoop-3.1.3/share/hadoop/yarn/lib/*:/opt/flink/hadoop/hadoop-3.1.3/share/hadoop/yarn/*
Configure the database section in hue.ini:
[[database]]
engine=mysql
host=192.168.56.131
port=3306
user=root
password=li1383414
name=hue
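The hue database referenced above must exist before the initialization step below; a minimal sketch, assuming the MySQL root credentials set earlier (adjust host and password to match your instance):
# Hypothetical: create Hue's backing database; the name must match hue.ini.
mysql -uroot -padmin -e "CREATE DATABASE hue DEFAULT CHARACTER SET utf8;"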
Initialize the database:
build/env/bin/hue syncdb
build/env/bin/hue migrate
Connect to MySQL & TiDB
[[[mysql]]]
name=MySQL
interface=sqlalchemy
options='{"url": "mysql://root@tidb.jcinfo.com:3306/test"}'
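The sqlalchemy interface needs a MySQL dialect importable from Hue's virtualenv; a sketch of one way to provide it (mysqlclient is the package the Hue docs suggest; pick whichever driver matches your URL scheme):
# Hypothetical: install the MySQL SQLAlchemy driver into Hue's bundled environment.
build/env/bin/pip install mysqlclient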
Connect to Hive
[[[hive]]]
name=Hive
interface=hiveserver2

[beeswax]
hive_server_host=172.31.6.20
hive_server_port=10000
hive_conf_dir=/opt/hive/conf
Connect to Flink SQL
Install flink-sql-gateway
Download and extract:
wget https://github.com/ververica/flink-sql-gateway/releases/download/flink-1.11.1/flink-sql-gateway-0.2-SNAPSHOT-bin.zip
unzip flink-sql-gateway-0.2-SNAPSHOT-bin.zip -d ../flink/
Configure (conf/sql-gateway-defaults.yaml):
server:
  # optional: the address that the gateway binds itself to (127.0.0.1 by default)
  bind-address: 0.0.0.0
  # optional: the address that clients should use to connect to the gateway (127.0.0.1 by default)
  address: 0.0.0.0

# A typical catalog definition looks like:
catalogs:
  - name: myhive
    type: hive
    hive-conf-dir: /opt/hive/conf/
    default-database: dw
Start the gateway:
./bin/sql-gateway.sh
Note: the gateway must be installed on the same machine as Flink.
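To confirm the gateway came up, its REST info endpoint can be queried; a sketch assuming the default port 8083 and this gateway version's v1 API:
# Should return the product name and version if the gateway is running.
curl http://127.0.0.1:8083/v1/info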
Add the interpreter configuration to hue.ini:
[[[flink]]]
name=Flink
interface=flink
options='{"url": "http://127.0.0.1:8083"}'
Start HUE
build/env/bin/supervisor
Usage examples
-- file
CREATE TABLE `file_test_02` (
  `caseId` string,
  `sourceUrl` string
) WITH (
  'connector' = 'filesystem',
  'path' = '/data/flink/file_test_02.txt',
  'format' = 'json'
);

INSERT INTO file_test_02 VALUES ('a', 'b');
SELECT * FROM file_test_02;

-- kafka
CREATE TABLE `kafka_test_01` (
  `caseId` string,
  `sourceUrl` string
) WITH (
  'connector' = 'kafka',
  'topic' = 'kafka_test_01',
  'properties.group.id' = 'kafka_test_01',
  'properties.bootstrap.servers' = '172.31.3.10:32058',
  'format' = 'json',
  'scan.startup.mode' = 'earliest-offset'
);

INSERT INTO kafka_test_01 VALUES ('a', 'b');
SELECT * FROM kafka_test_01;

-- HDFS
DROP TABLE hdfs_pt_test;
CREATE TABLE hdfs_pt_test (
  uniqid1 STRING,
  uniqid2 STRING,
  dt STRING
) PARTITIONED BY (dt) WITH (
  'connector' = 'filesystem',
  'path' = 'hdfs://172.31.6.20:9000/jc/ods/hdfs_pt_test',
  'format' = 'json',
  'sink.partition-commit.delay' = '1 h',
  'sink.partition-commit.policy.kind' = 'success-file'
);

INSERT INTO hdfs_pt_test VALUES ('a', 'b', 'c');
SELECT * FROM hdfs_pt_test WHERE dt = 'c';

-- catalogs
SHOW CATALOGS;
USE CATALOG myhive;
SHOW DATABASES;
USE dw;
SHOW TABLES;
SELECT * FROM test_03;
SELECT * FROM dwd_cpws_general;
SELECT * FROM cpws_test_hive1;
Supported statements
| statement | comment |
|---|---|
| SHOW CATALOGS | List all registered catalogs |
| SHOW DATABASES | List all databases in the current catalog |
| SHOW TABLES | List all tables and views in the current database of the current catalog |
| SHOW VIEWS | List all views in the current database of the current catalog |
| SHOW FUNCTIONS | List all functions |
| SHOW MODULES | List all modules |
| USE CATALOG catalog_name | Set a catalog with given name as the current catalog |
| USE database_name | Set a database with given name as the current database of the current catalog |
| CREATE TABLE table_name ... | Create a table with a DDL statement |
| DROP TABLE table_name | Drop a table with given name |
| ALTER TABLE table_name | Alter a table with given name |
| CREATE DATABASE database_name ... | Create a database in current catalog with given name |
| DROP DATABASE database_name ... | Drop a database with given name |
| ALTER DATABASE database_name ... | Alter a database with given name |
| CREATE VIEW view_name AS ... | Add a view in current session with SELECT statement |
| DROP VIEW view_name ... | Drop a view with given name |
| SET xx=yy | Set given key's session property to the specific value |
| SET | List all session's properties |
| RESET ALL | Reset all session's properties set by SET command |
| DESCRIBE table_name | Show the schema of a table |
| EXPLAIN PLAN FOR ... | Show string-based explanation about AST and execution plan of the given statement |
| SELECT ... | Submit a Flink SELECT SQL job |
| INSERT INTO ... | Submit a Flink INSERT INTO SQL job |
| INSERT OVERWRITE ... | Submit a Flink INSERT OVERWRITE SQL job |
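For reference, Hue's flink interpreter drives these statements through the gateway's REST API; a hand-rolled session might look like the sketch below (endpoint paths and payload fields are assumptions based on this gateway version's v1 API and may differ across releases):
# Hypothetical: open a session, then submit a statement through it.
curl -X POST http://127.0.0.1:8083/v1/sessions \
  -H 'Content-Type: application/json' \
  -d '{"planner": "blink", "execution_type": "streaming"}'
# The response carries a session_id; substitute it below.
curl -X POST http://127.0.0.1:8083/v1/sessions/<session_id>/statements \
  -H 'Content-Type: application/json' \
  -d '{"statement": "SHOW TABLES"}'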
References
https://docs.gethue.com/administrator/installation/install/
https://docs.gethue.com/releases/release-notes-4.8.0/
https://github.com/ververica/flink-sql-gateway
