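Installing Apache Atlas and Configuring the Hive Hook

Download the source

The Apache Atlas website only offers a source package for download: Download

Download the latest release, 2.2.0: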

venn@venn git % wget https://downloads.apache.org/atlas/2.2.0/apache-atlas-2.2.0-sources.tar.gz  # download
venn@venn git % tar -zxvf apache-atlas-2.2.0-sources.tar.gz # unpack
venn@venn git % ls
apache-atlas-2.2.0-sources.tar.gz
apache-atlas-sources-2.2.0                         

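Build

For the build, follow the official guide: Building & Installing Apache Atlas

Atlas depends on several external components, so the build offers profiles that bundle embedded versions of some of them, such as HBase and Solr.

Here we build the "Packaging Apache Atlas with embedded Apache HBase & Apache Solr" variant: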

mvn clean -DskipTests package -Pdist,embedded-hbase-solr

....

[INFO] Reactor Summary for apache-atlas 2.2.0:
[INFO] 
[INFO] Apache Atlas Server Build Tools .................... SUCCESS [  1.190 s]
[INFO] apache-atlas ....................................... SUCCESS [  4.594 s]
[INFO] Apache Atlas Integration ........................... SUCCESS [ 12.596 s]
[INFO] Apache Atlas Test Utility Tools .................... SUCCESS [  3.830 s]
[INFO] Apache Atlas Common ................................ SUCCESS [  2.827 s]
[INFO] Apache Atlas Client ................................ SUCCESS [  0.331 s]
[INFO] atlas-client-common ................................ SUCCESS [  1.281 s]
[INFO] atlas-client-v1 .................................... SUCCESS [  1.968 s]
[INFO] Apache Atlas Server API ............................ SUCCESS [  1.484 s]
[INFO] Apache Atlas Notification .......................... SUCCESS [  3.790 s]
[INFO] atlas-client-v2 .................................... SUCCESS [  0.969 s]
[INFO] Apache Atlas Graph Database Projects ............... SUCCESS [  0.198 s]
[INFO] Apache Atlas Graph Database API .................... SUCCESS [  1.180 s]
[INFO] Graph Database Common Code ......................... SUCCESS [  1.120 s]
[INFO] Apache Atlas JanusGraph-HBase2 Module .............. SUCCESS [  0.988 s]
[INFO] Apache Atlas JanusGraph DB Impl .................... SUCCESS [  5.341 s]
[INFO] Apache Atlas Graph DB Dependencies ................. SUCCESS [  1.406 s]
[INFO] Apache Atlas Authorization ......................... SUCCESS [  1.509 s]
[INFO] Apache Atlas Repository ............................ SUCCESS [  9.290 s]
[INFO] Apache Atlas UI .................................... SUCCESS [ 22.949 s]
[INFO] Apache Atlas New UI ................................ SUCCESS [ 22.439 s]
[INFO] Apache Atlas Web Application ....................... SUCCESS [ 51.105 s]
[INFO] Apache Atlas Documentation ......................... SUCCESS [  0.996 s]
[INFO] Apache Atlas FileSystem Model ...................... SUCCESS [  1.768 s]
[INFO] Apache Atlas Plugin Classloader .................... SUCCESS [  0.791 s]
[INFO] Apache Atlas Hive Bridge Shim ...................... SUCCESS [  1.902 s]
[INFO] Apache Atlas Hive Bridge ........................... SUCCESS [  4.811 s]
[INFO] Apache Atlas Falcon Bridge Shim .................... SUCCESS [ 27.805 s]
[INFO] Apache Atlas Falcon Bridge ......................... SUCCESS [  3.164 s]
[INFO] Apache Atlas Sqoop Bridge Shim ..................... SUCCESS [  3.344 s]
[INFO] Apache Atlas Sqoop Bridge .......................... SUCCESS [  8.621 s]
[INFO] Apache Atlas Storm Bridge Shim ..................... SUCCESS [ 48.489 s]
[INFO] Apache Atlas Storm Bridge .......................... SUCCESS [  4.718 s]
[INFO] Apache Atlas Hbase Bridge Shim ..................... SUCCESS [  2.068 s]
[INFO] Apache Atlas Hbase Bridge .......................... SUCCESS [01:13 min]
[INFO] Apache HBase - Testing Util ........................ SUCCESS [  3.748 s]
[INFO] Apache Atlas Kafka Bridge .......................... SUCCESS [ 28.061 s]
[INFO] Apache Atlas classification updater ................ SUCCESS [  0.906 s]
[INFO] Apache Atlas index repair tool ..................... SUCCESS [  3.032 s]
[INFO] Apache Atlas Impala Hook API ....................... SUCCESS [  0.309 s]
[INFO] Apache Atlas Impala Bridge Shim .................... SUCCESS [  0.348 s]
[INFO] Apache Atlas Impala Bridge ......................... SUCCESS [  3.057 s]
[INFO] Apache Atlas Distribution .......................... SUCCESS [15:53 min]
[INFO] atlas-examples ..................................... SUCCESS [  0.386 s]
[INFO] sample-app ......................................... SUCCESS [  3.217 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  22:12 min
[INFO] Finished at: 2022-03-21T15:24:37+08:00
[INFO] ------------------------------------------------------------------------

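  • Using the embedded-hbase-solr profile configures Apache Atlas so that an Apache HBase instance and an Apache Solr instance are started and stopped together with the Apache Atlas server.

  • Note: this distribution profile is intended for single-node development only, not for production.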
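
A minimal sketch of wrapping the build command above in a script. The `MAVEN_OPTS` heap setting follows the suggestion in the official build guide; the function name `build_atlas` and the `MVN` override variable are hypothetical additions of mine, there only so the command can be dry-run without starting a 20-minute build.

```shell
# Hypothetical wrapper around the build command used in this post.
#   $1 = directory containing the unpacked Atlas sources
# MVN can be overridden (e.g. MVN=echo) to print the arguments instead of building.
build_atlas() {
    src_dir=$1
    # Heap size suggested by the Atlas build guide; keep any value already set.
    MAVEN_OPTS="${MAVEN_OPTS:--Xms2g -Xmx2g}"
    export MAVEN_OPTS
    # Run in a subshell so the caller's working directory is left unchanged.
    ( cd "$src_dir" && "${MVN:-mvn}" clean -DskipTests package -Pdist,embedded-hbase-solr )
}

# Dry run: prints the Maven arguments without building anything.
# MVN=echo build_atlas apache-atlas-sources-2.2.0
```

Dropping `embedded-hbase-solr` from the `-P` list would build a package that expects external HBase and Solr instances, which is the direction to look in for anything beyond a single-node test setup.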

After the build, the packages are in apache-atlas-sources-2.2.0/distro/target. Copy the generated server package apache-atlas-2.2.0-server.tar.gz to /opt and unpack it:

venn@venn target % pwd
/Users/venn/git/apache-atlas-sources-2.2.0/distro/target
venn@venn target % ls
META-INF                                      apache-atlas-2.2.0-kafka-hook.tar.gz          hbase
antrun                                        apache-atlas-2.2.0-server.tar.gz              hbase.temp
apache-atlas-2.2.0-atlas-index-repair.zip     apache-atlas-2.2.0-sources.tar.gz             maven-archiver
apache-atlas-2.2.0-bin.tar.gz                 apache-atlas-2.2.0-sqoop-hook.tar.gz          maven-shared-archive-resources
apache-atlas-2.2.0-classification-updater.zip apache-atlas-2.2.0-storm-hook.tar.gz          rat.txt
apache-atlas-2.2.0-falcon-hook.tar.gz         archive-tmp                                   solr
apache-atlas-2.2.0-hbase-hook.tar.gz          atlas-distro-2.2.0.jar                        solr.temp
apache-atlas-2.2.0-hive-hook.tar.gz           bin                                           test-classes
apache-atlas-2.2.0-impala-hook.tar.gz         conf

venn@venn /opt % ls
apache-atlas-2.2.0              
apache-atlas-2.2.0-server.tar.gz
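
The copy-and-unpack step can be sketched as a small function. This is only an illustration under the paths shown above; the name `install_atlas_server` is hypothetical, and it unpacks straight from the build directory instead of copying the tarball first.

```shell
# Hypothetical helper: unpack the server tarball produced by the build.
#   $1 = source root (e.g. ~/git/apache-atlas-sources-2.2.0)
#   $2 = Atlas version (e.g. 2.2.0)
#   $3 = destination directory (e.g. /opt)
install_atlas_server() {
    tarball="$1/distro/target/apache-atlas-$2-server.tar.gz"
    if [ ! -f "$tarball" ]; then
        echo "server tarball not found: $tarball" >&2
        return 1
    fi
    # Create the destination if needed, then extract into it.
    mkdir -p "$3" && tar -xzf "$tarball" -C "$3"
}

# Example (writing to /opt may need elevated permissions):
# install_atlas_server ~/git/apache-atlas-sources-2.2.0 2.2.0 /opt
```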

Edit the configuration

Go to the conf directory:

vi  atlas-env.sh 

Set JAVA_HOME (the embedded HBase and Solr are started by default):

export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_321.jdk/Contents/Home
export MANAGE_LOCAL_HBASE=true
export MANAGE_LOCAL_SOLR=true 

Start Atlas

venn@venn atlas-2.22 % bin/atlas_start.py
venn@venn atlas-2.22 % bin/atlas_stop.py 
No process ID file found. Server not running?
venn@venn atlas-2.22 % bin/atlas_start.py 

Configured for local HBase.
Starting local HBase...
Local HBase started!

Configured for local Solr.
Starting local Solr...
Local Solr started!

Creating Solr collections for Atlas using config: /opt/atlas-2.22/conf/solr

Starting Atlas server on host: localhost
Starting Atlas server on port: 21000

Apache Atlas Server started!!!

Once Atlas has started, open the web UI (port 21000 on localhost, per the startup log):

  • Username / password: admin/admin
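
The server may take some time to accept requests after the start script returns, so instead of refreshing the browser you can poll the REST API. The sketch below calls `/api/atlas/admin/version`, the endpoint the Atlas docs suggest for checking that the server is up. The function name `wait_for_atlas` and the `CURL` override variable are hypothetical, and the credentials are the defaults mentioned above.

```shell
# Hypothetical helper: poll Atlas until it answers, or give up.
#   $1 = base URL, $2 = user:password, $3 = max attempts, $4 = seconds between attempts
# CURL can be overridden for testing; it defaults to the real curl.
wait_for_atlas() {
    attempt=1
    while [ "$attempt" -le "$3" ]; do
        # -s: quiet, -f: treat HTTP error statuses as failure
        if "${CURL:-curl}" -sf -u "$2" "$1/api/atlas/admin/version" >/dev/null 2>&1; then
            echo "Atlas is up after $attempt attempt(s)"
            return 0
        fi
        attempt=$((attempt + 1))
        sleep "$4"
    done
    echo "Atlas did not respond after $3 attempt(s)" >&2
    return 1
}

# Example with the defaults from this post:
# wait_for_atlas http://localhost:21000 admin:admin 30 10
```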

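Configure the Hive hook

Official docs: HookHive

Hive version: 3.1.2

  • Note: I was originally on Hive 2.3.3 but kept hitting JAR conflicts. I tried to build Atlas against Hive 2.3.3 and failed, so I installed a separate Hive 3.1.2.
  1. Add the following property to hive-site.xml to register the Atlas hook: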
<property>
    <name>hive.exec.post.hooks</name>
    <value>org.apache.atlas.hive.hook.HiveHook</value>
</property>
  2. Unpack apache-atlas-2.2.0-hive-hook.tar.gz
venn@venn target % pwd
/Users/venn/git/apache-atlas-sources-2.2.0/distro/target
venn@venn target % ls
META-INF                                      apache-atlas-2.2.0-kafka-hook.tar.gz          conf
antrun                                        apache-atlas-2.2.0-server.tar.gz              hbase
apache-atlas-2.2.0-atlas-index-repair.zip     apache-atlas-2.2.0-sources.tar.gz             hbase.temp
apache-atlas-2.2.0-bin.tar.gz                 apache-atlas-2.2.0-sqoop-hook.tar.gz          maven-archiver
apache-atlas-2.2.0-classification-updater.zip apache-atlas-2.2.0-storm-hook.tar.gz          maven-shared-archive-resources
apache-atlas-2.2.0-falcon-hook.tar.gz         apache-atlas-hive-hook-2.2.0                  rat.txt
apache-atlas-2.2.0-hbase-hook.tar.gz          archive-tmp                                   solr
apache-atlas-2.2.0-hive-hook.tar.gz           atlas-distro-2.2.0.jar                        solr.temp
apache-atlas-2.2.0-impala-hook.tar.gz         bin                                           test-classes

  3. Copy apache-atlas-hive-hook-2.2.0/hook/hive into the Atlas installation directory: /opt/atlas-2.22/hook/hive

  4. In hive-env.sh, add the Atlas Hive hook directory to HIVE_AUX_JARS_PATH:

export HIVE_AUX_JARS_PATH=/Library/Java/JavaVirtualMachines/jdk1.8.0_321.jdk/Contents/Home,/opt/atlas-2.22/hook/hive

  5. Copy /opt/atlas-2.22/conf/atlas-application.properties to the Hive conf directory
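
The copy steps above, plus a quick check that hive-site.xml really registers the hook, can be sketched as follows. The function names `install_hive_hook` and `has_atlas_hook` are hypothetical, and the directories in the example are the ones used in this post; adjust them to your layout.

```shell
# Hypothetical helper covering the two copy steps.
#   $1 = unpacked hook directory (e.g. .../apache-atlas-hive-hook-2.2.0)
#   $2 = Atlas install directory (e.g. /opt/atlas-2.22)
#   $3 = Hive conf directory (e.g. /opt/hive-3.1.2/conf)
install_hive_hook() {
    mkdir -p "$2/hook" &&
    # Copies the hook jars to <atlas>/hook/hive
    cp -R "$1/hook/hive" "$2/hook/" &&
    # Hive needs the Atlas client settings on its own conf path
    cp "$2/conf/atlas-application.properties" "$3/"
}

# Succeeds if the given hive-site.xml mentions the Atlas hook class.
has_atlas_hook() {
    grep -qF 'org.apache.atlas.hive.hook.HiveHook' "$1"
}

# Example:
# install_hive_hook ./apache-atlas-hive-hook-2.2.0 /opt/atlas-2.22 /opt/hive-3.1.2/conf
# has_atlas_hook /opt/hive-3.1.2/conf/hive-site.xml || echo "hook not configured"
```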

Import existing Hive metadata into Atlas

Copy import-hive.sh to the Atlas bin directory

venn@venn hook-bin % pwd
/Users/venn/git/apache-atlas-sources-2.2.0/distro/target/apache-atlas-hive-hook-2.2.0/hook-bin
venn@venn hook-bin % ls
import-hive.sh
venn@venn hook-bin % cp import-hive.sh /opt/atlas-2.22/bin 
venn@venn hook-bin % ls /opt/atlas-2.22/bin
atlas_admin.py                   atlas_config.pyc                 atlas_start.py                   cputil.py                        quick_start_v1.py
atlas_client_cmdline.py          atlas_kafka_setup.py             atlas_stop.py                    import-hive.sh
atlas_config.py                  atlas_kafka_setup_hook.py        atlas_update_simple_auth_json.py quick_start.py
venn@venn hook-bin % 

venn@venn atlas-2.22 % sh bin/import-hive.sh                                                                                                           
Using Hive configuration directory [/opt/hive-3.1.2/conf]
Log file for import is /var/log/atlas/import-hive.log

...

2022-03-23T09:48:30,575 INFO [main] org.apache.atlas.ApplicationProperties - Property (set to default) atlas.graph.cache.tx-cache-size = 15000
2022-03-23T09:48:30,575 INFO [main] org.apache.atlas.ApplicationProperties - Property (set to default) atlas.graph.cache.tx-dirty-size = 120
Enter username for atlas :- admin
Enter password for atlas :- 
2022-03-23T09:48:34,444 INFO [main] org.apache.atlas.AtlasBaseClient - Client has only one service URL, will use that for all actions: http://localhost:21000
2022-03-23T09:48:34,483 INFO [main] org.apache.hadoop.hive.conf.HiveConf - Found configuration file file:/opt/hive-3.1.2/conf/hive-site.xml

...

2022-03-23T09:48:44,204 INFO [main] org.apache.atlas.hive.bridge.HiveMetaStoreBridge - Created hive_db entity: name=default@primary, guid=1ceabd72-505f-4338-b9eb-c4e1511fd882
2022-03-23T09:48:44,247 INFO [main] org.apache.atlas.hive.bridge.HiveMetaStoreBridge - No tables to import in database default
Hive Meta Data imported successfully!!!

The import succeeded. Check the result in the Atlas admin UI:
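
Besides the UI, the import can be checked from the command line through the v2 basic search endpoint. A rough sketch, assuming the default address and credentials used earlier; the helper names `atlas_search_url` and `atlas_list_entities` are hypothetical.

```shell
# Hypothetical helper: build the basic-search URL for one entity type.
#   $1 = Atlas base URL, $2 = type name (e.g. hive_db, hive_table), $3 = max results
atlas_search_url() {
    printf '%s/api/atlas/v2/search/basic?typeName=%s&limit=%s\n' "$1" "$2" "$3"
}

# List entities of a type; prints the raw JSON response.
#   $1 = base URL, $2 = user:password, $3 = type name
atlas_list_entities() {
    curl -sf -u "$2" "$(atlas_search_url "$1" "$3" 25)"
}

# Example: the response should include the hive_db entity for "default"
# that the import log above reports as created.
# atlas_list_entities http://localhost:21000 admin:admin hive_db
```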

Capturing Hive metadata and lineage dynamically

create table as


hive> use atlas1;
OK
Time taken: 0.634 seconds
hive> show tables;
OK
tab_name
t_a
Time taken: 0.255 seconds, Fetched: 1 row(s)
hive> create table t_b as select * from t_a;
Query ID = venn_20220324151022_2b39072e-5137-4544-b77f-a616b5713314
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
2022-03-24 15:10:25,785 INFO  [692022c8-8306-4f56-83cb-17f671c866ce main] client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
2022-03-24 15:10:26,049 INFO  [692022c8-8306-4f56-83cb-17f671c866ce main] client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
Starting Job = job_1648105770329_0001, Tracking URL = http://venn.local:8088/proxy/application_1648105770329_0001/
Kill Command = /opt/hadoop-3.2.2/bin/mapred job  -kill job_1648105770329_0001
Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 0
2022-03-24 15:10:38,140 Stage-1 map = 0%,  reduce = 0%
Ended Job = job_1648105770329_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://venn:9000/user/hive/warehouse/atlas1.db/.hive-staging_hive_2022-03-24_15-10-22_500_6973722723439996636-1/-ext-10002
Moving data to directory hdfs://venn:9000/user/hive/warehouse/atlas1.db/t_b
MapReduce Jobs Launched: 
Stage-Stage-1:  HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
aa
Time taken: 20.185 seconds

Metadata:

Table lineage:

Column lineage:
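
The same lineage can be fetched over REST. The sketch below only builds the URLs. It assumes the cluster name `primary` (as seen in the import log) and the `<db>.<table>@<cluster>` qualified-name convention of the Atlas Hive model; the function names are hypothetical.

```shell
# Qualified name of a Hive table in Atlas: <db>.<table>@<cluster>
#   $1 = database, $2 = table, $3 = cluster name
hive_table_qualified_name() {
    printf '%s.%s@%s\n' "$1" "$2" "$3"
}

# URL to look up a hive_table entity by its qualifiedName.
#   $1 = Atlas base URL, $2 = qualified name
atlas_table_url() {
    printf '%s/api/atlas/v2/entity/uniqueAttribute/type/hive_table?attr:qualifiedName=%s\n' "$1" "$2"
}

# URL for the lineage graph around an entity, in both directions.
#   $1 = Atlas base URL, $2 = entity GUID
atlas_lineage_url() {
    printf '%s/api/atlas/v2/lineage/%s?direction=BOTH&depth=3\n' "$1" "$2"
}

# Example: look up t_b, read its guid from the JSON response, then fetch the lineage.
# qn=$(hive_table_qualified_name atlas1 t_b primary)
# curl -sf -u admin:admin "$(atlas_table_url http://localhost:21000 "$qn")"
# curl -sf -u admin:admin "$(atlas_lineage_url http://localhost:21000 "<guid from the previous call>")"
```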

insert into

hive> create table t_c(aa string);
OK
Time taken: 0.45 seconds
hive> insert into t_c select aa from t_b;
Query ID = venn_20220324151139_074245e5-efe8-4ae7-9893-7ec07d06f242
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
2022-03-24 15:11:39,848 INFO  [692022c8-8306-4f56-83cb-17f671c866ce main] client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
2022-03-24 15:11:39,874 INFO  [692022c8-8306-4f56-83cb-17f671c866ce main] client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
Starting Job = job_1648105770329_0002, Tracking URL = http://venn.local:8088/proxy/application_1648105770329_0002/
Kill Command = /opt/hadoop-3.2.2/bin/mapred job  -kill job_1648105770329_0002
Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 0
2022-03-24 15:11:48,811 Stage-1 map = 0%,  reduce = 0%
Ended Job = job_1648105770329_0002
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://venn:9000/user/hive/warehouse/atlas1.db/t_c/.hive-staging_hive_2022-03-24_15-11-39_101_3895814859882010053-1/-ext-10000
Loading data to table atlas1.t_c
MapReduce Jobs Launched: 
Stage-Stage-1:  HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
aa
Time taken: 13.734 seconds

Table lineage:

Column lineage:


Posted on 2022-03-24 15:41 by Flink菜鸟