Before You Start Data Governance, Get to Know This Metadata Governance Tool: Atlas

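I. What Is Atlas?

As big data is applied ever more widely, data governance has remained a major challenge for enterprises.

Most companies simply process their data, while capabilities such as data lineage and classification are hard to implement. The market badly needed a technical framework focused on data governance, and Atlas emerged to fill that need.

Atlas official site: https://atlas.apache.org/

Atlas is a data governance and metadata framework for Hadoop.

Atlas is a scalable and extensible set of core foundational governance services. It enables enterprises to meet their compliance requirements within Hadoop effectively and efficiently, and allows integration with the whole enterprise data ecosystem.

Apache Atlas provides organizations with open metadata management and governance capabilities to build a catalog of their data assets, classify and govern those assets, and give data scientists, analysts, and data governance teams collaboration capabilities around them.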

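  • Atlas supports a wide range of Hadoop and non-Hadoop metadata types

  • It provides a rich REST API for integration

  • Data lineage can be traced down to the field (column) level; at the time of writing, the author knew of no other comparable framework that could do this

  • Access permissions are also well controlled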

II. Architecture

Atlas consists of the following components:

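  • HBase stores the metadata

  • Solr provides the index

  • The Ingest/Export component, the Type System, and the Graph Engine together form the core of Atlas

  • All functionality is exposed to users through the API, and integration is also possible through Kafka messaging

  • Atlas can collect metadata from a variety of sources: Hive, Sqoop, Storm, and others

  • It also comes with a capable UI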

III. Screenshots

 

 

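IV. Generating Lineage Data

Lineage data is generated through Process entities. It can be produced automatically when data is imported, or by creating a Process through the REST API.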
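To make the role of a Process concrete, here is a minimal sketch of the idea in Python. It is an illustrative model only, not Atlas code: the `Process` class, the `lineage_edges` and `upstream` helpers, and the data set names are all made up for this example. The point is simply that each Process records its inputs and outputs, and the lineage graph is what you get by chaining those records together.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    """Illustrative stand-in for an Atlas Process entity (not the Atlas API)."""
    name: str
    inputs: list = field(default_factory=list)   # names of source data sets
    outputs: list = field(default_factory=list)  # names of target data sets


def lineage_edges(processes):
    """Expand each process into directed edges: input -> process -> output."""
    edges = []
    for p in processes:
        edges += [(src, p.name) for src in p.inputs]
        edges += [(p.name, dst) for dst in p.outputs]
    return edges


def upstream(target, edges):
    """Return every node from which `target` can be reached."""
    seen, stack = set(), [target]
    while stack:
        node = stack.pop()
        for src, dst in edges:
            if dst == node and src not in seen:
                seen.add(src)
                stack.append(src)
    return seen


# Hypothetical data sets: a MySQL table imported into Hive, then copied by a CTAS.
processes = [
    Process("sqoop_import", inputs=["mysql.p_people"], outputs=["hive.p_people"]),
    Process("ctas", inputs=["hive.p_people"], outputs=["hive.p_people_tmp2"]),
]
edges = lineage_edges(processes)
print(sorted(upstream("hive.p_people_tmp2", edges)))
```

Without any Process there is no edge between two tables, which is why the sections below all come down to getting a Process created, either by a hook or by hand.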

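1. Generating lineage automatically with a Sqoop sync

When Sqoop syncs data from a MySQL database into Hive, the Sqoop Atlas hook generates the lineage data automatically once the sync succeeds.

Command for Sqoop to sync all tables of a MySQL database into the Hive warehouse: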

sqoop import-all-tables --connect jdbc:mysql://192.168.1.1:3306/testdb --username root --password ****** --hive-import --hive-database testdb -m 1

The lineage graph of each table can then be viewed in the Atlas admin console.

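2. Generating lineage through the REST API

Lineage data can also be generated by creating a Process through the Atlas REST API.

For example, to link a MySQL table and a Hive table that Atlas already manages, first look up the guid of each table, then build the request body and send a POST request to: http://{atlas_host}:21000/api/atlas/v2/entity/bulk

Request body: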

{"entities":[{"typeName":"Process","attributes":{"owner":"root","createTime":"2020-05-07T10:32:21.0Z","updateTime":"","qualifiedName":"people@process@mysql://192.168.1.1:3306","name":"peopleProcess","description":"people Process","comment":"test people Process","contact_info":"jdbc","type":"table","inputs":[{"guid": "5a676b74-e058-4e81-bcf8-42d73f4c1729","typeName": "rdbms_table"}],"outputs":[{"guid": "2e7c70e1-5a8a-4430-859f-c46d267e33fd","typeName": "hive_table"}]}}]}
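The two steps above (look up the guids, then post a Process) could be scripted roughly as follows. This is a sketch under assumptions, not a reference client: the helper names are hypothetical, HTTP basic authentication is assumed to be enabled on the Atlas server, and the lookup uses the `entity/uniqueAttribute/type/{typeName}` endpoint with an `attr:qualifiedName` query parameter, where the guid is expected under `entity.guid` in the response. Only the attributes needed for lineage are included in the payload; the guids and names are the ones from the request above.

```python
import base64
import json
import urllib.parse
import urllib.request


def guid_lookup_url(base_url, type_name, qualified_name):
    """URL that fetches one entity by its unique qualifiedName attribute."""
    query = urllib.parse.urlencode({"attr:qualifiedName": qualified_name})
    return f"{base_url}/api/atlas/v2/entity/uniqueAttribute/type/{type_name}?{query}"


def build_process_payload(name, qualified_name, input_ref, output_ref):
    """Build the request body for one Process linking an input to an output.

    Each ref is a (guid, typeName) pair.
    """
    def ref(pair):
        return {"guid": pair[0], "typeName": pair[1]}

    return {"entities": [{
        "typeName": "Process",
        "attributes": {
            "name": name,
            "qualifiedName": qualified_name,
            "inputs": [ref(input_ref)],
            "outputs": [ref(output_ref)],
        },
    }]}


def post_json(url, payload, user, password):
    """POST a JSON body with HTTP basic auth (assumed) and decode the reply."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json",
                 "Authorization": f"Basic {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


payload = build_process_payload(
    name="peopleProcess",
    qualified_name="people@process@mysql://192.168.1.1:3306",
    input_ref=("5a676b74-e058-4e81-bcf8-42d73f4c1729", "rdbms_table"),
    output_ref=("2e7c70e1-5a8a-4430-859f-c46d267e33fd", "hive_table"),
)
print(json.dumps(payload, indent=2))
```

To actually send it, call `post_json("http://{atlas_host}:21000/api/atlas/v2/entity/bulk", payload, user, password)` with your own host and credentials. Nothing in the block touches the network until that call is made.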

The lineage graph of the tables can then be viewed in the Atlas admin console.

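3. Generating lineage automatically from a Hive CREATE TABLE statement

When Hive executes a statement such as create table t2 as select id, name from t1, both the table-level lineage and the column-level lineage are generated automatically.

Hive versions below 2.2.0 have a bug that prevents column-level lineage from being generated automatically; upgrade Hive to 2.2.0 or later to get column-level lineage.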

The table's lineage graph can be viewed in the Atlas admin console.

[Figure: column-level (field-level) lineage graph]

4. Lineage graph linking multiple Processes

V. Managing Lineage Data

1. Querying lineage through the REST API

GET request: http://{atlas_host}:21000/api/atlas/v2/lineage/01d12e5f-1ef5-46a8-ac13-29be71e8f78e

Response:

{"baseEntityGuid":"01d12e5f-1ef5-46a8-ac13-29be71e8f78e","lineageDirection":"BOTH","lineageDepth":3,"guidEntityMap":{"5a676b74-e058-4e81-bcf8-42d73f4c1729":{"typeName":"rdbms_table","attributes":{"owner":"root","createTime":1577687198000,"qualifiedName":"testdb.p_people@mysql://192.168.1.1:3306","name":"p_people","description":"MySQL数据库表:testdb.p_people"},"guid":"5a676b74-e058-4e81-bcf8-42d73f4c1729","status":"ACTIVE","displayText":"p_people","classificationNames":[],"meaningNames":[],"meanings":[]},"2e7c70e1-5a8a-4430-859f-c46d267e33fd":{"typeName":"hive_table","attributes":{"owner":"hdfs","createTime":1578981817000,"qualifiedName":"testdb.p_people@primary","name":"p_people"},"guid":"2e7c70e1-5a8a-4430-859f-c46d267e33fd","status":"ACTIVE","displayText":"p_people","classificationNames":["people"],"meaningNames":[],"meanings":[]},"2b65eb7f-596e-48f0-a94d-240e56a4da93":{"typeName":"Process","attributes":{"owner":"root","qualifiedName":"people@process@mysql://192.168.1.1:3306","name":"peopleProcess","description":"people Process"},"guid":"2b65eb7f-596e-48f0-a94d-240e56a4da93","status":"ACTIVE","displayText":"peopleProcess","classificationNames":[],"meaningNames":[],"meanings":[]},"01d12e5f-1ef5-46a8-ac13-29be71e8f78e":{"typeName":"hive_process","attributes":{"qualifiedName":"testdb.p_people_tmp2@primary:1588921268000","name":"create table p_people_tmp2 as select peopleid,peopletype,credentialtype,credentialno,peoplename,gender,nation from p_people"},"guid":"01d12e5f-1ef5-46a8-ac13-29be71e8f78e","status":"ACTIVE","displayText":"create table p_people_tmp2 as select peopleid,peopletype,credentialtype,credentialno,peoplename,gender,nation from p_people","classificationNames":["people"],"meaningNames":[],"meanings":[]},"a4ccceb2-a52c-46a2-b4fd-27d26b8aad3f":{"typeName":"hive_table","attributes":{"owner":"hive","createTime":1588921268000,"qualifiedName":"testdb.p_people_tmp2@primary","name":"p_people_tmp2"},"guid":"a4ccceb2-a52c-46a2-b4fd-27d26b8aad3f","status":"ACTIVE","displayText":"p_people_tmp2","classificationNames":["people"],"meaningNames":[],"meanings":[]}},"relations":[{"fromEntityId":"01d12e5f-1ef5-46a8-ac13-29be71e8f78e","toEntityId":"a4ccceb2-a52c-46a2-b4fd-27d26b8aad3f","relationshipId":"148cc83d-5b67-4174-91e4-767509483e13"},{"fromEntityId":"2e7c70e1-5a8a-4430-859f-c46d267e33fd","toEntityId":"01d12e5f-1ef5-46a8-ac13-29be71e8f78e","relationshipId":"eb768346-d32a-40f9-bf04-d23abbcc3221"},{"fromEntityId":"2b65eb7f-596e-48f0-a94d-240e56a4da93","toEntityId":"2e7c70e1-5a8a-4430-859f-c46d267e33fd","relationshipId":"bea47efd-2645-4d8a-ba6b-8f4ef9bb7316"},{"fromEntityId":"5a676b74-e058-4e81-bcf8-42d73f4c1729","toEntityId":"2b65eb7f-596e-48f0-a94d-240e56a4da93","relationshipId":"517db5b7-f537-4e66-97f1-33c2863fb440"}]}
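A response like this is easier to read once the `relations` list is joined against `guidEntityMap`. The sketch below shows one way to do that in Python. The function names are hypothetical, the `direction` and `depth` query parameters are assumed to be accepted by the lineage endpoint (they correspond to `lineageDirection` and `lineageDepth` in the response), and `sample` is a hand-trimmed stand-in with made-up short guids rather than real server output.

```python
import urllib.parse


def lineage_url(base_url, guid, direction="BOTH", depth=3):
    """Build the lineage query URL for one entity guid."""
    query = urllib.parse.urlencode({"direction": direction, "depth": depth})
    return f"{base_url}/api/atlas/v2/lineage/{guid}?{query}"


def readable_edges(lineage):
    """Turn the `relations` list into (from_label, to_label) pairs."""
    entities = lineage.get("guidEntityMap", {})

    def label(guid):
        entity = entities.get(guid, {})
        return f'{entity.get("typeName", "?")}:{entity.get("displayText", guid)}'

    return [(label(r["fromEntityId"]), label(r["toEntityId"]))
            for r in lineage.get("relations", [])]


def root_guids(lineage):
    """Guids that are never the target of a relation, i.e. the original sources."""
    relations = lineage.get("relations", [])
    sources = {r["fromEntityId"] for r in relations}
    targets = {r["toEntityId"] for r in relations}
    return sources - targets


# Hand-trimmed sample in the same shape as the response above (guids are made up).
sample = {
    "guidEntityMap": {
        "g-src": {"typeName": "rdbms_table", "displayText": "p_people"},
        "g-proc": {"typeName": "Process", "displayText": "peopleProcess"},
        "g-dst": {"typeName": "hive_table", "displayText": "p_people"},
    },
    "relations": [
        {"fromEntityId": "g-proc", "toEntityId": "g-dst"},
        {"fromEntityId": "g-src", "toEntityId": "g-proc"},
    ],
}

for src, dst in readable_edges(sample):
    print(f"{src} -> {dst}")
print("roots:", root_guids(sample))
```

Applied to the full response above, the same functions would list four edges and report the MySQL table `5a676b74-e058-4e81-bcf8-42d73f4c1729` as the only root, since it is the one entity that never appears as a `toEntityId`.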

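2. Viewing the lineage graph in the admin UI

The lineage graph can be viewed on the Lineage tab of each entity's detail page in the Atlas admin console.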

posted @ 2021-03-30 23:34  疯码牛Pro