Metadata Comparison: Atlas vs Amundsen vs TDH-catalog (Part 1)


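I. Atlas
Atlas is an open-source metadata management system from Apache. It integrates with components such as Hive, Storm, Kafka, HBase, and Sqoop to manage metadata and track data lineage.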

[Figure: Atlas system architecture diagram]

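Metadata Sources: Atlas currently supports ingesting and managing metadata from the following sources: HBase, Hive, Sqoop, Storm, and Kafka.

Messaging: In addition to the API, users can integrate with Atlas through a Kafka-based messaging interface.

Ingest/Export: The Ingest component allows metadata to be added to Atlas. Similarly, the Export component exposes metadata changes detected by Atlas as events.

Type System: Users define a model for the metadata objects they want to manage. The model is made up of definitions called "types"; instances of types, called "entities", represent the actual metadata objects being managed.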
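The relationship between types and entities can be illustrated with a minimal sketch. The names `TypeDef`, `Entity`, and `demo_table` below are hypothetical and are not part of the Atlas API; the code only models the idea that a type declares attributes and an entity is an instance that must supply them.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TypeDef:
    """A hypothetical type definition: a name plus required attribute names."""
    name: str
    required_attributes: tuple


@dataclass
class Entity:
    """A hypothetical entity: an instance of a TypeDef holding attribute values."""
    type_def: TypeDef
    attributes: dict = field(default_factory=dict)

    def missing_attributes(self):
        # Report which required attributes the entity has not supplied.
        return [a for a in self.type_def.required_attributes
                if a not in self.attributes]


# Illustrative type and entity (names are assumptions, not Atlas built-ins).
demo_table = TypeDef("demo_table", ("name", "owner"))
orders = Entity(demo_table, {"name": "orders"})
print(orders.missing_attributes())  # ['owner']
```

Separating the definition from its instances is what lets a catalog validate new metadata against a model instead of accepting arbitrary key-value pairs.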

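Graph Engine: Internally, Atlas manages metadata objects using a graph model.

Titan: Atlas uses the Titan graph database to store metadata objects. Atlas releases from 1.0 onward replace Titan with JanusGraph.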
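To illustrate why a graph model suits metadata and lineage, here is a minimal sketch that stores entities as vertices and "feeds into" relationships as edges, then walks the edges to find everything downstream of a table. The vertex names and the `downstream` helper are hypothetical; Atlas itself delegates storage and traversal to its graph database.

```python
from collections import deque

# Hypothetical lineage edges: each key feeds data into the listed vertices.
edges = {
    "raw_orders": ["load_orders_job"],
    "load_orders_job": ["clean_orders"],
    "clean_orders": ["daily_report_job"],
    "daily_report_job": ["daily_report"],
}


def downstream(start):
    """Breadth-first walk returning every vertex reachable from start."""
    seen, queue, order = {start}, deque([start]), []
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


print(downstream("clean_orders"))  # ['daily_report_job', 'daily_report']
```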

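Metadata Store <HBase>: HBase is used to store the metadata.

Index Store <Solr>: Solr is used to build the indexes.

API: All Atlas functionality is exposed to end users through a REST API that allows types and entities to be created, updated, and deleted. It is also the primary mechanism for querying and discovering the types and entities managed by Atlas.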
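As a rough sketch of what creating an entity over REST might look like, the snippet below builds (but does not send) a POST request for the v2 entity endpoint. The base URL, the `build_create_entity_request` helper, and the attribute values are assumptions for illustration; a real `hive_table` entity needs more attributes than shown here.

```python
import json
import urllib.request

# Assumed base URL; 21000 is the port commonly used by a default Atlas install.
ATLAS_BASE = "http://localhost:21000"


def build_create_entity_request(type_name, attributes):
    """Build (but do not send) a POST request for the v2 entity endpoint."""
    body = json.dumps({"entity": {"typeName": type_name,
                                  "attributes": attributes}}).encode("utf-8")
    return urllib.request.Request(
        url=ATLAS_BASE + "/api/atlas/v2/entity",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Illustrative values only.
req = build_create_entity_request(
    "hive_table", {"qualifiedName": "default.orders@demo", "name": "orders"})
print(req.get_method(), req.full_url)
```

Sending the request with `urllib.request.urlopen(req)` would additionally require a reachable server and valid credentials.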

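Atlas Admin UI: A web-based application that lets data stewards and data scientists discover and annotate metadata. The Admin UI provides a search interface and an SQL-like query language for querying the metadata types and objects managed by Atlas.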
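The SQL-like query language can also be used through the REST search endpoint. The sketch below only constructs the URL for such a search; the host, port, `dsl_search_url` helper, and the example query are assumptions for illustration.

```python
from urllib.parse import urlencode

ATLAS_BASE = "http://localhost:21000"  # assumed host and port


def dsl_search_url(query, limit=10, offset=0):
    """Return the GET URL for a DSL search; the query text is URL-encoded."""
    params = urlencode({"query": query, "limit": limit, "offset": offset})
    return ATLAS_BASE + "/api/atlas/v2/search/dsl?" + params


url = dsl_search_url('hive_table where name = "orders"')
print(url)
```

Encoding the query with `urlencode` matters because DSL queries routinely contain spaces, quotes, and `=` characters.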

Tag Based Policies: The permission management module.

Business Taxonomy: Business classification.

• GitHub repository

https://github.com/apache/atlas

• Installation documentation

https://atlas.apache.org/#/Installation

• Configuring data source connections

Metadata sources

Atlas supports integration with many sources of metadata out of the box, and more integrations will be added in the future. Currently, Atlas supports ingesting and managing metadata from the following sources: HBase, Hive, Sqoop, Storm, and Kafka.

Feature details:

1. Search by table name.

2. Add tags or classifications to tables or files.

3. Search by classification or tag.
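The three features above can be sketched with a tiny in-memory catalog. `TinyCatalog` and its methods are hypothetical and say nothing about how Atlas implements search or tagging; the point is only to show the behavior each feature implies.

```python
from collections import defaultdict


class TinyCatalog:
    """Hypothetical in-memory catalog: asset names plus a tag index."""

    def __init__(self):
        self.assets = set()
        self.tags = defaultdict(set)  # tag -> asset names

    def add(self, name):
        self.assets.add(name)

    def search_by_name(self, text):
        # Feature 1: case-insensitive substring match on the asset name.
        return sorted(a for a in self.assets if text.lower() in a.lower())

    def tag(self, name, label):
        # Feature 2: attach a tag or classification to a known asset.
        if name not in self.assets:
            raise KeyError(name)
        self.tags[label].add(name)

    def search_by_tag(self, label):
        # Feature 3: return every asset carrying the tag.
        return sorted(self.tags.get(label, ()))


catalog = TinyCatalog()
for name in ("orders", "order_items", "/data/raw/users.csv"):
    catalog.add(name)
catalog.tag("orders", "PII")
catalog.tag("/data/raw/users.csv", "PII")
print(catalog.search_by_name("order"))  # ['order_items', 'orders']
print(catalog.search_by_tag("PII"))     # ['/data/raw/users.csv', 'orders']
```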


Code study


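The package names make the purpose of each package easy to understand, so here we focus on how Atlas parses its SQL-like query language (the DSL) to find relationships.

Atlas uses ANTLR4 for this parsing. Readers can look up the details of using ANTLR4 for SQL-style grammars on their own; the key grammar rules are given below.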


File: AtlasDSLParser.g4

/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

parser grammar AtlasDSLParser;

options { tokenVocab=AtlasDSLLexer; }

// Core rules

identifier: ID ;

operator: (K_LT | K_LTE | K_EQ | K_NEQ | K_GT | K_GTE | K_LIKE) ;

sortOrder: K_ASC | K_DESC ;

valueArray: K_LBRACKET ID (K_COMMA ID)* K_RBRACKET ;

literal: BOOL | NUMBER | FLOATING_NUMBER | (ID | valueArray) ;

// Composite rules

limitClause: K_LIMIT NUMBER ;

offsetClause: K_OFFSET NUMBER ;

atomE: (identifier | literal) | K_LPAREN expr K_RPAREN ;

multiERight: (K_STAR | K_DIV) atomE ;

multiE: atomE multiERight* ;

arithERight: (K_PLUS | K_MINUS) multiE ;

arithE: multiE arithERight* ;

comparisonClause: arithE operator arithE ;

isClause: arithE (K_ISA | K_IS) identifier ;

hasClause: arithE K_HAS identifier ;

countClause: K_COUNT K_LPAREN K_RPAREN ;

maxClause: K_MAX K_LPAREN expr K_RPAREN ;

minClause: K_MIN K_LPAREN expr K_RPAREN ;

sumClause: K_SUM K_LPAREN expr K_RPAREN ;

exprRight: (K_AND | K_OR) compE ;

compE: comparisonClause
     | isClause
     | hasClause
     | arithE
     | countClause
     | maxClause
     | minClause
     | sumClause
     ;

expr: compE exprRight* ;

limitOffset: limitClause offsetClause? ;

selectExpression: expr (K_AS identifier)? ;

selectExpr: selectExpression (K_COMMA selectExpression)* ;

aliasExpr: (identifier | literal) K_AS identifier ;

orderByExpr: K_ORDERBY expr sortOrder? ;

fromSrc: aliasExpr | (identifier | literal) ;

whereClause: K_WHERE expr ;

fromExpression: fromSrc whereClause? ;

fromClause: K_FROM fromExpression ;

selectClause: K_SELECT selectExpr ;

singleQrySrc: fromClause | whereClause | fromExpression | expr ;

groupByExpression: K_GROUPBY K_LPAREN selectExpr K_RPAREN ;

commaDelimitedQueries: singleQrySrc (K_COMMA singleQrySrc)* ;

spaceDelimitedQueries: singleQrySrc singleQrySrc* ;

querySrc: commaDelimitedQueries | spaceDelimitedQueries ;

query: querySrc groupByExpression?
                selectClause?
                orderByExpr?
                limitOffset? EOF;
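The arithmetic rules above (`atomE`, `multiE`, `arithE`) encode operator precedence by nesting: multiplication and division bind tighter than addition and subtraction because `arithE` is built from `multiE`. The hand-written Python sketch below mirrors only those three rules, restricted to numeric literals. It is an analogue for illustration, not the parser that ANTLR4 generates, and all function names are hypothetical.

```python
import re

TOKEN = re.compile(r"\s*(\d+(?:\.\d+)?|[-+*/()])")


def tokenize(text):
    """Split an arithmetic string into number and operator tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise ValueError(f"unexpected input at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_arith(tokens):
    """arithE: multiE ((+|-) multiE)*"""
    value = parse_multi(tokens)
    while tokens and tokens[0] in ("+", "-"):
        op = tokens.pop(0)
        rhs = parse_multi(tokens)
        value = value + rhs if op == "+" else value - rhs
    return value


def parse_multi(tokens):
    """multiE: atomE ((*|/) atomE)*"""
    value = parse_atom(tokens)
    while tokens and tokens[0] in ("*", "/"):
        op = tokens.pop(0)
        rhs = parse_atom(tokens)
        value = value * rhs if op == "*" else value / rhs
    return value


def parse_atom(tokens):
    """atomE: NUMBER | '(' arithE ')' (simplified from the grammar)."""
    if not tokens:
        raise ValueError("unexpected end of input")
    token = tokens.pop(0)
    if token == "(":
        value = parse_arith(tokens)
        if not tokens or tokens.pop(0) != ")":
            raise ValueError("expected ')'")
        return value
    return float(token)


print(parse_arith(tokenize("2 + 3 * (4 - 1)")))  # 11.0
```

Each grammar rule becomes one function, and the `*` repetition in a rule becomes a `while` loop, which also makes the operators left-associative.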

File: AtlasDSLLexer.g4

/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

lexer grammar AtlasDSLLexer;

fragment A: ('A'|'a');

fragment B: ('B'|'b');

fragment C: ('C'|'c');

fragment D: ('D'|'d');

fragment E: ('E'|'e');

fragment F: ('F'|'f');

fragment G: ('G'|'g');

fragment H: ('H'|'h');

fragment I: ('I'|'i');

fragment J: ('J'|'j');

fragment K: ('K'|'k');

fragment L: ('L'|'l');

fragment M: ('M'|'m');

fragment N: ('N'|'n');

fragment O: ('O'|'o');

fragment P: ('P'|'p');

fragment Q: ('Q'|'q');

fragment R: ('R'|'r');

fragment S: ('S'|'s');

fragment T: ('T'|'t');

fragment U: ('U'|'u');

fragment V: ('V'|'v');

fragment W: ('W'|'w');

fragment X: ('X'|'x');

fragment Y: ('Y'|'y');

fragment Z: ('Z'|'z');

fragment DIGIT: [0-9];

fragment LETTER: 'a'..'z'| 'A'..'Z' | '_';

// Comment skipping

SINGLE_LINE_COMMENT: '--' ~[\r\n]* -> channel(HIDDEN) ;

MULTILINE_COMMENT : '/*' .*? ( '*/' | EOF ) -> channel(HIDDEN) ;

WS: (' ' ' '* | [ \n\t\r]+) -> channel(HIDDEN) ;

// Lexer rules

NUMBER: (K_PLUS | K_MINUS)? DIGIT DIGIT* (E (K_PLUS | K_MINUS)? DIGIT DIGIT*)? ;

FLOATING_NUMBER: (K_PLUS | K_MINUS)? DIGIT+ K_DOT DIGIT+ (E (K_PLUS | K_MINUS)? DIGIT DIGIT*)? ;

BOOL: K_TRUE | K_FALSE ;

K_COMMA: ',' ;

K_PLUS: '+' ;

K_MINUS: '-' ;

K_STAR: '*' ;

K_DIV: '/' ;

K_DOT: '.' ;

K_LIKE: L I K E ;

K_AND: A N D ;

K_OR: O R ;

K_LPAREN: '(' ;

K_LBRACKET: '[' ;

K_RPAREN: ')' ;

K_RBRACKET: ']' ;

K_LT: '<' | L T ;

K_LTE: '<=' | L T E ;

K_EQ: '=' | E Q ;

K_NEQ: '!=' | N E Q ;

K_GT: '>' | G T ;

K_GTE: '>=' | G T E ;

K_FROM: F R O M ;

K_WHERE: W H E R E ;

K_ORDERBY: O R D E R B Y ;

K_GROUPBY: G R O U P B Y ;

K_LIMIT: L I M I T ;

K_SELECT: S E L E C T ;

K_MAX: M A X ;

K_MIN: M I N ;

K_SUM: S U M ;

K_COUNT: C O U N T ;

K_OFFSET: O F F S E T ;

K_AS: A S ;

K_ISA: I S A ;

K_IS: I S ;

K_HAS: H A S ;

K_ASC: A S C ;

K_DESC: D E S C ;

K_TRUE: T R U E ;

K_FALSE: F A L S E ;

KEYWORD: K_LIKE
       | K_DOT
       | K_SELECT
       | K_AS
       | K_HAS
       | K_IS
       | K_ISA
       | K_WHERE
       | K_LIMIT
       | K_TRUE
       | K_FALSE
       | K_AND
       | K_OR
       | K_GROUPBY
       | K_ORDERBY
       | K_SUM
       | K_MIN
       | K_MAX
       | K_OFFSET
       | K_FROM
       | K_DESC
       | K_ASC
       | K_COUNT
       ;

ID: STRING
  | LETTER (LETTER|DIGIT)*
  | LETTER (LETTER|DIGIT)* KEYWORD KEYWORD*
  | KEYWORD KEYWORD* LETTER (LETTER|DIGIT)*
  | LETTER (LETTER|DIGIT)* KEYWORD KEYWORD* LETTER (LETTER|DIGIT)*
  ;

STRING: '"' ~('"')* '"' | '\'' ~('\'')* '\'' | '`' ~('`')* '`';
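The `fragment A` through `fragment Z` rules exist so that each keyword matches regardless of letter case: `K_FROM: F R O M` accepts `from`, `FROM`, or `From`. A rough Python equivalent uses a case-insensitive regular expression. The sketch below covers only a handful of the tokens, and the `tokenize` helper and its simplified `ID` pattern are assumptions for illustration rather than a port of the lexer.

```python
import re

# Python tries alternatives in order; \b keeps a keyword from matching
# the prefix of a longer identifier such as "from_date".
TOKEN_SPEC = [
    ("K_FROM", r"from\b"),
    ("K_WHERE", r"where\b"),
    ("K_LIMIT", r"limit\b"),
    ("K_EQ", r"="),
    ("NUMBER", r"\d+"),
    ("STRING", r"\"[^\"]*\"|'[^']*'|`[^`]*`"),
    ("ID", r"[A-Za-z_][A-Za-z_0-9]*"),
    ("WS", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC),
                    re.IGNORECASE)


def tokenize(query):
    """Return (token_type, text) pairs, dropping whitespace tokens."""
    tokens, pos = [], 0
    while pos < len(query):
        match = MASTER.match(query, pos)
        if not match:
            raise ValueError(f"cannot tokenize at position {pos}")
        if match.lastgroup != "WS":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


for token in tokenize("FROM hive_table Where name = 'orders' LIMIT 5"):
    print(token)
```

Whitespace is dropped here, whereas the grammar routes it to the hidden channel; for a parser that only reads the default channel the effect is similar.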

posted @ 2021-08-30 09:19  Tim&Blog