ETL常规状态类表拉链算法详解
拉链算法是数仓常用的数据存储技术
建SDBDDL、SDB、ODB库,分别用于存储表结构、存储增量数据、存储历史数据:
CREATE DATABASE "SDBDDL" AS PERM=104857600 FALLBACK NO BEFORE JOURNAL NO AFTER JOURNAL CREATE DATABASE "SDB" AS PERM=104857600 FALLBACK NO BEFORE JOURNAL NO AFTER JOURNAL CREATE DATABASE "ODB" AS PERM=104857600 FALLBACK NO BEFORE JOURNAL NO AFTER JOURNAL
SDBDDL库建表:
CREATE MULTISET TABLE SDBDDL.SW5_SALARY_INFO( ID VARCHAR(10) CHARACTER SET LATIN CASESPECIFIC TITLE '账户ID', NAME VARCHAR(10) CHARACTER SET LATIN CASESPECIFIC TITLE '户名', BALANCE DECIMAL(12,2) TITLE '余额') PRIMARY INDEX ( ID );
注意:
1.定义为SET表的表不存储重复记录,而MULTISET表可存储重复记录
2.CASESPECIFIC ——大小写敏感,NOT CASESPECIFIC ——大小写不敏感
ODB库建表:(状态类无删除表)
CREATE MULTISET TABLE ODB.SW5_SALARY_INFO( ID VARCHAR(10) CHARACTER SET LATIN CASESPECIFIC TITLE '账户ID' NOT NULL, NAME VARCHAR(10) CHARACTER SET LATIN CASESPECIFIC TITLE '户名' NOT NULL, BALANCE DECIMAL(12,2) TITLE '余额' NOT NULL, START_DT DATE FORMAT 'YYYYMMDD' TITLE '开始时间' NOT NULL, END_DT DATE FORMAT 'YYYYMMDD' TITLE '结束时间' NOT NULL ) PRIMARY INDEX ( ID );
给ODB铺个底:

INSERT INTO ODB.SW5_SALARY_INFO VALUES('001','张三','520',DATE'2020-01-02',DATE'2999-12-31'); INSERT INTO ODB.SW5_SALARY_INFO VALUES('002','李四','2100',DATE'2020-10-10',DATE'2999-12-31');

ODB有两条铺底数据了
20201016的下发增量数据:

张三的余额发生变化,变成1314
新增了王五的数据
入库 ——SDB.SW5_SALARY_INFO1016
INSERT INTO SDB.SW5_SALARY_INFO1016 VALUES('001','张三','1314'); INSERT INTO SDB.SW5_SALARY_INFO1016 VALUES('003','王五','5000');

将20201016的数据入导ODB:
建一张临时表VT_NEW(ODB表结构):
CREATE MULTISET VOLATILE TABLE VT_NEW, NO LOG AS ODB.SW5_SALARY_INFO WITH NO DATA ON COMMIT PRESERVE ROWS;
取出新增和修改的数据,导入临时表:
INSERT INTO VT_NEW( ID ,NAME ,BALANCE ,START_DT ,END_DT) SELECT N.ID ,N.NAME ,N.BALANCE ,DATE'2020-10-16' ,DATE'2999-12-31' FROM (SELECT COALESCE( TRIM( ID ) , '' ) AS ID ,COALESCE( TRIM( NAME ) , '' ) AS NAME ,COALESCE( TRIM( BALANCE ) , 0 ) AS BALANCE FROM SDB.SW5_SALARY_INFO1016 ) N LEFT JOIN (SELECT ID ,NAME ,BALANCE FROM ODB.SW5_SALARY_INFO WHERE END_DT = DATE'2999-12-31' ) T ON N.ID = T.ID WHERE T.ID IS NULL OR N.NAME<>T.NAME OR N.BALANCE<>T.BALANCE ;
查临时表:
SELECT * FROM VT_NEW

张三是修改的数据
王五是新增的数据
修改目标表
UPDATE T FROM ODB.SW5_SALARY_INFO T, VT_NEW N SET END_DT = DATE'2020-10-16' WHERE T.END_DT = DATE'2999-12-31' AND N.ID = T.ID ;
![]()
INSERT INTO ODB.SW5_SALARY_INFO SELECT * FROM VT_NEW;

到这一步,就完成了20201016的增量数据入库到近源模型层ODB


浙公网安备 33010602011771号