Goal: extract data from Oracle into Hive to serve as the ODS-layer data.
Approach: create Sqoop jobs that incrementally extract into HDFS, create Hive external tables whose LOCATION points at the extracted data on HDFS, and trigger the jobs on a schedule with Azkaban.

1. Create the Sqoop job

sqoop job --create TMGPFRETURNDETAIL_RPT -- import \
--connect jdbc:oracle:thin:@192.168.0.18:1521:xxxx \
--username xxxx \
--password-file /sqoop/pwd/sqoopPWD_mgrpt.pwd \
--target-dir /DATA/ODS/TMGPFRETURNDETAIL_RPT \
--fields-terminated-by '^A' \
--table TMGPFRETURNDETAIL_RPT \
--incremental append \
--check-column DEALTIME \
--last-value '2016-01-01 00:00:00' \
-m 6
append: new rows are appended to the data already in the target directory
last-value: starting value for the first run; on later runs Sqoop updates it automatically from the saved job state
password-file: file containing the database password
target-dir: HDFS directory where the extracted data is written
fields-terminated-by: field delimiter
check-column: column Sqoop checks to find new rows during the incremental extraction
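The file referenced by --password-file must contain only the password bytes. A minimal sketch of preparing it (the password and local path are placeholders; the hdfs dfs steps need a running cluster, so they are left commented):

```shell
# Write the password with printf '%s' so no trailing newline is added;
# a stray newline becomes part of the password and the login fails.
printf '%s' 'my_oracle_pwd' > /tmp/sqoopPWD_demo.pwd
chmod 400 /tmp/sqoopPWD_demo.pwd          # readable by the owner only
# Upload to the path used by the job (requires HDFS):
# hdfs dfs -mkdir -p /sqoop/pwd
# hdfs dfs -put /tmp/sqoopPWD_demo.pwd /sqoop/pwd/sqoopPWD_mgrpt.pwd
# hdfs dfs -chmod 400 /sqoop/pwd/sqoopPWD_mgrpt.pwd
```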

List the jobs registered with Sqoop:

[hadoop@FineReportAppServer pwd]$ echo `date`  "jobs start" 
Tue Apr  6 15:50:04 CST 2021 jobs start
[hadoop@FineReportAppServer pwd]$ sqoop job -list
...
...
Warning: /home/hadoop/sqoop-1.4.6-cdh5.7.0/bin/../../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
21/04/06 16:04:40 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.7.0
Available jobs:
  TMGPFRETURNDETAIL_RPT
  TMGPFRETURNFORM_RPT
  TMGPFSALEDETAIL_RPT
  TMGPFSALEFORM_RPT

2. Create the Hive external table

CREATE EXTERNAL TABLE `TMGPFRETURNDETAIL_RPT`(
  `branchno` string, 
  `pfreturnno` string, 
  `serialno` double, 
  `plucode` string, 
  `jprice` double, 
  `pfprice` double, 
  `price` double, 
  `counts` double, 
  `pftotal` double, 
  `yttotal` double, 
  `zpcount` double, 
  `sharetotal` double, 
  `zrtotal` double, 
  `rtnrate` double, 
  `rtntotal` double, 
  `cost` double, 
  `comflag` string, 
  `guidflag` string, 
  `dbname` string, 
  `dealguid` string, 
  `dealtime` string)
COMMENT 'Imported by sqoop on 2021/04/02 10:48:31'
ROW FORMAT DELIMITED 
  FIELDS TERMINATED BY '^A' 
  LINES TERMINATED BY '\n' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://ns/DATA/ODS/TMGPFRETURNDETAIL_RPT'
TBLPROPERTIES (
  'COLUMN_STATS_ACCURATE'='false', 
  'last_modified_by'='hadoop', 
  'last_modified_time'='1617349860', 
  'numFiles'='0', 
  'numRows'='-1', 
  'rawDataSize'='-1', 
  'totalSize'='0', 
  'transient_lastDdlTime'='1617349860')
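The '^A' in FIELDS TERMINATED BY is the literal Ctrl-A character (octal \001), Hive's default field delimiter, which Sqoop writes between columns. A quick local check of how such a row splits (the sample values are made up):

```shell
# Write one sample row with \001 between three fields, as Sqoop would.
printf '0001\001PF2021040100001\001100.5\n' > /tmp/ods_row.txt
# Split on \001 and count the fields.
awk -F'\001' '{ print NF }' /tmp/ods_row.txt   # → 3
```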

Oracle SQL that generates the Hive DDL from the data dictionary:

select 
'CREATE EXTERNAL TABLE '||table_name||' ( '||

-- wm_concat is undocumented and does not guarantee order; on 11gR2+
-- prefer listagg(...) within group (order by column_id) so the field
-- order matches Sqoop's output
wm_concat(column_name||' '|| decode(data_type,'NUMBER', 'double','string')) ||')'
||
'COMMENT ''Imported by sqoop on '||to_char(sysdate,'yyyy/mm/dd hh24:mi:ss')||' '' '
||'ROW FORMAT DELIMITED '
||'  FIELDS TERMINATED BY ''^A'' ' 
|| 'LINES TERMINATED BY ''\n'' ' 
||'STORED AS INPUTFORMAT  '
 ||'''org.apache.hadoop.mapred.TextInputFormat'' '
||'OUTPUTFORMAT '
 || '''org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'' '
||'LOCATION '
 || '''hdfs://ns/DATA/ODS/'||table_name||''' '
||'TBLPROPERTIES ( '
 ||'''COLUMN_STATS_ACCURATE''=''false'',  '
 ||'''last_modified_by''=''hadoop''  )'

from dba_tab_columns where owner ='HSRPT' and table_name  ='TMGPFSALEDETAIL_RPT'
group by table_name;

Check the resulting Hive tables:

hive (ods)> show tables;
OK
tab_name
tmgpfreturndetail_rpt
tmgpfreturnform_rpt
tmgpfsaledetail_rpt
tmgpfsaleform_rpt
Time taken: 0.043 seconds, Fetched: 4 row(s)

3. Create the Azkaban jobs (one .job file per job below)

START:

type=command
command=echo `date` "jobs start"

TMGPFSALEDETAIL_RPT:

type=command
command=sqoop job -exec TMGPFSALEDETAIL_RPT
dependencies=START

TMGPFRETURNDETAIL_RPT:

type=command
command=sqoop job -exec TMGPFRETURNDETAIL_RPT
dependencies=START

TMGPFSALEFORM_RPT:

type=command
command=sqoop job -exec TMGPFSALEFORM_RPT
dependencies=START

TMGPFRETURNFORM_RPT:

type=command
command=sqoop job -exec TMGPFRETURNFORM_RPT
dependencies=START

SUCCESS:

type=command
command=echo `date` "complete...TMGPFRETURNFORM_RPT...TMGPFSALEFORM_RPT...TMGPFSALEDETAIL_RPT...TMGPFRETURNDETAIL_RPT...sync"
dependencies=TMGPFRETURNFORM_RPT,TMGPFSALEFORM_RPT,TMGPFSALEDETAIL_RPT,TMGPFRETURNDETAIL_RPT
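Azkaban derives the execution order from the dependencies lines: START runs first, the four extraction jobs run once START finishes (Azkaban can run them in parallel), and SUCCESS runs only after all four complete. A serial shell sketch of that ordering, where run() stands in for `sqoop job -exec` (which needs the cluster):

```shell
# run() echoes and logs the job name instead of calling sqoop job -exec.
run() { echo "exec $1" | tee -a /tmp/flow.log; }
rm -f /tmp/flow.log

run START                                  # no dependencies: runs first
for t in TMGPFSALEDETAIL_RPT TMGPFRETURNDETAIL_RPT \
         TMGPFSALEFORM_RPT TMGPFRETURNFORM_RPT; do
  run "$t"                                 # each depends only on START
done
run SUCCESS                                # waits for all four extractions
```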

Package the .job files into a zip archive and upload it to the Azkaban project.

4. Run

To run on a schedule, click Schedule in the Azkaban web UI.

Check the execution status in the web UI.

5. Verify the data

Check the extracted files on HDFS:

[hadoop@oracletest01 ~]$ hdfs dfs -du -h /DATA/ODS/
21/04/06 16:30:38 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
3.6 G    7.1 G    /DATA/ODS/TMGPFRETURNDETAIL_RPT
936.1 M  1.8 G    /DATA/ODS/TMGPFRETURNFORM_RPT
73.2 G   146.4 G  /DATA/ODS/TMGPFSALEDETAIL_RPT
15.0 G   30.1 G   /DATA/ODS/TMGPFSALEFORM_RPT
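In the `hdfs dfs -du` output the first column is the logical file size and the second is the physical footprint including all replicas, so their ratio reveals the replication factor in use here:

```shell
# 73.2 G logical vs 146.4 G physical for TMGPFSALEDETAIL_RPT:
awk 'BEGIN { printf "%.1f\n", 146.4 / 73.2 }'   # → 2.0, i.e. replication factor 2
```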

Check the data in Hive:

hive (ods)> select count(1) from TMGPFSALEFORM_RPT;
Query ID = hadoop_20210406150303_0b672820-3192-41a4-a5d1-f5abf46fdd31
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Job running in-process (local Hadoop)
2021-04-06 15:06:35,356 Stage-1 map = 0%,  reduce = 0%
2021-04-06 15:06:38,385 Stage-1 map = 100%,  reduce = 0%
2021-04-06 15:07:38,635 Stage-1 map = 100%,  reduce = 0%
2021-04-06 15:08:39,443 Stage-1 map = 100%,  reduce = 0%
2021-04-06 15:09:21,386 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_local130132522_0002
MapReduce Jobs Launched: 
Stage-Stage-1:  HDFS Read: 584384636688 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
_c0
56550606
Time taken: 167.557 seconds, Fetched: 1 row(s)

posted on 2022-05-16 17:54 by xc川