Enterprise Invoice Anomaly Analysis: Import and Cleaning

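Today I worked on the enterprise invoice anomaly analysis assignment. I imported the data into the Hive data warehouse and did a first round of cleaning.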

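The steps were as follows:

1. Import the data from the three sample table files into the Hive data warehouse

First, create the three tables: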

create table xxfpb(
    hydm string,
    xf_id string,
    djzclx_dm string,
    kydjrq string,
    xgrq string,
    label string,
    fp_nid string,
    je double,
    se double,
    jshj double,
    kpyf string,
    kprq string,
    zfbz string
) row format delimited fields terminated by ',';
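Once the table exists, a sample file can be loaded into it. The following is a minimal sketch, assuming the sample file is comma-separated text on the local filesystem. The path `/path/to/xxfpb.csv` is a hypothetical placeholder, not the real file name.

```sql
-- Hypothetical path: replace it with the real location of the sample file.
-- `local` reads from the local filesystem; omit it if the file is already in HDFS.
load data local inpath '/path/to/xxfpb.csv' overwrite into table xxfpb;

-- Look at a few rows to confirm that the columns were split on ',' as expected.
select * from xxfpb limit 5;
```

With `overwrite`, any rows already in the table are replaced, so the load can be repeated without duplicating data.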

 

The create-table statements for the other two tables are similar.
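As a rough sketch, the other two tables might be declared as below. The column names come from the cleaning statements later in this post. The column types are my assumption: `je` and `se` mirror the `double` type used in `xxfpb`, and everything else is kept as `string`.

```sql
-- Taxpayer information table (column types are assumptions).
create table if not exists nsrxx(
    hydm string,       -- first field; still carries the leading bracket before cleaning
    nsr_id string,
    djzclx_dm string,
    kydjrq string,
    xgrq string,
    label string       -- last field; still carries the trailing bracket before cleaning
) row format delimited fields terminated by ',';

-- Invoice line-item table (column types are assumptions).
create table if not exists zzsfp_hwmx(
    fp_nid string,     -- first field; must be a string so the bracket can be stripped
    date_kry string,
    hwmc string,
    ggxh string,
    dw string,
    sl string,
    dj string,
    je double,
    se double,
    spbm string        -- last field; must be a string so the bracket can be stripped
) row format delimited fields terminated by ',';
```

The first and last columns have to be strings in any case, since the raw values contain a bracket and the cleaning step uses `substring` on them.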

 

Next, clean the data:

-- nsrxx: drop the leading bracket from hydm and the trailing bracket from label
insert overwrite table nsrxx
select substring(hydm, 2, length(hydm) - 1) as hydm,
       nsr_id, djzclx_dm, kydjrq, xgrq,
       substring(label, 1, length(label) - 1) as label
from nsrxx;

-- zzsfp_hwmx: drop the leading bracket from fp_nid and the trailing bracket from spbm
insert overwrite table zzsfp_hwmx
select substring(fp_nid, 2, length(fp_nid) - 1) as fp_nid,
       date_kry, hwmc, ggxh, dw, sl, dj, je, se,
       substring(spbm, 1, length(spbm) - 1) as spbm
from zzsfp_hwmx;

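The purpose of the cleaning is to remove the leading and trailing brackets in the three tables: each row starts with an opening bracket in its first field and ends with a closing bracket in its last field. The statements for nsrxx and zzsfp_hwmx are shown here.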
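To make the `substring` arithmetic concrete, here is a small sketch on literal values, followed by a sanity check on the table. It assumes the brackets are round ones, `(` and `)`, and the sample values are made up for illustration.

```sql
-- Hive's substring is 1-based: start at position 2 and keep length-1 characters,
-- which drops only the first character.
select substring('(abc', 2, length('(abc') - 1);   -- abc

-- Start at position 1 and keep length-1 characters, which drops only the last character.
select substring('abc)', 1, length('abc)') - 1);   -- abc

-- Sanity check (assumes round brackets): counts rows that still carry a bracket.
-- After cleaning, this should return 0.
select count(*) from nsrxx where hydm like '(%' or label like '%)';
```

Because the cleaning statements strip characters by position rather than by matching the bracket, running them a second time would remove one more character from each field. Running the count first shows whether the table still needs cleaning.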

[Screenshot: result after cleaning]

posted @ 2021-11-13 16:52  潘福龙