Post archive for July 25, 2018 - chenzechao

July 25, 2018

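Summary:
    ### Change to the directory that contains the script
    bin=$(cd `dirname $0`;pwd)
    cd ${bin}
    ### Rename a file with a garbled name, selecting it by inode number
    find . -inum 16482370 -exec mv {} ppp \;
    ### Print a given range of lines
    sed -n '777172,793920p' kafkaconnect.log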
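The three one-liners above can be wrapped as small functions so that paths with spaces are quoted and `sed` stops reading once the requested range has been printed. This is only a sketch: the function names `script_dir`, `print_lines`, and `rename_by_inode` are made up for illustration and do not come from the original post.

```shell
# Print the absolute directory of the script path given as $1.
# The subshell keeps the caller's working directory unchanged.
script_dir() (
  cd "$(dirname "$1")" && pwd
)

# Print lines START..END (inclusive) of FILE, then quit so that the rest
# of a large log file is not scanned.
print_lines() {
  sed -n "${1},${2}p;${2}q" "$3"
}

# Rename the file with the given inode number (see `ls -i`) to NEW_NAME.
# The `! -name` guard skips the target in case find visits it after the move.
rename_by_inode() {
  find . -inum "$1" ! -name "$2" -exec mv {} "$2" \;
}
```

Inside a script, `cd "$(script_dir "$0")"` would correspond to the first snippet, and `print_lines 777172 793920 kafkaconnect.log` to the last one.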

posted @ 2018-07-25 16:02 chenzechao Views (141) Comments (0) Recommendations (0)

Hive SQL: extracting Chinese characters

Summary:
    select regexp_replace(str,'[^\\u4e00-\\u9fa5]','') as str1
    from ( select 'test测试test' as str ) t ;
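When this query is run from a shell script, the doubled backslashes need to reach Hive unchanged. A minimal sketch, assuming the query is passed to the `hive` CLI: a here-document with a quoted delimiter switches off shell backslash processing. The function name `chinese_only_query` is hypothetical.

```shell
# Emit the query exactly as written. The quoted 'SQL' delimiter prevents
# the shell from interpreting the backslashes or expanding anything.
chinese_only_query() {
  cat <<'SQL'
select regexp_replace(str, '[^\\u4e00-\\u9fa5]', '') as str1
from (
  select 'test测试test' as str
) t;
SQL
}
```

It could then be run with `hive -e "$(chinese_only_query)"`. The pattern removes every character outside the range `\u4e00` to `\u9fa5`, so for the sample string only `测试` should remain.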

posted @ 2018-07-25 16:00 chenzechao Views (952) Comments (0) Recommendations (0)

Running Hive scripts in parallel

Summary:
    ### Directory holding the template script (no need to change)
    cd /tmp/fix_data/tmp_wjj_20180322_01
    ### Script name
    script=tmp_wjj_20180322_01
    ### Start date (that month/day is included)
    etl_dt_start='2017-09-01'
    ### End date (that month/day is excluded)
    etl_dt_end='2016-12-...
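The excerpt is cut off before the loop itself, so the following is not the author's script. It is a minimal sketch of one way to fan a job out over months, assuming one run per month with the start month included and the end month excluded, as the comments above describe. The names `month_starts` and `run_parallel` are hypothetical.

```shell
# Print the first day of every month from START (inclusive) to END
# (exclusive). Both arguments are YYYY-MM-DD strings.
month_starts() (
  y=${1%%-*}; m=${1#*-}; m=${m%%-*}; m=${m#0}      # strip leading zero: 09 -> 9
  ey=${2%%-*}; em=${2#*-}; em=${em%%-*}; em=${em#0}
  while [ $((y * 12 + m)) -lt $((ey * 12 + em)) ]; do
    printf '%04d-%02d-01\n' "$y" "$m"
    m=$((m + 1))
    if [ "$m" -gt 12 ]; then m=1; y=$((y + 1)); fi
  done
)

# Read dates from stdin and run "COMMAND... DATE" in the background,
# in batches of at most MAX_JOBS; each batch finishes before the next starts.
run_parallel() {
  max_jobs=$1; shift
  running=0
  while IFS= read -r etl_dt; do
    "$@" "$etl_dt" </dev/null &
    running=$((running + 1))
    if [ "$running" -ge "$max_jobs" ]; then
      wait
      running=0
    fi
  done
  wait
}
```

With a hypothetical wrapper such as `run_one() { hive --hivevar etl_dt="$1" -f "${script}.sql"; }`, the whole range could be processed with `month_starts "$etl_dt_start" "$etl_dt_end" | run_parallel 4 run_one`. Batching keeps the sketch portable; a scheduler that refills slots as soon as a job ends would use cluster resources more evenly.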

posted @ 2018-07-25 15:58 chenzechao Views (369) Comments (0) Recommendations (0)

Hive data import and export

Summary:
    -- Load data from a local file:
    LOAD DATA LOCAL INPATH '/home/hadoop/input/ncdc/micro-tab/sample.txt' OVERWRITE INTO TABLE records;
    load data local inpath '/home/hive/partitions/files' into table logs partitio...
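Because `LOAD DATA LOCAL INPATH` reads from the machine where the client runs, a wrapper script may want to check that the local path exists before calling Hive. A small sketch under that assumption; `load_local_stmt` is a hypothetical helper that only builds the statement and handles neither partitions nor quoting of unusual paths.

```shell
# Print a LOAD DATA LOCAL statement for SRC into TABLE.
# A third argument of "overwrite" adds OVERWRITE; otherwise rows are appended.
# Returns non-zero (and prints nothing on stdout) if SRC does not exist locally.
load_local_stmt() {
  src=$1; table=$2; mode=${3:-append}
  [ -e "$src" ] || { echo "no such local path: $src" >&2; return 1; }
  if [ "$mode" = overwrite ]; then
    printf "LOAD DATA LOCAL INPATH '%s' OVERWRITE INTO TABLE %s;\n" "$src" "$table"
  else
    printf "LOAD DATA LOCAL INPATH '%s' INTO TABLE %s;\n" "$src" "$table"
  fi
}
```

A caller could run `stmt=$(load_local_stmt /home/hadoop/input/ncdc/micro-tab/sample.txt records overwrite) && hive -e "$stmt"`.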

posted @ 2018-07-25 15:50 chenzechao Views (969) Comments (0) Recommendations (0)

Syncing Hive data with distcp

Summary:
    -- Sync HDFS data (run from the shell)
    hadoop distcp \
    -Dmapred.job.queue.name=queue_name \
    -update \
    -skipcrccheck hdfs://hdfs01/user/hive/warehouse/db_name1.db/table_name \
    /user/hive/warehouse/db_name...
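Before copying a large table it can help to print the exact command first. The sketch below wraps the same flags as the excerpt in a hypothetical `sync_table` function; the `DRY_RUN` variable is an assumption of this example, not something `hadoop distcp` understands.

```shell
# Copy SRC to DST with distcp on the given YARN queue.
# With DRY_RUN=1 the command is printed instead of executed.
sync_table() {
  queue=$1; src=$2; dst=$3
  set -- hadoop distcp "-Dmapred.job.queue.name=$queue" \
    -update -skipcrccheck "$src" "$dst"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}
```

For example, `DRY_RUN=1 sync_table queue_name hdfs://hdfs01/user/hive/warehouse/db_name1.db/table_name /some/target/path` only prints the command; `/some/target/path` is a placeholder, since the destination in the excerpt is truncated.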

posted @ 2018-07-25 14:34 chenzechao Views (681) Comments (0) Recommendations (0)

Hive parameter settings

Summary:
    -- Set Hive's execution engine to Spark
    set hive.execution.engine=spark;

    -- Repair partitions
    set hive.msck.path.validation=ignore;
    msck repair table sub_ladm_app_click_day_cnt;

    -- Print column headers
    set hive.cli.p...
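Settings like these are often needed at the top of several scripts. One possible approach, sketched here with a hypothetical `with_settings` helper, is to generate the `set` lines in the shell and prepend them to a query read from standard input.

```shell
# Print one "set key=value;" line per argument, followed by the query
# text read from stdin.
with_settings() {
  for kv in "$@"; do
    printf 'set %s;\n' "$kv"
  done
  cat
}
```

For instance, `echo 'msck repair table sub_ladm_app_click_day_cnt;' | with_settings hive.execution.engine=spark hive.msck.path.validation=ignore > repair.sql` writes a file that could then be run with `hive -f repair.sql`.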

posted @ 2018-07-25 14:31 chenzechao Views (5958) Comments (0) Recommendations (0)
