欣欣姐

October 14, 2021

select a.owner, a.table_name, a.column_name, a.data_type, d.constraint_type, a.num_nulls from all_tab_columns a left join ( select b.owner, b.TABLE_NA Read More

posted @ 2021-10-14 11:20 欣欣姐 Views(1002) Comments(0) Diggs(0)

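September 2, 2021

Automatically running Python scripts on Linux

When a .py file needs to run automatically, it can be scheduled with a cron-style entry such as the following: 00 09 * * * /usr/local/bin/python3 /udata/ubi/uenbi_py/trade_all_daily.py >>/udata/ushks/pythoncode/pylog.log 2>&1 Read More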

posted @ 2021-09-02 17:33 欣欣姐 Views(184) Comments(0) Diggs(0)

August 6, 2021

Oracle: querying table columns and their comments

Look up the comment for a given column, along with related details: select a.owner as 用户名 ,a.TABLE_NAME as 表名 ,b.COMMENTS as 表注释名 ,a.COLUMN_NAME as 字段名 ,a.comments as 字段注释 from dba_col_comments a join dba Read More

posted @ 2021-08-06 17:41 欣欣姐 Views(625) Comments(0) Diggs(0)

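Hive: sorting within groups and computing Top N

Use case: sorting within a group, for example finding the amounts of a given user's first 10 purchases, or more generally the top values for each id or group. Step 1, create a test table: create table tmp_partition_test ( name string, subject string, score int ) Step 2, insert test data: insert int Read More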

posted @ 2021-08-06 15:17 欣欣姐 Views(467) Comments(0) Diggs(0)
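
The Top N excerpt above is cut off before the query itself, so the following is only a minimal sketch of the underlying idea, written in plain Python rather than HiveQL. It reuses the `name`, `subject`, and `score` columns of `tmp_partition_test`; the sample rows and the function name `top_n_per_group` are assumptions made for illustration, not taken from the post.

```python
from collections import defaultdict


def top_n_per_group(rows, n):
    """Return the n highest-scoring rows per subject, each with a 1-based rank."""
    groups = defaultdict(list)
    for name, subject, score in rows:
        groups[subject].append((name, score))

    result = {}
    for subject, members in groups.items():
        # Sort each group independently, highest score first, then keep n rows.
        ranked = sorted(members, key=lambda m: m[1], reverse=True)[:n]
        result[subject] = [
            (rank, name, score)
            for rank, (name, score) in enumerate(ranked, start=1)
        ]
    return result


# Hypothetical sample data in (name, subject, score) order.
rows = [
    ("alice", "math", 90),
    ("bob", "math", 75),
    ("carol", "math", 82),
    ("alice", "english", 68),
    ("bob", "english", 88),
]

top2 = top_n_per_group(rows, 2)
for subject, ranked in top2.items():
    print(subject, ranked)
```

Grouping first and ranking inside each group is the same shape as partitioning by a column and ordering within the partition; whether the original post does it that way cannot be confirmed from the excerpt.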

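August 4, 2021

Hive: data skew caused by GROUP BY

Group By: by default, all map-phase data with the same key is sent to a single reducer, so the job becomes skewed when one key carries too much data. Not every aggregation has to be completed on the reduce side, however: many aggregations can be partially computed on the map side first, with the final result produced on the reduce side. 1) Enable the map-side aggregation settings. (1) Whether to aggregate on the map side Read More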

posted @ 2021-08-04 19:10 欣欣姐 Views(601) Comments(0) Diggs(0)
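
To make the idea of partial aggregation in the GROUP BY entry above concrete, here is a small Python sketch, assuming a simple count-per-key aggregation. It is not Hive code: the two "mapper" partitions, the keys, and the function names are all hypothetical, and the point is only to show that each mapper can pre-count its own keys so the reduce side receives far fewer records for a hot key.

```python
from collections import Counter


def map_side_aggregate(partition):
    """Count rows per key locally, inside a single mapper's partition."""
    return Counter(partition)


def reduce_side_merge(partials):
    """Merge the per-mapper partial counts into the final counts."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # adds counts key by key
    return total


# Hypothetical input: key "a" is the hot key that would cause skew.
partitions = [
    ["a", "a", "a", "b"],
    ["a", "a", "c"],
]

partials = [map_side_aggregate(p) for p in partitions]
totals = reduce_side_merge(partials)

# Records shipped to the reduce side without and with partial aggregation.
records_without = sum(len(p) for p in partitions)
records_with = sum(len(p) for p in partials)
print(totals, records_without, records_with)
```

This only works directly for aggregations that can be merged from partial results, such as counts and sums.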

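Hive: inspecting the MapJoin process in the logs

MapJoin: if MapJoin is not specified, or the conditions for MapJoin are not met, the Hive parser converts the join into a Common Join, meaning the join is completed in the reduce phase, which is prone to data skew. MapJoin can be used to load the entire small table into memory and perform the join on the map side, avoiding redu Read More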

posted @ 2021-08-04 18:50 欣欣姐 Views(333) Comments(0) Diggs(0)
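
As a rough illustration of the map-side join described in the MapJoin entry above, the sketch below loads a small table into an in-memory hash table and streams the large table past it, so no shuffle by key is needed. It is plain Python, not Hive internals; the table contents and the name `map_join` are assumptions for the example.

```python
def map_join(big_rows, small_rows):
    """Inner-join big_rows to small_rows on the first field via an in-memory lookup."""
    # Build phase: the whole small table is held in memory, keyed by join key.
    lookup = {}
    for key, value in small_rows:
        lookup.setdefault(key, []).append(value)

    # Probe phase: each big-table row is matched locally, one row at a time.
    joined = []
    for key, payload in big_rows:
        for value in lookup.get(key, []):
            joined.append((key, payload, value))
    return joined


# Hypothetical data: a small dimension table and a larger fact table.
small = [(1, "east"), (2, "west")]
big = [(1, "order-a"), (2, "order-b"), (1, "order-c"), (3, "order-d")]

result = map_join(big, small)
print(result)
```

Because every big-table row is resolved where it is read, a heavily repeated key never piles up on one reducer; the trade-off is that the small table has to fit in memory.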

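July 29, 2021

Hive: running scripts on a schedule

In Hive, some SQL scripts or other scripts need to run on a schedule every day; this can be done by calling an sh script. 1. Create a new sh file: [root@master log]# vim wh_hive_daily.sh 2. Add the following content (it can be copied as-is, then only the SQL inside needs to be changed): #!/bin/bash APP=uiopdb hive Read More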

posted @ 2021-07-29 16:48 欣欣姐 Views(1857) Comments(0) Diggs(0)

July 28, 2021

Oracle: FOR LOOP usage

Two examples. The first walks a hierarchical relationship recursively, using a loop: begin for orgId in (select org_id from DWSDATA.T_AGENT_ORG_ID group by agent_id ) loop insert into ken.all_agent(agent_id,all_c Read More

posted @ 2021-07-28 11:00 欣欣姐 Views(2471) Comments(0) Diggs(0)

July 22, 2021

There are 1557 missing blocks. The following files may be corrupted:

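Opening the page on port 50070 shows the error: There are 1557 missing blocks. The following files may be corrupted: Step 1, check which files are missing blocks by running the following command, hdfs fsck / -list-corruptfileblocks, to see which data blocks are corrupted Read More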

posted @ 2021-07-22 18:06 欣欣姐 Views(657) Comments(0) Diggs(0)

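July 20, 2021

Hive: INSERT INTO unexpectedly overwrote the existing data

Problem: when using Hive's insert into to add data to a table, the existing data turned out to be overwritten. As shown in the figure below, no matter how many times the insert statement is run, only the most recent row remains (the same behavior as overwrite). After a lot of searching, the cause turned out to be the backticks; removing them fixes the problem. Summary: either put backticks around the user name and the table name separately, or else both Read More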

posted @ 2021-07-20 16:36 欣欣姐 Views(2686) Comments(0) Diggs(0)
