Hive - Post Category - yuerspring

zk kafka mariadb scala flink integration

Summary: zk kafka mariadb scala flink integration. At first I did not want to write this post, only to put the code on github.com/git.jd.com,... Read more

posted @ 2019-01-17 08:50 yuerspring Views (271) Comments (0) Recommendations (0)

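Sqoop debugging errors from Hive to MySQL

Summary: 1. MySQL JDBC error: the driver is required. 2. MySQL server IP error: even for a local server, use a domain name or an IP address, not localhost. 3. Data length problem: an error occurs if the Hive value is longer than the MySQL column allows. 4. Hive and mys... Read more

posted @ 2018-12-21 20:16 yuerspring Views (264) Comments (0) Recommendations (0)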

Merging small files in Hive

Summary: Hive data sometimes needs to be merged. #!/bin/bash hadoop jar /software/servers/bdp_tools/mergefiles-1.7.jar -u lzo -p hdfs://ns1/user/dd_edw/adm.db/table_... Read more

posted @ 2018-12-06 14:07 yuerspring Views (1158) Comments (0) Recommendations (0)

The most important Spark environment parameters when running data jobs on Spark

Summary: The most important Spark environment parameters when running data jobs on Spark. As far as I remember, I was always confused by these parameters, s... Read more

posted @ 2018-11-02 16:40 yuerspring Views (181) Comments (0) Recommendations (0)

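Baffled: a colleague asked me a UDF question a few days ago, and a Python one at that

Summary: All the Hive functions I had written before were in Java or Scala, never in Python, so seeing the code in Python left me baffled, really baffled. Today, while reading about Python regular expressions, I came across a Python UDF case, but Python 484... Read more

posted @ 2018-10-31 18:06 yuerspring Views (333) Comments (0) Recommendations (0)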
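For context, Hive can call a Python script through its streaming interface (SELECT TRANSFORM ... USING ...), which pipes rows to the script as tab-separated lines on stdin and reads rows back from stdout. The sketch below is not the code from the post, which is truncated here. The regex, the two-column layout, and the names `extract_digits` and `run` are assumptions chosen for illustration.

```python
import io
import re
import sys  # under Hive the streams would be sys.stdin and sys.stdout

# Hypothetical rule: pull the first run of digits out of the second column.
DIGITS = re.compile(r"\d+")


def extract_digits(text):
    """Return the first run of digits in text, or Hive's default NULL marker."""
    match = DIGITS.search(text)
    return match.group(0) if match else r"\N"


def run(instream, outstream):
    """Read tab-separated rows and emit (id, extracted_digits) rows."""
    for line in instream:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip malformed rows instead of failing the whole task
        outstream.write(fields[0] + "\t" + extract_digits(fields[1]) + "\n")


# Local dry run with in-memory streams.
# In a deployed script this would be: run(sys.stdin, sys.stdout)
demo_out = io.StringIO()
run(io.StringIO("1\torder-123\n2\tno digits here\n"), demo_out)
```

Under Hive, a script like this would typically be shipped with ADD FILE and invoked as SELECT TRANSFORM(id, raw) USING 'python udf.py' AS (id, digits) FROM some_table, where the script, table, and column names are placeholders.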

Hive error log: FAILED: Execution Error, return code 137 from org.apache.hadoop.hive.ql.exec.mr.Mapr

Summary: The root cause is not easy to find from the log below. Does anybody know what it is? Thanks. 2018-10-22 03:45:41 INFO 2018-10-22 03:45:41,651 Stage-2(jo... Read more

posted @ 2018-10-22 09:05 yuerspring Views (1417) Comments (0) Recommendations (0)

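One way to handle data skew on a Hive join key

Summary: I remember using this approach before: concat(a.col, '-', ceil(rand()*100) % 20) appends a random value modulo 20 to the join field that has an especially large number of records, giving a new field a.col2; separately, maintain a small table of 20 records with the values 0 to 19, and join table b's join field against this small table to get a new field b.col2, then... Read more

posted @ 2018-10-18 17:53 yuerspring Views (494) Comments (0) Recommendations (0)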
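The salting idea in this summary can be simulated outside Hive. The sketch below is a minimal model in Python, not the post's code: it assumes the truncated final step is a join on the two salted fields, uses `randrange(20)` as a stand-in for `ceil(rand()*100) % 20`, and all function names and sample rows are hypothetical.

```python
import random
from collections import defaultdict

SALT_BUCKETS = 20  # matches the 20-row helper table (0..19) in the summary


def salt_big_side(rows, rng):
    """Append a random bucket 0..19 to each join key of the skewed table a."""
    return [(f"{key}-{rng.randrange(SALT_BUCKETS)}", payload) for key, payload in rows]


def replicate_small_side(rows):
    """Cross table b with the 0..19 helper table so every salted key has a match."""
    return [
        (f"{key}-{bucket}", payload)
        for key, payload in rows
        for bucket in range(SALT_BUCKETS)
    ]


def hash_join(left, right):
    """Plain equi-join on the first tuple element, returning sorted payload pairs."""
    index = defaultdict(list)
    for key, payload in right:
        index[key].append(payload)
    return sorted(
        (left_payload, right_payload)
        for key, left_payload in left
        for right_payload in index.get(key, [])
    )


rng = random.Random(0)
a = [("hot", f"a{i}") for i in range(1000)] + [("cold", "a1000")]
b = [("hot", "b_hot"), ("cold", "b_cold")]

plain = hash_join(a, b)
salted_a = salt_big_side(a, rng)
salted = hash_join(salted_a, replicate_small_side(b))

# The single hot key is now spread over several distinct salted keys.
hot_keys = {key for key, _ in salted_a if key.startswith("hot-")}
```

The salted join returns the same pairs as the plain join, while the rows of the hot key are split across up to 20 distinct keys and so can land on different reducers. The cost is that table b is read 20 times over.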

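Handling data skew caused by NULL values in Hive

Summary: on case when a.user_id is null then concat('jd-hive', rand()) else a.user_id end = b.user_id; If the join key contains many NULL values when two tables are joined, the NULL-valued rows are distributed to... Read more

posted @ 2018-10-16 16:05 yuerspring Views (539) Comments (0) Recommendations (1)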
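The effect of the CASE WHEN expression above can be illustrated with a small simulation. This is a rough model only: the reducer count, the crc32-based partitioner, and the sample ids are assumptions made for the demo and do not reproduce Hive's actual partitioning.

```python
import random
import zlib
from collections import Counter

NUM_REDUCERS = 8  # hypothetical reducer count


def reducer_for(key):
    """Stand-in for hash partitioning; crc32 keeps the demo deterministic."""
    return zlib.crc32(str(key).encode("utf-8")) % NUM_REDUCERS


def salted_key(user_id, rng):
    """Mirror the CASE WHEN: NULL keys become 'jd-hive<random>', others pass through."""
    return f"jd-hive{rng.random()}" if user_id is None else user_id


rng = random.Random(42)
user_ids = [None] * 1000 + ["u1", "u2", "u3"]

# Without salting, every NULL key maps to the same reducer.
before = Counter(reducer_for(uid) for uid in user_ids if uid is None)

# With salting, each NULL row gets its own key and the rows spread out.
after = Counter(reducer_for(salted_key(uid, rng)) for uid in user_ids if uid is None)
```

Because no real user_id is expected to start with the `jd-hive` prefix, the salted rows still match nothing in table b, so the join result is unchanged and only the distribution of work differs.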

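Hive row/column conversion

Summary: Among JD's many lines of business, promotions are full of complexity and challenges. Because the business is so flexible, much of the data is stored as XML and JSON, so downstream data analysts have to parse it before they can use it. One of the many operations involved is converting between rows and columns. Data structure: create external table jd... Read more

posted @ 2018-09-26 20:16 yuerspring Views (263) Comments (0) Recommendations (0)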

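Hive: viewing partitions and repairing them with MSCK

Summary: # check the table's partitions hive> show partitions table_name; If an external table has been dropped by accident, the following command can be used to re-associate the table with its data. [MSCK REPAIR TABLE] repairs all partitions: hive> msck repa... Read more

posted @ 2018-07-23 17:26 yuerspring Views (1486) Comments (0) Recommendations (0)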

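Parsing JSON and JSON arrays in Hive

Summary: In big data processing, the business side often sends JSON data into a table, so data engineers need to be able to parse the JSON string accurately and define the structure of a new table. Many people online mention the usage of, and examples for, get_json_object and json_tuple ... Read more

posted @ 2018-07-18 14:13 yuerspring Views (1259) Comments (0) Recommendations (0)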
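To make the parsing task concrete, here is a minimal Python sketch of the two steps involved: reading a field out of a JSON string and turning a JSON array into one row per element. It only models the idea and is not the Hive implementation of get_json_object or json_tuple; the helper names and the sample order document are hypothetical.

```python
import json


def get_json_field(json_string, field):
    """Return a top-level field of a JSON object, or None on bad input."""
    try:
        value = json.loads(json_string)
    except (TypeError, ValueError):
        return None
    return value.get(field) if isinstance(value, dict) else None


def explode_json_array(json_string):
    """Turn a JSON array of objects into a list of dicts; empty list on bad input."""
    try:
        value = json.loads(json_string)
    except (TypeError, ValueError):
        return []
    if not isinstance(value, list):
        return []
    return [item for item in value if isinstance(item, dict)]


raw = '{"order_id": 1, "items": "[{\\"sku\\": \\"A\\", \\"qty\\": 2}, {\\"sku\\": \\"B\\", \\"qty\\": 1}]"}'

order_id = get_json_field(raw, "order_id")
items_json = get_json_field(raw, "items")  # the array arrives as a nested JSON string
rows = [(order_id, item["sku"], item["qty"]) for item in explode_json_array(items_json)]
```

Each element of the array becomes its own row, keyed by the parent's order_id, which is the shape a new flattened table would need.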

Hive SQL: converting one row into N columns

Summary: select explode(Array('row1','row2','...','rown')) Result: col_name row1 row2 ... rown Read more

posted @ 2018-06-16 16:54 yuerspring Views (1388) Comments (0) Recommendations (0)

How to deduplicate when aggregating in Hive

Summary: create table xxxx.test_collect_list_set(first_level_directory int, second_level_directory int, third_level_directory int, order_id in... Read more

posted @ 2018-04-13 12:33 yuerspring Views (1229) Comments (0) Recommendations (0)
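The table name in this summary suggests the post compares Hive's collect_list and collect_set aggregates, though the body is cut off, so that reading is an assumption. The Python sketch below models the difference between the two on hypothetical rows; the function name `collect` and the sample data are invented for illustration.

```python
from collections import defaultdict


def collect(rows, dedupe):
    """Group order_ids by first_level_directory.

    dedupe=False keeps every value (like collect_list);
    dedupe=True keeps each value once per group (like collect_set).
    """
    groups = defaultdict(list)
    for first_level_directory, order_id in rows:
        if dedupe and order_id in groups[first_level_directory]:
            continue
        groups[first_level_directory].append(order_id)
    return dict(groups)


rows = [(1, 100), (1, 100), (1, 101), (2, 200)]

as_list = collect(rows, dedupe=False)
as_set = collect(rows, dedupe=True)
```

This sketch preserves first-seen order for readability; Hive itself makes no ordering promise for the elements these aggregates return.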

Hive UDTF error: java.lang.String cannot be cast to java.lang.Integer

Summary: Error: Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: java... Read more

posted @ 2018-03-08 09:14 yuerspring Views (833) Comments (0) Recommendations (0)

Hive syntax for viewing a table's structure

Summary: show create table yourtablename Read more

posted @ 2018-01-29 09:18 yuerspring Views (486) Comments (0) Recommendations (0)

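How to do an EXCEPT operation in Hive and MySQL

Summary: In DB2 and Oracle, when we want to find the data differences between two tables with the same structure, we can use the following SQL: select * from table1 except select * from table2. The statement above returns the data that is in table1 but not in table2... Read more

posted @ 2017-11-07 13:17 yuerspring Views (1319) Comments (0) Recommendations (0)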
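The post's own workaround is cut off in this summary. Where EXCEPT is unavailable, one common substitute is a LEFT JOIN on all columns that keeps only the rows with no match; whether the post uses that approach is an assumption. The Python sketch below models the intended result on hypothetical rows, using the distinct semantics of EXCEPT.

```python
def except_distinct(table1, table2):
    """Rows of table1 that are absent from table2, deduplicated, in first-seen order."""
    exclude = set(table2)
    seen = set()
    result = []
    for row in table1:
        if row not in exclude and row not in seen:
            seen.add(row)
            result.append(row)
    return result


table1 = [(1, "a"), (2, "b"), (2, "b"), (3, "c")]
table2 = [(3, "c"), (4, "d")]

difference = except_distinct(table1, table2)
```

Note that the duplicate row (2, "b") appears only once in the result, which a plain anti-join would not guarantee without an extra DISTINCT.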

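Hive performance tuning (content from the web)

Summary: When typical Hive learners and trainers talk about performance optimization, they usually approach it through minor tricks such as syntax and parameters, rather than optimizing Hive's performance in a fundamental way. The reasons for this include: 1. History and habits of thought: people usually learn SQL on a single-machine database, where performance optimization really is mostly a matter of SQL syntax and parameter tuning;... Read more

posted @ 2017-03-07 21:34 yuerspring Views (146) Comments (0) Recommendations (0)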

Java JDBC with Hive: creating tables and loading data

Summary: // requires the hadoop & hive jars import java.sql.Connection; import java.sql.DriverManager; import java.sql.ResultSet; import java.sql.Stateme... Read more

posted @ 2017-02-25 19:19 yuerspring Views (3718) Comments (0) Recommendations (0)

Hive partition operations

Summary: create external table demo (userid int, name string, address string) comment 'demo' partitioned by (txdate string, txhour string) row for... Read more

posted @ 2017-02-25 19:16 yuerspring Views (195) Comments (0) Recommendations (0)

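Big data Hive interview questions and key points

Summary: 1. How do you solve data skew in Hive join queries? Cause of skew: map output is assigned to reducers by the hash of the key, and uneven key distribution, characteristics of the business data itself, poor table design, and similar factors leave the reducers with very different data volumes. 1) uneven key distribution; 2) characteristics of the business data itself;... Read more

posted @ 2016-10-14 22:35 yuerspring Views (213) Comments (0) Recommendations (0)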
