2.3--Hive - Post Category - 智能先行者

Spark2 Dataset Analytic Functions: Ranking Functions row_number, rank, dense_rank, percent_rank

Summary: row_number, rank, dense_rank, percent_rank

posted @ 2016-11-25 18:34 智能先行者
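To show how the four ranking functions differ on ties, here is a minimal pure-Python sketch over a single window partition. It assumes the partition is already sorted by the ORDER BY key and uses the usual definition percent_rank = (rank - 1) / (rows in partition - 1). The function name `ranking` and the sample keys are hypothetical; this does not call Spark.

```python
def ranking(sorted_keys):
    """Compute row_number, rank, dense_rank and percent_rank for one
    window partition whose ORDER BY keys are already sorted.

    Returns a list of (key, row_number, rank, dense_rank, percent_rank).
    """
    n = len(sorted_keys)
    out = []
    rank = dense_rank = 0
    prev = object()  # sentinel that never equals a real key
    for row_number, key in enumerate(sorted_keys, start=1):
        if key != prev:
            rank = row_number  # ties share a rank; the next rank skips ahead
            dense_rank += 1    # dense_rank never leaves gaps
            prev = key
        percent_rank = (rank - 1) / (n - 1) if n > 1 else 0.0
        out.append((key, row_number, rank, dense_rank, percent_rank))
    return out


if __name__ == "__main__":
    # Hypothetical ages; the two 20s tie.
    for row in ranking([10, 20, 20, 30]):
        print(row)
```

With the tie at 20, row_number keeps counting (1, 2, 3, 4), rank jumps from 2 to 4, and dense_rank goes 1, 2, 2, 3.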

Summary:
val df6 = spark.sql("select gender,children,max(age),avg(age),count(age) from Affairs group by Cube(gender,children) order by 1,2")
df6.show
+------+--------+--------+--------+----------+
...

posted @ 2016-11-25 18:23 智能先行者
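GROUP BY CUBE(gender, children) aggregates over every subset of the grouping columns: (gender, children), (gender), (children) and the grand total, with rolled-up columns shown as NULL. The pure-Python sketch below emulates that for max/avg/count; the function name and sample rows are hypothetical stand-ins for the Affairs table. It assumes the data has no real NULLs in the grouping columns (in Spark SQL, grouping() or grouping_id() can tell a real NULL from a rolled-up one).

```python
from itertools import combinations


def cube(rows, dims, measure):
    """Emulate GROUP BY CUBE(dims) with max/avg/count of `measure`.

    Every subset of `dims` is a grouping set; a rolled-up column appears
    as None (NULL in SQL). Assumes no None values in `dims` columns,
    otherwise rolled-up and real NULLs would collide.
    Returns {group_key_tuple: (max, avg, count)}.
    """
    result = {}
    for size in range(len(dims), -1, -1):
        for subset in combinations(dims, size):
            groups = {}
            for row in rows:
                key = tuple(row[d] if d in subset else None for d in dims)
                groups.setdefault(key, []).append(row[measure])
            for key, vals in groups.items():
                result[key] = (max(vals), sum(vals) / len(vals), len(vals))
    return result


if __name__ == "__main__":
    # Hypothetical rows shaped like the Affairs table.
    affairs = [
        {"gender": "male", "children": "yes", "age": 37},
        {"gender": "male", "children": "no", "age": 27},
        {"gender": "female", "children": "yes", "age": 32},
    ]
    for key, agg in sorted(cube(affairs, ["gender", "children"], "age").items(), key=str):
        print(key, agg)
```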

Spark2 Dataset Statistics: mean, variance, stddev (standard deviation), corr (Pearson correlation coefficient), skewness, kurtosis

Summary: mean, variance, stddev (standard deviation), corr (Pearson correlation coefficient), skewness, kurtosis

posted @ 2016-11-25 17:55 智能先行者
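As a reference for what these statistics compute, here is a small pure-Python sketch. It assumes Spark-like conventions as I understand them: variance and stddev are the sample versions (divide by n - 1), skewness and kurtosis are built from population central moments, and kurtosis is excess kurtosis (minus 3). The function name is hypothetical, and it needs at least two non-constant values.

```python
import math


def dataset_stats(xs, ys):
    """Sketch of mean, variance, stddev, corr, skewness and kurtosis.

    Assumed conventions: sample variance/stddev (n - 1), population-moment
    skewness, excess kurtosis. corr is Pearson's r between xs and ys.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    m2 = sum(d ** 2 for d in dx)  # sums of central powers, not divided by n
    m3 = sum(d ** 3 for d in dx)
    m4 = sum(d ** 4 for d in dx)
    variance = m2 / (n - 1)
    return {
        "mean": mean_x,
        "variance": variance,
        "stddev": math.sqrt(variance),
        "corr": sum(a * b for a, b in zip(dx, dy)) / math.sqrt(m2 * sum(d ** 2 for d in dy)),
        "skewness": math.sqrt(n) * m3 / m2 ** 1.5,  # (m3/n) / (m2/n)^1.5
        "kurtosis": n * m4 / (m2 * m2) - 3.0,       # (m4/n) / (m2/n)^2 - 3
    }


if __name__ == "__main__":
    xs = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample
    print(dataset_stats(xs, [2 * x + 1 for x in xs]))
```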

Spark2 Dataset: collect_set vs. collect_list

Summary: collect_set removes duplicate elements; collect_list keeps them.
select gender, concat_ws(',', collect_set(children)), concat_ws(',', collect_list(children)) from Affairs group b ...

posted @ 2016-11-25 17:19 智能先行者
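The difference between the two collectors can be sketched in pure Python: per group, concat_ws(',', collect_set(...)) joins the distinct values and concat_ws(',', collect_list(...)) joins all of them. This sketch assumes nulls are ignored by the collectors and keeps input order; in Spark the element order is not guaranteed (especially for collect_set or after a shuffle). The function name and rows are hypothetical.

```python
def collect_by(rows, key, col):
    """Emulate concat_ws(',', collect_set(col)) and
    concat_ws(',', collect_list(col)) grouped by `key`.

    None values are skipped. Order follows the input here; Spark makes
    no such ordering promise.
    Returns {group: (set_string, list_string)}.
    """
    collected = {}
    for row in rows:
        values = collected.setdefault(row[key], [])
        if row[col] is not None:
            values.append(row[col])
    result = {}
    for k, values in collected.items():
        as_set = ",".join(str(v) for v in dict.fromkeys(values))  # drop duplicates, keep first-seen order
        as_list = ",".join(str(v) for v in values)               # keep duplicates
        result[k] = (as_set, as_list)
    return result
```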

Hive desc

Summary: Describe Database; Describe Table/View/Column.
hive> DESCRIBE user_info_bucketed;
user_id     bigint
firstname   string
lastname    string
ds          string
# Partition ...

posted @ 2015-07-25 21:12 智能先行者

Hive FUNCTIONS

Summary:
hive> SHOW FUNCTIONS;
!, !=, %, &, *, +, -, /, =, ==, >, >=, ^, abs, acos, add_months, and, array, array_contains, ascii, asin, assert_true, atan, avg, base64, between, bin, case, cbrt, ceil, ceiling, coalesce, coll...

posted @ 2015-07-25 20:45 智能先行者

Hive show

Summary: Show Tables, Show Partitions, Show Table Properties, Show Create Table, Show Indexes, Show Columns, Show Functions, Show Conf

posted @ 2015-07-25 10:54 智能先行者

MySQL: Row/Column Conversion with Concatenation

Summary:
mysql> select TBL_ID,CREATE_TIME,LAST_ACCESS_TIME,TBL_NAME,TBL_TYPE from TBLS;
+--------+-------------+------------------+----------------------+---------------+
| TBL_ID | CREATE_TIME | LAST_ACCESS_...

posted @ 2015-07-20 23:52 智能先行者
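The summary is cut off before the conversion itself, so the technique is an assumption here: a common way in MySQL to fold many rows into one concatenated column is GROUP_CONCAT. The pure-Python sketch below emulates GROUP_CONCAT([DISTINCT] col ORDER BY col SEPARATOR sep) per group: NULLs are skipped, an all-NULL group yields NULL, and the result is cut at group_concat_max_len (1024 by default; this sketch counts characters, MySQL counts bytes). The function name and the TBLS-like rows are hypothetical.

```python
def group_concat(rows, key, col, separator=",", distinct=False, max_len=1024):
    """Emulate MySQL GROUP_CONCAT([DISTINCT] col ORDER BY col SEPARATOR sep).

    NULLs (None) are skipped; a group containing only NULLs yields None.
    max_len mimics group_concat_max_len, but in characters, not bytes.
    """
    groups = {}
    for row in rows:
        values = groups.setdefault(row[key], [])
        if row[col] is not None:
            values.append(str(row[col]))
    result = {}
    for k, values in groups.items():
        if not values:
            result[k] = None
            continue
        if distinct:
            values = list(dict.fromkeys(values))
        result[k] = separator.join(sorted(values))[:max_len]
    return result


if __name__ == "__main__":
    # Hypothetical rows shaped like part of the Hive metastore TBLS table.
    tbls = [
        {"TBL_TYPE": "MANAGED_TABLE", "TBL_NAME": "t2"},
        {"TBL_TYPE": "MANAGED_TABLE", "TBL_NAME": "t1"},
        {"TBL_TYPE": "EXTERNAL_TABLE", "TBL_NAME": "e1"},
    ]
    print(group_concat(tbls, "TBL_TYPE", "TBL_NAME"))
```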

MySQL String Concatenation

Summary:
mysql> select concat('Hadoop:','Hive:','Spark#','HBase;',TBL_TYPE,'{}',SD_ID) from TBLS;
| Hadoop:Hive:Spark#HBase;MANAGED_TABLE{}6 |
| Hadoop:Hive:Sp...

posted @ 2015-07-20 23:37 智能先行者
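One detail worth keeping in mind with CONCAT: in MySQL it returns NULL as soon as any argument is NULL, whereas CONCAT_WS skips NULL arguments (and returns NULL only if the separator is NULL). A minimal pure-Python sketch of both, with hypothetical function names; number-to-string conversion uses Python's str(), which matches MySQL for integers like SD_ID but not necessarily for floats.

```python
def mysql_concat(*args):
    """CONCAT: None (NULL) if any argument is NULL, else the pieces joined."""
    if any(a is None for a in args):
        return None
    return "".join(str(a) for a in args)  # str() is fine for ints like SD_ID


def mysql_concat_ws(separator, *args):
    """CONCAT_WS: skips NULL arguments; NULL only if the separator is NULL."""
    if separator is None:
        return None
    return separator.join(str(a) for a in args if a is not None)


if __name__ == "__main__":
    # Reproduces the first row of the query above (TBL_TYPE, SD_ID = 6).
    print(mysql_concat("Hadoop:", "Hive:", "Spark#", "HBase;", "MANAGED_TABLE", "{}", 6))
```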

Hive Bucketing Columns (BucketedTables)

Summary: The CLUSTERED BY and SORTED BY creation commands do not affect how data is inserted into a table – only how it is read. This means that users must be ...

posted @ 2015-07-20 22:54 智能先行者
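Because Hive reads a bucketed table assuming each row sits in the bucket file computed from its CLUSTERED BY column, the insert has to produce exactly that layout (in Hive 0.x/1.x this is what `set hive.enforce.bucketing = true;` was for; Hive 2.0 removed the setting and always enforces it). As a sketch, assuming the classic scheme for an INT column (hash is the value itself, bucket = (hash & Integer.MAX_VALUE) % numBuckets; Hive 3 tables with bucketing_version=2 use Murmur3 instead), the assignment looks like this. Function names are hypothetical.

```python
INT_MAX = 0x7FFFFFFF  # Java Integer.MAX_VALUE


def bucket_for_int(value, num_buckets):
    """Bucket number for an INT bucketing column, assuming the classic Hive
    scheme: hash(INT) == value, bucket = (hash & INT_MAX) % num_buckets.
    `value` is assumed to fit in a signed 32-bit int.
    """
    return (value & INT_MAX) % num_buckets


def assign_buckets(values, num_buckets):
    """Group values by the bucket file they would be written to."""
    buckets = {b: [] for b in range(num_buckets)}
    for v in values:
        buckets[bucket_for_int(v, num_buckets)].append(v)
    return buckets


if __name__ == "__main__":
    # Hypothetical user_id values into 4 buckets.
    print(assign_buckets([1, 2, 3, 4, 5, -1], 4))
```

The mask keeps negative keys non-negative before the modulo, so a key of -1 lands in a valid bucket rather than producing a negative index.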

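Hive JOIN Explained in Detail

Summary: Reposted from http://shiyanjun.cn/archives/588.html. Hive is built on the Hadoop platform and provides HQL, a SQL-like query language. With Hive, even if you know SQL but do not understand how Hadoop MapReduce works, and therefore cannot implement MR jobs in code, you can still easily write specific...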

posted @ 2015-07-17 22:29 智能先行者
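The row-level behavior of the most common Hive join types can be sketched with an in-memory hash join: INNER JOIN emits every matching pair, LEFT OUTER JOIN also keeps unmatched left rows padded with NULLs, and LEFT SEMI JOIN (Hive's way of expressing uncorrelated IN/EXISTS) returns each left row at most once and only its columns. This is a pure-Python sketch with a hypothetical function name, not how Hive executes joins on MapReduce.

```python
def hive_join(left, right, key, how="inner"):
    """Sketch of join semantics on lists of dicts.

    how="inner":      (left_row, right_row) for every matching pair
    how="left_outer": like inner, plus (left_row, None) for unmatched left rows
    how="left_semi":  each left row at most once if it has any match
    NULL (None) keys never match, as in SQL.
    """
    if how not in ("inner", "left_outer", "left_semi"):
        raise ValueError(f"unsupported join type: {how}")
    index = {}  # build side: right rows by key
    for r in right:
        if r[key] is not None:
            index.setdefault(r[key], []).append(r)
    out = []
    for l in left:  # probe side
        matches = index.get(l[key], []) if l[key] is not None else []
        if how == "left_semi":
            if matches:
                out.append(l)
        elif matches:
            out.extend((l, r) for r in matches)
        elif how == "left_outer":
            out.append((l, None))
    return out
```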

Querying Hadoop Configuration Variables

Summary:
[hadoop@master hadoop]$ hive -S -e 'set -v' | grep querylog | grep -E -v 'CLASSPATH|class'
hive.querylog.enable.plan.progress=true
hive.querylog.location=/h...

posted @ 2015-07-12 20:26 智能先行者
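The shell pipeline keeps lines containing `querylog`, drops lines matching `CLASSPATH|class`, and leaves `key=value` pairs. If you want that output as a lookup table, a small Python equivalent of the filtering step could look like this; the function name and the sample lines (including the `/tmp/example` path) are hypothetical.

```python
import re


def filter_settings(lines, include="querylog", exclude=r"CLASSPATH|class"):
    """Mimic `grep querylog | grep -E -v 'CLASSPATH|class'` over the
    output of `hive -S -e 'set -v'`, then split key=value lines into a dict."""
    settings = {}
    for line in lines:
        if not re.search(include, line) or re.search(exclude, line):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key] = value
    return settings


if __name__ == "__main__":
    sample = [  # hypothetical `set -v` output lines
        "hive.querylog.enable.plan.progress=true",
        "hive.querylog.location=/tmp/example",
        "hive.exec.parallel=false",
        "env:CLASSPATH=/opt/querylog.jar",
    ]
    print(filter_settings(sample))
```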

Hive: Formatted Output of Database and Table Details

Summary:
hive> desc database extended wx_test;
OK
wx_test    hdfs://ns1/user/hive/warehouse/wx_test.db    hadoop    USER    {t_date=2015-06-21, creator=wx}
Time taken: 0.027 seconds, Fetched: 1 row(s)
hive> desc form...

posted @ 2015-07-12 20:17 智能先行者

Hive: Splitting One Row into Multiple Columns

Summary: LATERAL VIEW explode

posted @ 2015-06-24 22:50 智能先行者
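`LATERAL VIEW explode(arr)` repeats each input row once per element of the array column, putting the element in a new column; rows whose array is NULL or empty produce nothing unless `LATERAL VIEW OUTER` is used (available in later Hive versions). A pure-Python sketch of that behavior, with a hypothetical function name and sample rows:

```python
def lateral_view_explode(rows, array_col, alias, outer=False):
    """Emulate SELECT ... FROM t LATERAL VIEW [OUTER] explode(array_col) v AS alias.

    Each input row is repeated once per array element, with the element in
    a new column `alias`. Rows whose array is NULL or empty disappear,
    unless outer=True, in which case they are kept once with alias = None.
    """
    out = []
    for row in rows:
        items = row[array_col] or []
        if not items and outer:
            out.append({**row, alias: None})
        for item in items:
            out.append({**row, alias: item})
    return out


if __name__ == "__main__":
    rows = [{"id": 1, "tags": ["a", "b"]}, {"id": 2, "tags": []}]  # hypothetical
    for r in lateral_view_explode(rows, "tags", "tag"):
        print(r["id"], r["tag"])
```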

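Hive SQL Syntax Explained

Summary: Hive is a data warehouse analysis system built on Hadoop. It offers rich SQL querying for analyzing data stored in the Hadoop Distributed File System: it can map structured data files to database tables, provides full SQL query functionality, and converts SQL statements into MapReduce jobs to run, so you can use its own SQL to query and analyze the required...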

posted @ 2015-03-29 16:09 智能先行者

GROUPING SETS与GROUP_ID

Summary:
SELECT E.DEPARTMENT_ID DID, E.JOB_ID JOB, E.MANAGER_ID MID, SUM(E.SALARY) SUM_SAL, COUNT(E.EMPLOYEE_ID) CNT, GROUP_ID() GG
FROM EMPLOYEES E
WHERE E.JOB_ID IN ('S...

posted @ 2014-12-21 15:59 智能先行者
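The query above appears to be Oracle SQL, where GROUP_ID() distinguishes duplicate groupings: if the same grouping set is listed more than once in GROUPING SETS, its first occurrence gets GROUP_ID 0, the second 1, and so on. A pure-Python sketch of that bookkeeping for SUM/COUNT; the function name and employee rows are hypothetical.

```python
def grouping_sets(rows, dims, sets, measure):
    """Emulate GROUP BY GROUPING SETS(...) with SUM, COUNT and GROUP_ID().

    `sets` may repeat a grouping set; GROUP_ID() is 0 for its first
    occurrence, 1 for the second, etc. Columns outside a grouping set come
    back as None. Returns tuples (*dim_values, sum, count, group_id).
    """
    occurrences = {}
    out = []
    for gset in sets:
        gset_key = frozenset(gset)
        group_id = occurrences.get(gset_key, 0)
        occurrences[gset_key] = group_id + 1
        groups = {}
        for row in rows:
            key = tuple(row[d] if d in gset else None for d in dims)
            groups.setdefault(key, []).append(row[measure])
        for key, vals in groups.items():
            out.append(key + (sum(vals), len(vals), group_id))
    return out


if __name__ == "__main__":
    employees = [  # hypothetical rows
        {"DEPARTMENT_ID": 10, "JOB_ID": "SA_REP", "SALARY": 100},
        {"DEPARTMENT_ID": 10, "JOB_ID": "SA_MAN", "SALARY": 200},
    ]
    dims = ["DEPARTMENT_ID", "JOB_ID"]
    for row in grouping_sets(employees, dims, [("DEPARTMENT_ID",), ("DEPARTMENT_ID",), ()], "SALARY"):
        print(row)
```

Rows with GROUP_ID() > 0 are exact repeats of earlier groups, which is why a HAVING GROUP_ID() = 0 filter is a common way to drop them.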
