happygril3

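Caching

Summary: 1.1. RDD caching. One reason Spark is so fast is that it can persist, or cache, a dataset in memory across operations. Once an RDD is persisted, each node keeps the partitions it computes in memory and reuses them in other actions on that RDD or on RDDs derived from it, which makes subsequent actions much faster. RDD persistence and caching are Spark's most important ... Read more

posted @ 2020-12-28 16:37 happygril3 Views (98) Comments (0) Recommendations (0)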
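The reuse described in the caching summary can be illustrated without a cluster. What follows is a minimal sketch in plain Python, not the Spark API: `LazyDataset`, `persist`, `collect`, `count`, and `run_two_actions` are hypothetical names chosen only to mirror the idea that an unpersisted dataset is recomputed by every action, while a persisted one is computed once and then reused, including by datasets derived from it.

```python
class LazyDataset:
    """Toy stand-in for an RDD: holds a recipe, not the data (hypothetical, not Spark)."""

    def __init__(self, compute):
        self._compute = compute      # function that produces the elements
        self._persisted = False      # set by persist()
        self._cached = None          # materialized elements, once computed
        self.compute_calls = 0       # how many times the recipe actually ran

    def persist(self):
        # Only marks the dataset; nothing is computed until the first action.
        self._persisted = True
        return self

    def _materialize(self):
        if self._persisted and self._cached is not None:
            return self._cached      # reuse the stored result
        self.compute_calls += 1
        data = list(self._compute())
        if self._persisted:
            self._cached = data
        return data

    def map(self, fn):
        # A derived dataset reads its parent, so it benefits from the parent's cache.
        return LazyDataset(lambda: (fn(x) for x in self._materialize()))

    def collect(self):
        return self._materialize()

    def count(self):
        return len(self._materialize())


def run_two_actions(persist):
    """Run two actions that both depend on the same base dataset."""
    base = LazyDataset(lambda: (n * n for n in range(5)))
    if persist:
        base.persist()
    doubled = base.map(lambda x: x * 2)
    total = sum(doubled.collect())   # first action, via a derived dataset
    size = base.count()              # second action, on the base itself
    return total, size, base.compute_calls
```

Calling `run_two_actions(False)` runs the base recipe twice, once per action, while `run_two_actions(True)` runs it once and serves the second action from the stored result. The sketch ignores partitioning, storage levels, and eviction, all of which matter in a real cluster.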

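Data structures

Summary: 1. RDD 1.1 Definition 1.1.1 Dataset: stores the computation logic for the data 1.1.2 Distributed: where the data comes from and where it is stored 1.1.3 Resilient (1) Lineage (dependencies): Spark can simplify dependency chains through special handling (2) Computation: Spark is memory-based, so performance is very high, and it can switch flexibly between memory and disk (3) Partitions: after Spark creates the default partitions, ... Read more

posted @ 2020-12-28 10:22 happygril3 Views (70) Comments (0) Recommendations (0)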

Number of partitions

Summary: package spark2020 import org.apache.spark.rdd.RDD import org.apache.spark.{SparkConf, SparkContext} object RddCreate { def main(args:Array[String]):Un ... Read more

posted @ 2020-12-25 15:51 happygril3 Views (179) Comments (0) Recommendations (0)

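Optimization

Summary: 1. Fetch task: full-table selects, column selects, and limit queries do not go through MapReduce. set hive.fetch.task.conversion=more; 2. Local mode: for queries on small datasets, the time spent launching the execution tasks for a query can be much longer than the actual job run time. set hive.exec.mode.local.auto= ... Read more

posted @ 2020-12-19 17:50 happygril3 Views (163) Comments (0) Recommendations (0)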

Running a jar

Summary: 1. Local directory hadoop jar /home/kg/phone_local.jar corina.wordCount.wordLocal.WordcountDriver /home/kg/hello.txt /home/kg/result package corina.wordCount.word ... Read more

posted @ 2020-12-15 17:34 happygril3 Views (198) Comments (0) Recommendations (0)

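User-defined functions

Summary: 1. UDF (user-defined function): one row in, one value out 1.1 Defining the function (1) Extend org.apache.hadoop.hive.ql.exec.UDF (2) Implement an evaluate method; evaluate() may be overloaded (3) A UDF must have a return type; it may return null, but cannot ... Read more

posted @ 2020-12-09 17:13 happygril3 Views (77) Comments (0) Recommendations (0)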

Converting between rows and columns

Summary: 1. concat: concatenates values within the same row drop table student; create table if not exists student ( name string, orderdate string, cost int, sex string, dep string, class stri ... Read more

posted @ 2020-12-09 16:35 happygril3 Views (179) Comments (0) Recommendations (0)
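As a rough illustration of row-wise concatenation, here is a small plain-Python sketch rather than HiveQL. `hive_style_concat` is a hypothetical helper, and the sample rows are made up; only the column names `name`, `sex`, and `dep` come from the `student` table in the summary. It assumes the commonly documented Hive behavior that `concat` yields NULL when any argument is NULL, modeled here with `None`.

```python
def hive_style_concat(*parts):
    """Emulate concat(): join the arguments, or return None (NULL) if any is None."""
    if any(p is None for p in parts):
        return None
    return "".join(str(p) for p in parts)


# Made-up sample rows; the second one has a NULL in the sex column.
students = [
    {"name": "tom", "sex": "M", "dep": "A"},
    {"name": "amy", "sex": None, "dep": "B"},
]

# One output value per input row, built only from that row's own columns.
labels = [hive_style_concat(s["name"], "-", s["sex"]) for s in students]
```

The second row produces `None` rather than `"amy-"`, which is the detail that most often surprises people when a concatenated column unexpectedly comes back empty.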

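Window functions

Summary: 1. Syntax of the over() window function: analytic_function over(partition by column order by column rows between start and end) analytic_function over(distribute by column sort by column rows between start and end) ... Read more

posted @ 2020-12-07 15:58 happygril3 Views (226) Comments (0) Recommendations (0)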
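To make the `partition by ... order by ... rows between ... and ...` structure concrete, here is a minimal plain-Python sketch of a windowed sum. It is an emulation for illustration only, not how Hive evaluates the clause. `window_sum`, its parameters, the `win_sum` output field, and the sample rows are all hypothetical; `preceding=None` stands for UNBOUNDED PRECEDING, `following=None` for UNBOUNDED FOLLOWING, and `following=0` for CURRENT ROW.

```python
from collections import defaultdict


def window_sum(rows, partition_key, order_key, value_key, preceding=None, following=0):
    """Emulate sum(value) over (partition by ... order by ... rows between ... and ...).

    Returns copies of the input rows with an extra "win_sum" field.
    """
    # partition by: group rows that share the same key
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[partition_key]].append(row)

    result = []
    for key in sorted(partitions):
        # order by: sort rows inside each partition
        ordered = sorted(partitions[key], key=lambda r: r[order_key])
        for i, row in enumerate(ordered):
            # rows between: pick the frame around the current row
            start = 0 if preceding is None else max(0, i - preceding)
            end = len(ordered) if following is None else min(len(ordered), i + following + 1)
            frame = ordered[start:end]
            result.append({**row, "win_sum": sum(r[value_key] for r in frame)})
    return result


# Made-up sample rows.
orders = [
    {"name": "tom", "orderdate": "2020-01-01", "cost": 10},
    {"name": "tom", "orderdate": "2020-01-03", "cost": 30},
    {"name": "tom", "orderdate": "2020-01-02", "cost": 20},
    {"name": "amy", "orderdate": "2020-01-01", "cost": 5},
]

# Running total per name: unbounded preceding to current row.
running = window_sum(orders, "name", "orderdate", "cost")

# Sliding frame: 1 preceding to 1 following.
sliding = window_sum(orders, "name", "orderdate", "cost", preceding=1, following=1)
```

Every input row still produces exactly one output row, which is the main difference from a `group by` aggregate: the window only decides which neighboring rows feed each row's value.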

Exporting data

Summary: 1. insert: export query results directly to a local directory insert overwrite local directory "kg/qiaoruihua/hive/emp" select * from student; insert overwrite local directory "kg/qiaoruihua ... Read more

posted @ 2020-12-05 15:55 happygril3 Views (71) Comments (0) Recommendations (0)

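Importing data

Summary: 1. Load data into a table from an external file system load [overwrite] into load data [local] inpath "" [overwrite] into table table_name [partition(col_name="")] local: load data from the local file system into the Hive table; otherwise ... Read more

posted @ 2020-12-05 15:29 happygril3 Views (96) Comments (0) Recommendations (0)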
