HIVE - Post category - happygril3

Optimization

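Summary: 1. Fetch conversion: with set hive.fetch.task.conversion=more; full-table lookups (select *), column lookups and limit lookups are answered by a simple fetch and do not run MapReduce. 2. Local mode: for queries over small datasets, the time spent launching the job's tasks can be much longer than the time the job itself takes to run; set hive.exec.mode.local.auto= …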
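The local-mode decision can be sketched as a simple predicate. This is a simplified model, not Hive's code. The thresholds assume the documented defaults of hive.exec.mode.local.auto.inputbytes.max (128 MB) and hive.exec.mode.local.auto.input.files.max (4; older releases called the limit hive.exec.mode.local.auto.tasks.max). The job must also need at most one reducer. Check these settings for your Hive version.

```python
# Simplified model of the check Hive makes when hive.exec.mode.local.auto=true.
# Thresholds assume the documented defaults; verify them for your Hive version.
INPUT_BYTES_MAX = 128 * 1024 * 1024  # hive.exec.mode.local.auto.inputbytes.max
INPUT_FILES_MAX = 4                  # hive.exec.mode.local.auto.input.files.max


def runs_locally(auto_enabled: bool, input_bytes: int, input_files: int, reducers: int) -> bool:
    """Return True if this model would run the job locally instead of on the cluster."""
    if not auto_enabled:
        return False
    return (
        input_bytes < INPUT_BYTES_MAX      # total input is small
        and input_files < INPUT_FILES_MAX  # few map tasks
        and reducers <= 1                  # zero or one reducer
    )


print(runs_locally(True, 10 * 1024 * 1024, 2, 1))   # True: small job
print(runs_locally(True, 500 * 1024 * 1024, 2, 1))  # False: input too large
```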

posted @ 2020-12-19 17:50 happygril3

User-defined functions

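Summary: 1. UDF (user-defined function): one row in, one row out. 1.1 Defining the function: (1) extend org.apache.hadoop.hive.ql.exec.UDF; (2) implement an evaluate method, and evaluate() can be overloaded; (3) a UDF must have a return type; it may return null, but it cannot …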
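The "one row in, one row out" contract and evaluate() overloading can be modeled in Python with functools.singledispatchmethod, which picks an implementation by argument type. This is only an analogy: a real Hive UDF is a Java class extending org.apache.hadoop.hive.ql.exec.UDF. The class name HypotheticalUdf and its upper-case and increment behaviors are illustrative assumptions.

```python
from functools import singledispatchmethod


class HypotheticalUdf:
    """Python stand-in for a Hive UDF with overloaded evaluate() methods."""

    @singledispatchmethod
    def evaluate(self, value):
        raise TypeError(f"no evaluate() overload for {type(value).__name__}")

    @evaluate.register
    def _(self, value: str) -> str:
        return value.upper()        # string overload

    @evaluate.register
    def _(self, value: int) -> int:
        return value + 1            # int overload

    @evaluate.register(type(None))
    def _(self, value):
        return None                 # NULL in, NULL out


udf = HypotheticalUdf()
column = ["hive", 41, None]
print([udf.evaluate(v) for v in column])  # one output value per input row
```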

posted @ 2020-12-09 17:13 happygril3

Converting rows and columns

Summary: 1. concat: concatenates the values within the same row. drop table student; create table if not exists student ( name string, orderdate string, cost int, sex string, dep string, class stri …
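The usual Hive building blocks for this conversion are concat, concat_ws with collect_set (rows to one column) and LATERAL VIEW explode(split(...)) (one column back to rows). Their behavior can be modeled in Python as below. The excerpt only shows concat, so the other functions are assumed, and the sample rows are hypothetical. Hive's split() treats the separator as a regular expression, so "," is used here rather than a regex metacharacter such as "|".

```python
from collections import defaultdict


def concat(*values):
    """Model of Hive concat(): joins the values of one row; NULL if any argument is NULL."""
    if any(v is None for v in values):
        return None
    return "".join(str(v) for v in values)


def rows_to_column(rows, key_cols, value_col, sep=","):
    """Model of: SELECT keys, concat_ws(sep, collect_set(value)) FROM t GROUP BY keys.
    collect_set drops NULLs and duplicates. Hive does not guarantee element order;
    this model keeps first-seen order."""
    groups = defaultdict(list)
    for row in rows:
        bucket = groups[tuple(row[c] for c in key_cols)]
        value = row[value_col]
        if value is not None and value not in bucket:
            bucket.append(value)
    return {key: sep.join(values) for key, values in groups.items()}


def column_to_rows(key, joined, sep=","):
    """Model of: LATERAL VIEW explode(split(joined, sep)) v AS item."""
    return [(key, item) for item in joined.split(sep)]


students = [  # hypothetical rows using the name/sex/dep columns of the student table
    {"name": "wukong", "sex": "M", "dep": "A"},
    {"name": "dahai", "sex": "M", "dep": "A"},
    {"name": "songsong", "sex": "M", "dep": "B"},
    {"name": "fengjie", "sex": "F", "dep": "A"},
]

packed = rows_to_column(students, ["dep", "sex"], "name")
label = concat("A", ",", "M")
unpacked = column_to_rows(label, packed[("A", "M")])
```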

posted @ 2020-12-09 16:35 happygril3

Window functions

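Summary: 1. Syntax of the over() window clause: analytic_function over(partition by column order by column rows between start_position and end_position), or analytic_function over(distribute by column sort by column rows between start_position and end_position) …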
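How a ROWS BETWEEN frame is evaluated can be sketched in plain Python for sum(). Rows are split by the partition column, sorted by the order column, and each row sums the rows inside its frame. The sample rows reuse the name/orderdate/cost columns of the student table, but the values are hypothetical. The model covers only integer "n PRECEDING" / "n FOLLOWING" bounds plus UNBOUNDED PRECEDING.

```python
from collections import defaultdict


def window_sum(rows, partition_col, order_col, value_col, preceding=None, following=0):
    """Model of
        sum(value) over(partition by p order by o
                        rows between <preceding> and <following>)
    preceding=None stands for UNBOUNDED PRECEDING, following=0 for CURRENT ROW."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[partition_col]].append(row)

    result = []
    for part in partitions.values():
        part.sort(key=lambda r: r[order_col])            # order by within the partition
        for i, row in enumerate(part):
            start = 0 if preceding is None else max(0, i - preceding)
            end = min(len(part) - 1, i + following)      # inclusive frame end
            frame = part[start:end + 1]
            result.append({**row, "window_sum": sum(r[value_col] for r in frame)})
    return result


orders = [  # hypothetical rows
    {"name": "jack", "orderdate": "2017-01-01", "cost": 10},
    {"name": "tony", "orderdate": "2017-01-02", "cost": 15},
    {"name": "jack", "orderdate": "2017-01-05", "cost": 46},
    {"name": "tony", "orderdate": "2017-01-04", "cost": 29},
    {"name": "jack", "orderdate": "2017-01-08", "cost": 42},
]

running = window_sum(orders, "name", "orderdate", "cost")             # unbounded preceding .. current row
moving = window_sum(orders, "name", "orderdate", "cost", preceding=1)  # 1 preceding .. current row
```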

posted @ 2020-12-07 15:58 happygril3

Data export

Summary: 1. insert: export query results directly to the local file system: insert overwrite local directory "kg/qiaoruihua/hive/emp" select * from student; insert overwrite local directory "kg/qiaoruihua …

posted @ 2020-12-05 15:55 happygril3

Data import

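Summary: 1. Loading data into a table from an external file system (load [overwrite] into): load data [local] inpath "" [overwrite] into table table_name [partition(col_name="")]. local: load the data from the local file system into the Hive table; otherwise …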
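A common point of confusion with LOAD DATA is copy versus move. With LOCAL, the file is copied from the local file system. Without LOCAL, the file is moved from its HDFS path into the table's directory, so it disappears from the source location. OVERWRITE first clears what the table already holds. The sketch below models this with ordinary local folders standing in for HDFS. The load_data helper and the file names are hypothetical, and Hive details such as renaming on name clashes are not modeled.

```python
import os
import shutil
import tempfile


def load_data(src_path, table_dir, local, overwrite=False):
    """Model of LOAD DATA [LOCAL] INPATH 'src' [OVERWRITE] INTO TABLE t.
    LOCAL     -> the file is copied; the source stays where it was.
    no LOCAL  -> the file is moved from its (HDFS) path into the table directory.
    OVERWRITE -> files already in the table directory are deleted first."""
    if overwrite:
        for name in os.listdir(table_dir):
            os.remove(os.path.join(table_dir, name))
    dest = os.path.join(table_dir, os.path.basename(src_path))
    if local:
        shutil.copy(src_path, dest)
    else:
        shutil.move(src_path, dest)
    return dest


with tempfile.TemporaryDirectory() as tmp:
    table_dir = os.path.join(tmp, "student")      # stands in for the table location
    os.mkdir(table_dir)
    local_file = os.path.join(tmp, "local.txt")
    hdfs_file = os.path.join(tmp, "hdfs.txt")     # pretend this path is on HDFS
    for path in (local_file, hdfs_file):
        with open(path, "w") as f:
            f.write("1,tom\n")

    load_data(local_file, table_dir, local=True)
    local_kept = os.path.exists(local_file)       # source still there: copied
    load_data(hdfs_file, table_dir, local=False, overwrite=True)
    hdfs_kept = os.path.exists(hdfs_file)         # source gone: moved
    table_files = sorted(os.listdir(table_dir))   # only the overwriting file remains
```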

posted @ 2020-12-05 15:29 happygril3

Tables

Summary: 1. Creating a table: create [external] table [if not exists] table_name (col_name data_type) [partitioned by (col_name data_type)] [clustered by (col_name, col_name)] [so …

posted @ 2020-12-05 14:20 happygril3

Partitioned and bucketed tables

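Summary: 1. Partitioned tables: static partitioning vs. dynamic partitioning.
Partition creation: with static partitioning, each partition must be created manually before data is inserted into it; with dynamic partitioning, partitions are created automatically from the table's input data.
Use case: static partitioning requires knowing all partitions in advance and suits use cases where partitions are defined early and are few in number; dynamic partitioning suits cases with many partitions where new partitions cannot be predicted in advance; dynamic …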
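The difference between the two modes can be sketched as a routing rule. With static partitioning, the query names one partition value. With dynamic partitioning, one partition directory (col=value) is created per distinct value found in the data. This is a simplified Python model with hypothetical rows, not Hive's implementation. In real Hive, dynamic partitioning is enabled with set hive.exec.dynamic.partition=true;. When every partition column is dynamic, Hive also needs set hive.exec.dynamic.partition.mode=nonstrict;.

```python
from collections import defaultdict


def dynamic_partition_insert(rows, partition_col):
    """Model of INSERT INTO TABLE t PARTITION(dt) SELECT ..., dt FROM src:
    one partition directory (dt=value) per distinct value found in the data."""
    partitions = defaultdict(list)
    for row in rows:
        # the partition column lives in the directory name, not in the data file
        data = {k: v for k, v in row.items() if k != partition_col}
        partitions[f"{partition_col}={row[partition_col]}"].append(data)
    return dict(partitions)


def static_partition_insert(rows, partition_col, value):
    """Model of INSERT INTO TABLE t PARTITION(dt='value') SELECT ... FROM src:
    the partition value is fixed up front, so every row lands in one partition."""
    return {f"{partition_col}={value}": [dict(row) for row in rows]}


logs = [  # hypothetical source rows
    {"id": 1, "dt": "2020-12-01"},
    {"id": 2, "dt": "2020-12-02"},
    {"id": 3, "dt": "2020-12-01"},
]
by_day = dynamic_partition_insert(logs, "dt")
fixed = static_partition_insert([{"id": 4}], "dt", "2020-12-03")
print(sorted(by_day))  # ['dt=2020-12-01', 'dt=2020-12-02']
```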

posted @ 2020-12-05 13:50 happygril3

Managed (internal) tables and external tables

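Summary: 1. Internal (managed) tables: tables are internal by default, and their data is stored by default under the directory set by hive.metastore.warehouse.dir (/user/hive/warehouse). The data is managed by Hive, so a drop deletes both the metadata and the actual data. 2. External tables: the data is not managed by Hive, so a drop deletes only the metadata, not the actual …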
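The drop behavior described above can be modeled with a toy warehouse. The metastore holds table metadata, and a dictionary stands in for the data files on HDFS. The class name, table names and paths are hypothetical. Only the rule itself comes from the text: drop always removes metadata, but removes data only for managed tables.

```python
from dataclasses import dataclass, field


@dataclass
class Warehouse:
    """Toy model: `metastore` holds table metadata, `files` stands in for HDFS."""
    metastore: dict = field(default_factory=dict)  # table name -> (location, is_external)
    files: dict = field(default_factory=dict)      # location -> list of data files

    def create_table(self, name, location, external=False):
        self.metastore[name] = (location, external)
        self.files.setdefault(location, [])

    def drop_table(self, name):
        location, external = self.metastore.pop(name)  # metadata is always removed
        if not external:
            self.files.pop(location, None)             # managed table: data is removed too


wh = Warehouse()
wh.create_table("emp", "/user/hive/warehouse/emp")
wh.create_table("emp_ext", "/data/emp_ext", external=True)
wh.files["/user/hive/warehouse/emp"].append("000000_0")
wh.files["/data/emp_ext"].append("000000_0")
wh.drop_table("emp")
wh.drop_table("emp_ext")
```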

posted @ 2020-12-05 13:48 happygril3

Databases

Summary: 1. Creating a database: --create a database: create database db_hive; --avoid an error if it already exists: create database if not exists db_hive; --specify the HDFS location (default "/user/hive/warehouse"): create database db_hive lo …

posted @ 2020-12-05 12:47 happygril3

Data types

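Summary: Hive supports two kinds of data types: primitive (atomic) data types and complex data types. 1. Primitive data types: early Hive versions had no date type, so dates were represented as strings and common date-format conversions were done with functions; Hive has had a TIMESTAMP type since 0.8.0 and a DATE type since 0.12.0. 2. Complex data types: arrays (ARRAY), maps (MAP) and structs (STRUCT) …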
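Complex types in a text-format table are read by splitting each line on several delimiters, which the following Python model illustrates. The table layout is a hypothetical assumption, as are the delimiters: fields terminated by ',', collection items terminated by '_', map keys terminated by ':'. The sample line is invented. The final print mirrors the Hive expressions friends[1], children['xiao song'] and address.city.

```python
from collections import namedtuple

# Hypothetical table:
#   name STRING, friends ARRAY<STRING>, children MAP<STRING,INT>,
#   address STRUCT<street:STRING, city:STRING>
# stored with: fields terminated by ','  collection items terminated by '_'
#              map keys terminated by ':'
Address = namedtuple("Address", ["street", "city"])


def parse_row(line, field_sep=",", item_sep="_", key_sep=":"):
    name, friends, children, address = line.split(field_sep)
    friends_array = friends.split(item_sep)               # ARRAY: items split by item_sep
    children_map = {}
    for entry in children.split(item_sep):                # MAP: entries split by item_sep,
        key, value = entry.split(key_sep)                 #      key and value by key_sep
        children_map[key] = int(value)
    address_struct = Address(*address.split(item_sep))    # STRUCT: members split by item_sep
    return name, friends_array, children_map, address_struct


name, friends, children, address = parse_row(
    "songsong,bingbing_lili,xiao song:18_xiaoxiaosong:19,hui long guan_beijing"
)
print(friends[1], children["xiao song"], address.city)  # lili 18 beijing
```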

posted @ 2020-12-05 12:30 happygril3

Installing HIVE

Summary: 1. Installing hive. 1.1 Rename the directory: mv apache-hive-2.3.0-bin hive-2.3.0 1.2 In the /opt/module/hive-2.3.0/conf directory, rename hive-env.sh.template to hive-env.sh: HADOOP_HOME=/opt/module/ha …

posted @ 2020-12-03 16:50 happygril3

Basic concepts

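Summary: 1. Basic concepts: Hive was open-sourced by Facebook to compute statistics over massive volumes of structured logs. Hive is a data warehouse tool built on Hadoop: it maps structured data files to tables and provides SQL-like querying. In essence, it translates HQL into MapReduce programs: (1) the data Hive processes is stored in HDFS; (2) at the bottom layer, Hive's data analysis …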
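The idea of "HQL becomes MapReduce" can be illustrated with SELECT province, count(*) FROM t GROUP BY province, written as explicit map, shuffle and reduce phases. This is a conceptual Python sketch with hypothetical input lines. Real Hive plans add optimizations such as map-side aggregation, and Hive 2.x can also run on Tez or Spark instead of MapReduce.

```python
from collections import defaultdict

# Hypothetical input: lines of a text file stored in HDFS (province,city,score)
lines = ["shanxi,lvliang,1", "shanxi,yuncheng,6", "hebei,baoding,3"]


def map_phase(records):
    """Mapper: emit (group key, 1) for each record."""
    for record in records:
        province = record.split(",")[0]
        yield province, 1


def shuffle(pairs):
    """Shuffle: route all values with the same key to the same reducer."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reducer: aggregate the values of each key."""
    return {key: sum(values) for key, values in grouped.items()}


# SELECT province, count(*) FROM t GROUP BY province;
counts = reduce_phase(shuffle(map_phase(lines)))
```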

posted @ 2020-12-02 18:43 happygril3

Data compression and storage

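Summary: 1. Compression: compression effectively reduces the number of bytes read from and written to the underlying storage system (HDFS), which makes better use of network bandwidth and disk space. Because disk I/O and network bandwidth are precious resources in Hadoop, data compression is very helpful for saving resources and minimizing disk I/O and network transfer. Compression as an optimization strategy for MapReduce: apply a compression codec to the Mapper or Re …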
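The trade-off can be seen by compressing some repetitive, table-like text with gzip, one of the codecs Hadoop supports (GzipCodec). Fewer bytes go to disk and over the network, and CPU time is spent to compress and decompress. This Python sketch uses hypothetical rows. The exact ratio depends on the data and the codec, so the test only checks a loose bound.

```python
import gzip

# Hypothetical, highly repetitive rows, similar to log or table text data
rows = "".join(f"shanxi,lvliang,{i}\n" for i in range(10000)).encode("utf-8")

compressed = gzip.compress(rows)         # fewer bytes to write to / read from disk
restored = gzip.decompress(compressed)   # CPU time is spent to get the data back
ratio = len(compressed) / len(rows)

print(len(rows), len(compressed), round(ratio, 3))
```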

posted @ 2020-11-25 10:03 happygril3

Deduplicating on multiple columns

Summary: sample data:
province city score
shanxi lvliang 1
shanxi lvliang 2
shanxi lvliang 3
shanxi lvliang 4
shanxi lvliang 5
shanxi yuncheng 6
shanxi yuncheng 7
shanxi yuncheng 8 …
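There are two common ways to deduplicate on the (province, city) pair. SELECT DISTINCT province, city keeps only the key columns. Filtering on row_number() over(partition by province, city order by score desc) = 1 keeps one full row per pair. The excerpt does not show which approach the full post uses, so both are modeled in Python below, using only the sample rows visible above. Keeping the highest score is an assumed ordering.

```python
# Rows from the sample above (province, city, score)
rows = [
    ("shanxi", "lvliang", 1), ("shanxi", "lvliang", 2), ("shanxi", "lvliang", 3),
    ("shanxi", "lvliang", 4), ("shanxi", "lvliang", 5),
    ("shanxi", "yuncheng", 6), ("shanxi", "yuncheng", 7), ("shanxi", "yuncheng", 8),
]


def distinct_pairs(rows):
    """SELECT DISTINCT province, city FROM t (first-seen order kept)."""
    return list(dict.fromkeys((province, city) for province, city, _ in rows))


def best_per_pair(rows):
    """Rows where row_number() over(partition by province, city
    order by score desc) = 1: one full row per pair, the highest score."""
    best = {}
    for row in rows:
        key = row[:2]
        if key not in best or row[2] > best[key][2]:
            best[key] = row
    return list(best.values())


print(distinct_pairs(rows))  # [('shanxi', 'lvliang'), ('shanxi', 'yuncheng')]
print(best_per_pair(rows))   # [('shanxi', 'lvliang', 5), ('shanxi', 'yuncheng', 8)]
```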

posted @ 2019-07-23 10:54 happygril3

index

Summary: 1. Create the table: drop table if exists kg_fk_user; create table kg_fk_user( id int, name string ) row format delimited fields terminated by "," stored as textfile …

posted @ 2019-07-01 15:54 happygril3

join optimization

Summary: 1. left outer join: the join is performed first, and the result is then filtered by the WHERE clause. select s.ymd,s.symbol,s.price_close,d.dividend from stocks s left outer join dividends d on s.ymd=d.ymd and …
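Because WHERE runs after the join, a condition on the right-hand table behaves differently in ON than in WHERE. In ON, unmatched left rows survive with NULLs. In WHERE, those NULL rows are filtered out and the outer join behaves like an inner join. The Python model below shows this. The stock symbols and prices are made-up sample data, and the dividend > 0 condition is a hypothetical example filter.

```python
def left_outer_join(left, right, on):
    """Model of: left LEFT OUTER JOIN right ON on(l, r).
    Unmatched left rows are kept, with NULL (None) for the right-only columns."""
    right_only = {k for r in right for k in r} - {k for l in left for k in l}
    result = []
    for l in left:
        matches = [r for r in right if on(l, r)]
        if matches:
            result.extend({**r, **l} for r in matches)  # left values win on shared names
        else:
            result.append({**l, **{k: None for k in right_only}})
    return result


def same_day(s, d):
    return s["ymd"] == d["ymd"] and s["symbol"] == d["symbol"]


# Hypothetical sample data (symbols are made up)
stocks = [
    {"ymd": "2010-01-04", "symbol": "AAA", "price_close": 10.0},
    {"ymd": "2010-01-05", "symbol": "AAA", "price_close": 10.5},
    {"ymd": "2010-01-04", "symbol": "BBB", "price_close": 20.0},
]
dividends = [{"ymd": "2010-01-05", "symbol": "AAA", "dividend": 0.5}]

# Condition inside ON: rows without a dividend survive with NULL
in_on = left_outer_join(stocks, dividends, lambda s, d: same_day(s, d) and d["dividend"] > 0)

# Same condition in WHERE: applied after the join; NULL > 0 is not true,
# so unmatched rows are dropped and the result looks like an inner join
in_where = [r for r in left_outer_join(stocks, dividends, same_day)
            if r["dividend"] is not None and r["dividend"] > 0]
```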

posted @ 2019-06-30 12:08 happygril3

join

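Summary: 1. Inner join: join (an inner join is the default). Not supported in an inner join: on a.ymd<=b.ymd; also not supported: or inside the on clause (these restrictions apply to Hive versions before 2.2.0, which added support for complex expressions in the on clause). select a.ymd,a.price_close,b.price_close from stocks a join stocks b on a.ymd=b.ymd w …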
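Equality join keys matter because an equi-join can be executed by hashing. Rows with the same key are sent to the same reducer, or put in the same hash-table bucket, and matched there. A condition such as a.ymd<=b.ymd gives no key to hash on. The Python sketch below models a hash inner join for the self-join on ymd. The rows are hypothetical, and the AAA/BBB filter stands in for the truncated WHERE clause as an assumption.

```python
from collections import defaultdict


def hash_equi_join(left, right, key):
    """Inner equi-join: build a hash table on one side, probe it with the other.
    Only equality keys can be hashed like this."""
    table = defaultdict(list)
    for r in right:
        table[r[key]].append(r)                                  # build phase
    return [(l, r) for l in left for r in table.get(l[key], [])]  # probe phase


# Hypothetical rows; mirrors: from stocks a join stocks b on a.ymd = b.ymd
stocks = [
    {"ymd": "2010-01-04", "symbol": "AAA", "price_close": 10.0},
    {"ymd": "2010-01-04", "symbol": "BBB", "price_close": 20.0},
    {"ymd": "2010-01-05", "symbol": "AAA", "price_close": 10.5},
]
pairs = hash_equi_join(stocks, stocks, "ymd")

# Hypothetical WHERE filter: compare AAA with BBB on the same day
aaa_vs_bbb = [(a["ymd"], a["price_close"], b["price_close"])
              for a, b in pairs if a["symbol"] == "AAA" and b["symbol"] == "BBB"]
```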

posted @ 2019-06-30 11:58 happygril3

group_by

Summary: 1. Group the data by one or more columns. 2. Aggregate each group. 3. Filter the aggregated results. 1. select avg(score) as score from teacher 2. select grade, avg(score) as avg_score from teacher group b …
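The three steps above (group, aggregate, filter the aggregated result with HAVING) can be modeled in a few lines of Python. The teacher rows and the avg_score > 70 condition are hypothetical, and the sketch assumes score is never NULL.

```python
from collections import defaultdict


def group_avg(rows, group_col, value_col, having=lambda avg: True):
    """Model of SELECT g, avg(v) FROM t GROUP BY g HAVING <condition on avg(v)>.
    Assumes v is never NULL."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_col]].append(row[value_col])              # 1. group by one column
    averages = {g: sum(vs) / len(vs) for g, vs in groups.items()}  # 2. aggregate each group
    return {g: a for g, a in averages.items() if having(a)}        # 3. HAVING filters groups


teacher = [  # hypothetical rows
    {"grade": "g1", "score": 80}, {"grade": "g1", "score": 90},
    {"grade": "g2", "score": 60}, {"grade": "g2", "score": 70},
]

overall = sum(r["score"] for r in teacher) / len(teacher)    # select avg(score) from teacher
by_grade = group_avg(teacher, "grade", "score")              # ... group by grade
good_grades = group_avg(teacher, "grade", "score", having=lambda a: a > 70)  # ... having avg_score > 70
```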

posted @ 2019-06-30 10:57 happygril3

like

Summary:
select sname from teacher where sname like "q%";
select sname from teacher where sname like "%an%";
select sname from teacher where sname like "%ang";
show …
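The three patterns above rely on LIKE's two wildcards: % matches any sequence of characters (including none) and _ matches exactly one. Hive's LIKE is case-sensitive. The Python model below translates a LIKE pattern into a regular expression. Escape sequences are not handled, and the sname values are hypothetical.

```python
import re


def like(value, pattern):
    """Model of Hive `value LIKE pattern`: '%' matches any run of characters,
    '_' matches exactly one character, matching is case-sensitive.
    Escape sequences are not handled in this sketch."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.fullmatch("".join(parts), value, flags=re.DOTALL) is not None


names = ["qiao", "zhang", "wang", "wangli", "Qiao"]  # hypothetical sname values
starts_q = [n for n in names if like(n, "q%")]       # ['qiao']
has_an = [n for n in names if like(n, "%an%")]       # ['zhang', 'wang', 'wangli']
ends_ang = [n for n in names if like(n, "%ang")]     # ['zhang', 'wang']
```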

posted @ 2019-06-23 18:57 happygril3
