Hive - Post Category - tneduts

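Brief notes on installing and testing hive3.12 locally

Summary: Prerequisites: Hive depends on a JDK, so download JDK 8 and unpack it; it also depends on Hadoop, so download and unpack Hadoop first. step1/ MySQL is the recommended store for the hive metastore. Installing MySQL is not covered here; install it yourself, or, if you use docker, a single docker run command will do. Create the hive user, password, and database. mysql ...

posted @ 2020-01-07 08:14 tneduts Views(844) Comments(1) Recommended(0)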

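Notes on recently used commands

Summary: 1. Strip leading whitespace: %s/^\s\+//g (note that the plus sign also needs a backslash). 2. Find files larger than 100 MB under a directory: find . -type f -size +100M 3. Find files under a directory modified within the last 24 hours: find . -type f -mtime -1 4. Find files created ten minutes ago: find / -newerct "10 m ...

posted @ 2019-12-23 23:13 tneduts Views(409) Comments(0) Recommended(0)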
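
The first three commands in the entry above can be tried safely in a scratch directory. This is only a sketch: the directory and the file name `notes.txt` are made up for the demo, and `sed` stands in for the Vim substitution so that it can run non-interactively.

```shell
# Work in a throwaway directory so the sketch touches nothing real.
demo_dir=$(mktemp -d)
printf '   indented line\n' > "$demo_dir/notes.txt"

# Same idea as the Vim substitution, done with sed: strip leading whitespace.
stripped=$(sed 's/^[[:space:]]*//' "$demo_dir/notes.txt")
echo "$stripped"

# Regular files larger than 100 MiB (none exist in the demo directory).
big_files=$(find "$demo_dir" -type f -size +100M)

# Regular files modified less than 24 hours ago.
recent_files=$(find "$demo_dir" -type f -mtime -1)
echo "$recent_files"

# Clean up the scratch directory.
rm -rf "$demo_dir"
```

Note that `-mtime -1` matches by age (less than one day old), not by calendar day.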

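Notes on using maxmind geoip2

Summary: The customer's requirement: for each IP in the nginx access logs, look up the corresponding country, province, and city; they gave me a maxmind link for reference. While researching, I found approaches that wrap this as a hive udf, but our project has always used waterdrop for data processing, so I decided to develop a waterdrop plugin. As for this feature, waterdrop itself provides ...

posted @ 2019-12-23 22:23 tneduts Views(3202) Comments(0) Recommended(0)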

Notes on using hdp3.1 hive 3.0

Summary: How spark-sql can access managed (internal) tables in hive3.1

posted @ 2019-05-11 11:59 tneduts Views(3620) Comments(1) Recommended(1)

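Brief notes on MySQL tuning

Summary: In InnoDB, the primary key is the clustered index. If no primary key is defined, the first unique index whose columns are all NOT NULL is used as the clustered index. If there is neither a primary key nor a suitable unique index, InnoDB internally generates a hidden primary key as the clustered index; this hidden key resembles an auto-increment id (a 6-byte row ID rather than an int). Dropping and re-adding the primary key: alter table tbname drop primar ...

posted @ 2017-08-25 21:16 tneduts Views(229) Comments(1) Recommended(0)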

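Two problems encountered when doing ETL with spark-sql

Summary: Our project uses spark-sql for ETL and ran into two problems, recorded here. Problem 1: spark-sql --master yarn --hiveconf load_date=`date -d ..` -e "insert overwrite table tbl(.) select distinct * from tbl" In the table's directory on hdfs, this produces many entries like .hive-s...

posted @ 2017-08-10 13:41 tneduts Views(1296) Comments(0) Recommended(0)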

The Hive metastore

Summary: hive --service metastore listens on port 9083 by default. <property> <name>hive.metastore.uris</name> <value>thrift://hiveserver1:9083</value> </property> On clients that connect to hive, such as spark ...

posted @ 2017-04-18 23:13 tneduts Views(394) Comments(0) Recommended(0)
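
As a rough sketch of the client-side setting in the entry above, the snippet below writes a minimal hive-site.xml with that one property and reads the URI back. The scratch directory is an assumption for the demo; where a real client keeps its configuration depends on the installation, and the host name `hiveserver1` is taken from the summary.

```shell
# Write a minimal client-side hive-site.xml into a scratch directory.
conf_dir=$(mktemp -d)
cat > "$conf_dir/hive-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://hiveserver1:9083</value>
  </property>
</configuration>
EOF

# Read the URI back to confirm what a client would pick up.
metastore_uri=$(sed -n 's:.*<value>\(.*\)</value>.*:\1:p' "$conf_dir/hive-site.xml")
echo "$metastore_uri"
```

The metastore itself would be started separately with `hive --service metastore` on the host named in the URI.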

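Summary: Hive jobs often run into data skew: the reduce phase reaches 99% and then stalls, with the last 1% still unfinished after several hours. The YARN web UI together with the logs can reveal the specific cause; this pattern very likely indicates data skew, and the fix depends on the situation. 1. If you know which fields cause the skew, pull those values out and process them separately, so that MR allocates more instances and execution speeds up. 2. set h...

posted @ 2017-01-11 22:52 tneduts Views(568) Comments(0) Recommended(0)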

Apache Drill Install and Test

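Summary: Drill doc: https://drill.apache.org/docs/hive-storage-plugin/ When accessed from within China, some elements on the page load very slowly, probably because it fetches resources from Google APIs; it loaded much better over a VPN. I tried Drill because our company's project has always used Hive, but during testing we wanted faster results for some interactive queries and Hive could not deliver; moreover, the current version does not support using spa...

posted @ 2015-08-25 22:16 tneduts Views(855) Comments(0) Recommended(0)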

Hive beeline update

Summary: Hive CLI vs Beeline: The primary difference between the two involves how the clients connect to Hive. The Hive CLI connects directly to the Hive Driver a...

posted @ 2015-07-24 07:27 tneduts Views(1122) Comments(0) Recommended(0)

beeline vs hive cli

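Summary: Recently our big-data development environment was upgraded to cloudera 5.3, with hive upgraded to 0.13.1, so we can finally use the long-awaited analytic window functions. I ran into some problems along the way and note them here. 1. Running the hive command automatically switches to the beeline client, with a message that hive cli is deprecated and beeline is recommended. It then prompts you...

posted @ 2015-07-23 07:25 tneduts Views(2924) Comments(0) Recommended(0)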

Hive query issue

Summary: I once wrote a query that joins two tables. One is a large partitioned table; the other is used to filter the large table. Then join the tw...

posted @ 2015-07-21 09:13 tneduts Views(254) Comments(2) Recommended(0)

Hive conf issue

Summary: hive --hiveconf v1="test" --hiveconf v2 -e "select * from ${hiveconf:v1} where col1='${hiveconf:v2}' "; When we run this on Linux, the shell will parse the...

posted @ 2015-07-14 09:12 tneduts Views(308) Comments(0) Recommended(0)
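
The summary above is cut off, so the following is only a guess at the issue it describes: inside double quotes the shell tries to expand `${...}` itself before Hive ever sees the text. One common way around that is to escape the dollar signs so the placeholders reach Hive literally. The value given to `v2` in the commented command is hypothetical.

```shell
# Escaping each dollar sign keeps the shell from expanding ${hiveconf:...},
# so the placeholders survive for Hive to substitute.
query="select * from \${hiveconf:v1} where col1='\${hiveconf:v2}'"

# Show exactly what would be handed to hive -e.
printf '%s\n' "$query"

# With Hive installed, the call would look like this (v2 value is made up):
# hive --hiveconf v1=test --hiveconf v2=some_value -e "$query"
```

Printing the string first is a cheap way to check the quoting before a real job is launched.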

Hive get table rows count batch

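Summary: The project needed to compare the data produced by two calculation methods, which meant two things: checking that the intermediate tables have the same row counts, and checking that the final table's data matches. Getting row counts by running select count(*) from table one table at a time and waiting for each result is tedious, so I wrote a bash shell script to do it. At first it looked like this: for ...

posted @ 2015-04-21 07:18 tneduts Views(684) Comments(0) Recommended(0)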
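
The script from the post is not shown in the summary above, so this is a separate sketch of one way to batch the counts, not the author's version: generate one count query per table into a single file and run that file in one Hive session. The table names are placeholders.

```shell
# Hypothetical table names; replace with the tables to compare.
tables="tbl_a tbl_b tbl_c"

# Generate one count query per table into a single HQL file.
hql_file=$(mktemp)
for t in $tables; do
  printf "select '%s' as table_name, count(*) as row_count from %s;\n" "$t" "$t"
done > "$hql_file"

# Inspect the generated statements.
cat "$hql_file"

# With Hive installed, run every query in one session:
# hive -f "$hql_file"
```

Running a single `hive -f` avoids paying the client start-up cost once per table.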

six month dormancy test

Summary: source data: accountleg year_month amount acc1A 2010-01 100 acc1A 2010-02 100 acc1A 2010-03 100 acc1A 2010-04 100 acc1A 2010-06 100 ...

posted @ 2015-04-06 08:21 tneduts Views(286) Comments(1) Recommended(0)

Hive Experiment 2 (dynamic table partitioning and IDE)

Summary: 1. Use oracle sql developer 4.0.3 as the IDE for hive queries. Download the hive-jdbc driver: http://www.cloudera.com/content/cloudera/en/downloads/connectors/hive/jdbc/hive-jdbc-...

posted @ 2015-03-31 07:03 tneduts Views(495) Comments(0) Recommended(0)

Hive history date mapping

Summary: Hive history table mapping: create table fdl_family as select * from (select 'acc1' as account, 'family1' as family, '2010-01-01' as effect_date from nums ...

posted @ 2015-03-27 07:47 tneduts Views(337) Comments(1) Recommended(0)

Hive variable demo

Summary: create table ori_trans (account string, maker string, tdate string) partitioned by (country string); hive -f test1.hql --hiveconf country='china' hive ...

posted @ 2015-03-25 22:22 tneduts Views(362) Comments(0) Recommended(0)

Hive tuning tips

Summary: 1. limit: Hive has a configuration property to enable sampling of source data for use with LIMIT: hive.limit.optimize.enable. Set this parameter to ...

posted @ 2015-03-25 07:09 tneduts Views(237) Comments(1) Recommended(0)

Hive UDF’S addMonths

Summary: Our project uses hive 0.10, and our HiveQL needs the addMonths function that is built into hive-0.11, so I wrote this UDF and tested it. Java code: package myu...

posted @ 2015-03-24 21:59 tneduts Views(762) Comments(0) Recommended(0)

My Castle in the Air
