Hadoop - Post Category - yuerspring

Converting between Long timestamps and date types

Summary: select create_time, FROM_UNIXTIME(create_time/1000, '%Y-%m-%d %H:%i:%s') from xxxx where create_time = 1551691014 Converting between Long timestamps and date types... Read more

posted @ 2019-03-04 17:20 yuerspring Views (1260) Comments (0) Recommended (0)
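The query in the entry above divides create_time by 1000, which suggests the column holds epoch milliseconds. As a rough sketch of the same conversion outside SQL, assuming a millisecond timestamp and UTC output (the function names and the sample value are illustrative and not taken from the post; FROM_UNIXTIME itself formats in the session time zone):

```python
from datetime import datetime, timezone

FORMAT = "%Y-%m-%d %H:%M:%S"  # Python equivalent of '%Y-%m-%d %H:%i:%s'


def millis_to_datetime_str(millis):
    """Format an epoch-millisecond timestamp as a UTC date string."""
    seconds = millis / 1000  # same scaling as create_time/1000 in the query
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime(FORMAT)


def datetime_str_to_millis(text):
    """Parse a UTC date string back into epoch milliseconds."""
    parsed = datetime.strptime(text, FORMAT).replace(tzinfo=timezone.utc)
    return int(parsed.timestamp() * 1000)


# Hypothetical sample value: the post's timestamp expressed in milliseconds.
formatted = millis_to_datetime_str(1551691014000)
round_trip = datetime_str_to_millis(formatted)
```

Pinning the time zone explicitly keeps the round trip reproducible across machines; a local-time variant would drop the tz argument.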

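Flink streaming finally running on a local cluster

Summary: First, a screenshot of the running UI. Three jobs are running: the first is word count, and the second and third are the data producer and consumer. For more code, see the previous blog post, which covers it in detail. While exporting the jar from IDEA and running it, I hit two problems: 1. when exporting ... Read more

posted @ 2019-01-23 20:19 yuerspring Views (265) Comments (0) Recommended (0)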

zk kafka mariadb scala flink integration

Summary: zk kafka mariadb scala flink integration. At first I did not want to write this post; I put the code on github.com/git.jd.com,... Read more

posted @ 2019-01-17 08:50 yuerspring Views (271) Comments (0) Recommended (0)

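Debugging Sqoop errors when exporting from Hive to MySQL

Summary: 1. MySQL JDBC error: the driver is required. 2. MySQL server IP error: even for a local server, use a domain name or IP, not localhost. 3. Data length problem: an error occurs if the Hive value is longer than the MySQL column allows. 4. Hive and mys... Read more

posted @ 2018-12-21 20:16 yuerspring Views (263) Comments (0) Recommended (0)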

Merging small files in Hive

Summary: Hive data sometimes needs to be merged. #!/bin/bash hadoop jar /software/servers/bdp_tools/mergefiles-1.7.jar -u lzo -p hdfs://ns1/user/dd_edw/adm.db/table_... Read more

posted @ 2018-12-06 14:07 yuerspring Views (1158) Comments (0) Recommended (0)

The most important Spark environment parameters when running data jobs with Spark

Summary: The most important Spark environment parameters when running data jobs with Spark. As I recall, I was always confused by these parameters, s... Read more

posted @ 2018-11-02 16:40 yuerspring Views (177) Comments (0) Recommended (0)

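Row-to-column and column-to-row conversion in Hive

Summary: Among JD's many business lines, promotions are full of complexity and challenges. Because the business is so flexible, much of the data is stored as XML and JSON, so downstream data analysts must parse it before they can use it. One of the many operations involved is converting between rows and columns. Data structure: create external table jd... Read more

posted @ 2018-09-26 20:16 yuerspring Views (261) Comments (0) Recommended (0)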

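Parsing JSON and JSON arrays in Hive

Summary: In big data processing, the business side often writes JSON data into a table. Data engineers then need to parse the JSON string accurately and define the structure of a new table. Many people online describe how to use get_json_object and json_tuple, with examples ... Read more

posted @ 2018-07-18 14:13 yuerspring Views (1241) Comments (0) Recommended (0)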
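As a rough illustration of what such parsing produces, the sketch below flattens a JSON string containing an array into one flat record per array element, which is the shape a redefined table would hold. It uses Python's json module rather than Hive's get_json_object or json_tuple, and the payload and field names (order_id, items, sku, qty) are invented for the example:

```python
import json


def flatten_order(raw):
    """Turn one JSON document with an 'items' array into flat rows."""
    doc = json.loads(raw)
    return [
        # Repeat the parent key on every row, one row per array element.
        {"order_id": doc["order_id"], "sku": item["sku"], "qty": item["qty"]}
        for item in doc.get("items", [])
    ]


# Hypothetical payload; not taken from the post.
sample = '{"order_id": 7, "items": [{"sku": "A1", "qty": 2}, {"sku": "B2", "qty": 1}]}'
rows = flatten_order(sample)
```

Each resulting dictionary corresponds to one row of the new table, with the array element's fields promoted to columns.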

Hive SQL: converting one row into N columns

Summary: select explode(Array('row1','row2','...','rown')) Result: col_name row1 row2 ... rown Read more

posted @ 2018-06-16 16:54 yuerspring Views (1387) Comments (0) Recommended (0)
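The result listed in the entry above shows one output row per array element under a single column. A minimal Python sketch of that behavior, assuming the column is named col_name as in the summary (the helper name explode_rows is made up for illustration and is not a Hive API):

```python
def explode_rows(values, column="col_name"):
    """Yield one single-column row per element of the input array."""
    for value in values:
        yield {column: value}


exploded = list(explode_rows(["row1", "row2", "rown"]))
```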

Hive UDTF error: java.lang.String cannot be cast to java.lang.Integer

Summary: Error: Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: java... Read more

posted @ 2018-03-08 09:14 yuerspring Views (832) Comments (0) Recommended (0)

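Common Hadoop commands

Summary: Today I set up a simple Hadoop cluster on Bluemix. Embarrassingly, I had forgotten the Hadoop commands almost entirely, so here is a refresher. FS Shell: file system (FS) shell commands should be invoked in the form bin/hadoop fs. All FS shell commands use URI... Read more

posted @ 2018-02-08 12:41 yuerspring Views (164) Comments (0) Recommended (0)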

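How to do an EXCEPT operation in Hive and MySQL

Summary: In DB2 and Oracle, when we want to know how the data differs between two tables with the same structure, we can use the following SQL: select * from table1 except select * from table2. The statement above returns the rows that are in table1 but not in table2... Read more

posted @ 2017-11-07 13:17 yuerspring Views (1316) Comments (0) Recommended (0)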
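The summary above is cut off before the post's own solution, so the following is only one common workaround and not necessarily the author's: a LEFT JOIN anti-join that keeps the rows of table1 with no match in table2. The sketch runs it in SQLite through Python's sqlite3 module, purely as a stand-in engine that supports both forms so the results can be compared. The columns (id, name) and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, name TEXT);
    CREATE TABLE table2 (id INTEGER, name TEXT);
    INSERT INTO table1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO table2 VALUES (2, 'b'), (4, 'd');
""")

# Reference result using the EXCEPT operator from the summary.
except_rows = conn.execute(
    "SELECT * FROM table1 EXCEPT SELECT * FROM table2 ORDER BY 1"
).fetchall()

# Emulation: join on every column, keep table1 rows that found no partner.
# DISTINCT mirrors the duplicate removal that EXCEPT performs.
anti_join_rows = conn.execute("""
    SELECT DISTINCT t1.id, t1.name
    FROM table1 t1
    LEFT JOIN table2 t2 ON t1.id = t2.id AND t1.name = t2.name
    WHERE t2.id IS NULL
    ORDER BY 1
""").fetchall()

conn.close()
```

One caveat of this sketch: the join compares columns with `=`, so rows containing NULL never match, whereas EXCEPT treats NULLs as equal. The two forms agree only when the compared columns are non-null, as they are in the sample data.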

How to establish a big data platform?

Summary: How to establish a big data platform? http://xyz.insightdataengineering.com/blog/pipeline_map/ https://blog.insightdatascience.com/the-... Read more

posted @ 2017-08-16 17:36 yuerspring Views (172) Comments (0) Recommended (0)

What we need to learn during the big data era (data scientist)

Summary: 1. Languages: SQL, Python, Shell, Scala, Java 2. Tools or frameworks (real-time computing & offline computing): Hive, Spark (SQL & Streaming), Ha... Read more

posted @ 2017-08-04 15:11 yuerspring Views (177) Comments (0) Recommended (0)

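A few hadoop fs commands commonly used in operations

Summary: FS Shell: file system (FS) shell commands should be invoked in the form bin/hadoop fs. All FS shell commands take URI paths as arguments. The URI format is scheme://authority/path. For the HDFS file system the scheme is hdfs; for the local file system... Read more

posted @ 2017-05-30 17:58 yuerspring Views (237) Comments (0) Recommended (0)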
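To make the scheme://authority/path layout described above concrete, here is a small sketch that splits such a URI with Python's standard urllib.parse. The host name, port, and path in the sample URI are hypothetical, and the helper name split_fs_uri is made up for illustration:

```python
from urllib.parse import urlparse


def split_fs_uri(uri):
    """Return the (scheme, authority, path) parts of a file system URI."""
    parts = urlparse(uri)
    return parts.scheme, parts.netloc, parts.path


# Hypothetical HDFS URI; namenodehost:8020 is an assumed authority.
hdfs_parts = split_fs_uri("hdfs://namenodehost:8020/user/hadoop/file1")
```

Here the scheme is hdfs, the authority is the host and port of the name node, and the path is the location inside that file system.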

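The big data body of knowledge

Summary: As I understand it, the overall big data processing stack can be divided into two parts: distributed storage systems and distributed computing frameworks. The mainstream distributed storage system is Hadoop DFS; others include Ceph and Swift. The mainstream distributed computing frameworks are MapReduce, Storm, and Spark. ... Read more

posted @ 2016-11-13 14:36 yuerspring Views (315) Comments (0) Recommended (0)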

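Big data interview questions

Summary: 1. List the reasons Spark is faster than Hadoop, and the main problems it currently has. 2. Describe the feasibility of using Spark Streaming and GraphX for real-time computing, and the problems you might run into. 3. GraphX's Pregel API only supports directed graph traversal; how would you implement undirected traversal? Descr... Read more

posted @ 2016-10-15 13:37 yuerspring Views (136) Comments (0) Recommended (0)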

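Classic Hadoop big data interview questions

Summary: 1. Among the main common InputFormats defined in Hadoop, which one is the default? (A) A. TextInputFormat B. KeyValueInputFormat C. SequenceFileInputFormat 1. Which of the following programs is responsible for HDFS data storage? (C)... Read more

posted @ 2016-10-14 22:19 yuerspring Views (291) Comments (0) Recommended (0)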

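Compiling Hadoop from source, step by step: the simplest procedure

Summary: Software versions: Java: 1.7.0_79; Hadoop: hadoop-2.6.5-src.tar.gz; Maven: 3.3.9; protobuf: 2.5. Unpack with tar -zxvf. 1. Configure the Maven environment variable: export MAVEN_HOME=/root/com... Read more

posted @ 2016-10-12 22:18 yuerspring Views (254) Comments (0) Recommended (0)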

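Job workflow scheduling with crontab + shell

Summary: In an earlier post we briefly covered pseudocode for scheduling Hive with shell or Python; today we flesh that pseudocode out: http://blog.csdn.net/haohaixingyun/article/details/51821444 Note: in a real production environment, ... Read more

posted @ 2016-07-19 23:16 yuerspring Views (299) Comments (0) Recommended (0)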
