Post category - Hadoop

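http://www.zhihujingxuan.com/17213.html Now that Hadoop and big-data technology have developed this far, what matters most is using them well: choosing a suitable technical solution for the problem at hand, combining several solutions when necessary, and optimizing a given solution so it works best in a specific application scenario. Practitioners have to keep exploring and building up experience to get there.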

Using HiveServer2 - Authentication

Summary: To configure Hive for use with HiveServer2, include the following configuration properties in the .../hive-site.xml configuration file. hive.support.c...

posted @ 2015-06-17 22:32 Ready! Views (1631) Comments (0) Recommended (0)

We Already Have Hadoop MapReduce, So Why Spark?

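Summary:
a. MapReduce's shuffle phase has to write to disk, which hurts performance, whereas Spark, built on RDDs, can keep intermediate data in memory.
b. The MapReduce computing framework (API) is fairly limited, whereas Spark is a flexible parallel computing framework.
c. As for the Spark API - Scala: Scalable Language...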

posted @ 2015-05-21 17:29 Ready! Views (6015) Comments (0) Recommended (0)

HIVE: Map Join Vs Common Join, and SMB

Summary: Hive's Map Join is essentially an extended version of SQL Server's Hash Join: the hash join carried over to a distributed system. SMB (Sort Merge Bucket) J...

posted @ 2015-05-20 15:55 Ready! Views (1998) Comments (0) Recommended (0)
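To make the two join strategies concrete, here is a minimal single-process Java sketch, not Hive's actual implementation. The map join builds an in-memory hash table from the small table and streams the big table past it (which is why no shuffle or reduce phase is needed); the sort-merge join assumes both inputs are already sorted by key, as they are inside matching SMB buckets, and walks them in one forward pass. The row layout ({joinKey, value} string pairs) and the class name JoinSketch are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: rows are hypothetical {joinKey, value} pairs; both joins are inner joins.
class JoinSketch {

    // Map join: hash the small table in memory, then probe it with each row of the big table.
    static List<String> mapJoin(List<String[]> big, List<String[]> small) {
        Map<String, List<String>> hash = new HashMap<>();
        for (String[] row : small) {
            hash.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row[1]);
        }
        List<String> out = new ArrayList<>();
        for (String[] row : big) {
            List<String> matches = hash.get(row[0]);
            if (matches == null) continue; // no partner in the small table: row is dropped
            for (String m : matches) {
                out.add(row[0] + "," + row[1] + "," + m);
            }
        }
        return out;
    }

    // Sort-merge join: both inputs must already be sorted by key, so a single pass suffices.
    static List<String> sortMergeJoin(List<String[]> left, List<String[]> right) {
        List<String> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < left.size() && j < right.size()) {
            int cmp = left.get(i)[0].compareTo(right.get(j)[0]);
            if (cmp < 0) {
                i++;
            } else if (cmp > 0) {
                j++;
            } else {
                String key = left.get(i)[0];
                int jEnd = j; // end of the run of equal keys on the right side
                while (jEnd < right.size() && right.get(jEnd)[0].equals(key)) jEnd++;
                // pair every left row with this key against the whole right-side run
                while (i < left.size() && left.get(i)[0].equals(key)) {
                    for (int k = j; k < jEnd; k++) {
                        out.add(key + "," + left.get(i)[1] + "," + right.get(k)[1]);
                    }
                    i++;
                }
                j = jEnd;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> orders = Arrays.asList(
            new String[] {"1", "apple"}, new String[] {"2", "pear"}, new String[] {"3", "plum"});
        List<String[]> users = Arrays.asList(
            new String[] {"1", "steven"}, new String[] {"3", "ray"});
        System.out.println(mapJoin(orders, users));       // [1,apple,steven, 3,plum,ray]
        System.out.println(sortMergeJoin(orders, users)); // same rows, since both inputs are sorted by key
    }
}
```

The hash join only needs the small side to fit in memory; the merge join needs no hash table at all, but it relies on the sort order that bucketing and sorting provide up front.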

HIVE: A SerDe Example

Summary: Data file contents:
id=123,name=steven
id=55,name=ray
Expected output format:
123 steven
55 ray
1. Create the table, specifying the format with a regular expression: create table test1(id int, name string) row format serde 'org...

posted @ 2015-05-14 21:10 Ready! Views (1467) Comments (0) Recommended (0)
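Conceptually, a regex-based SerDe matches each input line against one pattern and turns each capture group into one column. The Java sketch below shows that per-line step for the sample data. The pattern id=(\d+),name=(.*) is an assumption chosen to fit the sample, since the post's actual regex is truncated, and RegexRowParser is a hypothetical name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of the per-line work of a regex-based SerDe: each capture group becomes one column.
// The pattern is an assumption fitted to the sample data, not the one from the original post.
class RegexRowParser {
    private static final Pattern ROW = Pattern.compile("id=(\\d+),name=(.*)");

    // Returns {id, name}, or null when the line does not match the pattern.
    static String[] parse(String line) {
        Matcher m = ROW.matcher(line);
        if (!m.matches()) return null;
        return new String[] {m.group(1), m.group(2)};
    }

    public static void main(String[] args) {
        for (String line : new String[] {"id=123,name=steven", "id=55,name=ray"}) {
            String[] cols = parse(line);
            System.out.println(cols[0] + " " + cols[1]); // "123 steven", then "55 ray"
        }
    }
}
```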

HIVE: A UDF Example

Summary: Data file contents:
TEST DATA HERE
Good to Go
We will write a function that converts every character to lowercase.
1. Develop the UDF: package MyTestPackage; import org.apache.hadoop.hive.ql.exec.UDF; import org.apache.hadoop.io.T...

posted @ 2015-05-14 00:02 Ready! Views (1084) Comments (0) Recommended (0)
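An old-style Hive UDF is a class extending org.apache.hadoop.hive.ql.exec.UDF whose evaluate method Hive finds by reflection. The sketch below shows only the lowercase logic of such a method. To keep it compilable without the Hive and Hadoop jars, it uses String instead of org.apache.hadoop.io.Text and does not extend UDF. The class name LowerSketch is hypothetical.

```java
import java.util.Locale;

// Dependency-free sketch of the UDF's logic. A real Hive UDF would extend
// org.apache.hadoop.hive.ql.exec.UDF and typically take and return org.apache.hadoop.io.Text.
class LowerSketch {
    // Null in, null out, so SQL NULLs pass through unchanged.
    public String evaluate(String s) {
        if (s == null) return null;
        return s.toLowerCase(Locale.ROOT); // locale-independent lowercasing
    }

    public static void main(String[] args) {
        LowerSketch udf = new LowerSketch();
        System.out.println(udf.evaluate("TEST DATA HERE")); // test data here
        System.out.println(udf.evaluate("Good to Go"));     // good to go
    }
}
```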

HIVE: A TRANSFORM Example

Summary: Data file contents:
steven:100;steven:90;steven:99^567^22
ray:90;ray:98^456^30
Tom:81^222^33
The data should end up in the database in the following format:
steven 100 567 22
steven 90 567 22
st...

posted @ 2015-05-12 22:42 Ready! Views (2941) Comments (0) Recommended (0)
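Hive's TRANSFORM clause streams each row to an external program on stdin and reads the rows that program writes to stdout, tab-separated by default. Below is a hedged Java sketch of such a program for the sample data. It assumes each input line has the layout name:score;name:score;...^field2^field3 and emits one tab-separated row per name:score pair. The class name ExplodeScores is hypothetical; the original post's script is not shown.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Sketch of a streaming program for TRANSFORM: one line in per row, zero or more rows out.
// Assumed layout (inferred from the sample data): "name:score;name:score;...^field2^field3".
class ExplodeScores {

    static List<String> explode(String line) {
        List<String> rows = new ArrayList<>();
        String[] parts = line.split("\\^"); // '^' is a regex metacharacter, so it is escaped
        if (parts.length != 3) return rows;  // skip malformed lines
        for (String pair : parts[0].split(";")) {
            String[] kv = pair.split(":");
            if (kv.length != 2) continue;
            rows.add(kv[0] + "\t" + kv[1] + "\t" + parts[1] + "\t" + parts[2]);
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in, StandardCharsets.UTF_8));
        String line;
        while ((line = in.readLine()) != null) {
            for (String row : explode(line.trim())) {
                System.out.println(row);
            }
        }
    }
}
```

For the first sample line this prints three rows (steven/100, steven/90, steven/99), each followed by 567 and 22.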

HIVE: Custom TextInputFormat (works with the old MapReduce API; does the new MapReduce API implementation have a bug?)

Summary: Our input file, hello0, contains:
xiaowang 28 shanghai@_@zhangsan 38 beijing@_@someone 100 unknown
Logically it holds 3 records, separated by @_@. Using the old MapReduce API and the new MapReduce API in turn, we will implement a custom Te...

posted @ 2015-05-09 22:11 Ready! Views (1432) Comments (0) Recommended (0)
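The core job of a custom input format for hello0 is to cut the byte stream into records at @_@ instead of at newlines. The in-memory Java sketch below shows only that cutting step. A real RecordReader would also have to deal with records that straddle split boundaries, which this sketch ignores, and DelimitedRecords is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// In-memory sketch: split data into logical records on a multi-character delimiter.
// Split boundaries, streaming reads and encodings are deliberately ignored.
class DelimitedRecords {
    static List<String> split(String data, String delimiter) {
        List<String> records = new ArrayList<>();
        int start = 0;
        int idx;
        while ((idx = data.indexOf(delimiter, start)) >= 0) {
            records.add(data.substring(start, idx));
            start = idx + delimiter.length();
        }
        if (start < data.length()) {
            records.add(data.substring(start)); // last record has no trailing delimiter
        }
        return records;
    }

    public static void main(String[] args) {
        String hello0 = "xiaowang 28 shanghai@_@zhangsan 38 beijing@_@someone 100 unknown";
        for (String r : split(hello0, "@_@")) {
            System.out.println(r); // three lines, one per logical record
        }
    }
}
```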

MapReduce: How map Reads a File

Summary: Our input file, hello0, contains:
xiaowang 28 shanghai@_@zhangsan 38 beijing@_@someone 100 unknown
Logically it holds 3 records, separated by @_@. Let's look at how map reads the data...
1. Default configuration /* New API */ ...

posted @ 2015-05-09 15:43 Ready! Views (5129) Comments (0) Recommended (0)
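With the default TextInputFormat, each map input record is a line: the key is the byte offset where the line starts (a LongWritable in Hadoop) and the value is the line's text. The simplified Java model below reproduces that key/value pairing for a byte array. It ignores splits and "\r\n" line endings, and OffsetLines is a hypothetical name. If hello0 contains no line breaks, this model yields a single record at offset 0: the @_@ separators are not recognised by default.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified model of default line-based reading: key = byte offset of the line start,
// value = line without its '\n'. No input splits, no "\r\n" handling.
class OffsetLines {
    static Map<Long, String> read(byte[] file) {
        Map<Long, String> out = new LinkedHashMap<>();
        int start = 0;
        for (int i = 0; i < file.length; i++) {
            if (file[i] == '\n') {
                out.put((long) start, new String(file, start, i - start, StandardCharsets.UTF_8));
                start = i + 1;
            }
        }
        if (start < file.length) { // final line without a trailing newline
            out.put((long) start, new String(file, start, file.length - start, StandardCharsets.UTF_8));
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] hello0 = "xiaowang 28 shanghai@_@zhangsan 38 beijing@_@someone 100 unknown"
            .getBytes(StandardCharsets.UTF_8);
        System.out.println(read(hello0).size()); // 1: the whole file is one record
    }
}
```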

MapReduce Performance Analysis Experiments

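Summary: A recent project required some experimental testing of MapReduce; the results are recorded below.
Test environment: 3 VMs, all running Ubuntu, 1 GB RAM, Hadoop 2.6.0
1 NameNode (Master)
3 DataNodes (Slaves)
Of these, the Master and 2 Slaves (Slave2, Slave3)...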

posted @ 2015-05-07 13:34 Ready! Views (1540) Comments (2) Recommended (0)

MapReduce: number of mappers/reducers

Summary: It's the other way round. The number of mappers is decided by the number of splits. In reality it is the job of InputFormat, which you ...

posted @ 2015-05-01 09:25 Ready! Views (1091) Comments (1) Recommended (0)
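For a single splittable file, FileInputFormat (new API) computes the split size as max(minSize, min(maxSize, blockSize)). It keeps cutting full splits while the remaining bytes exceed 1.1 times that size, so the last split may be up to 10% larger. The Java model below reproduces just that arithmetic. It ignores block locations, non-splittable compression and multi-file inputs, and SplitCalc is a hypothetical name. minSize and maxSize correspond to mapreduce.input.fileinputformat.split.minsize and mapreduce.input.fileinputformat.split.maxsize. The number of reducers, by contrast, is not derived from the input; it is whatever the job requests, for example via Job.setNumReduceTasks.

```java
// Simplified model of FileInputFormat's split sizing for one splittable file.
// One split corresponds to one map task.
class SplitCalc {
    private static final double SPLIT_SLOP = 1.1; // last split may be up to 10% oversized

    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    static int countSplits(long fileLength, long blockSize, long minSize, long maxSize) {
        long splitSize = computeSplitSize(blockSize, minSize, maxSize);
        long remaining = fileLength;
        int splits = 0;
        while (((double) remaining) / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining != 0) splits++; // leftover bytes form the final split
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        System.out.println(countSplits(300 * mb, 128 * mb, 1L, Long.MAX_VALUE)); // 3 splits: 128 + 128 + 44 MB
        System.out.println(countSplits(130 * mb, 128 * mb, 1L, Long.MAX_VALUE)); // 1 split: 130/128 <= 1.1
    }
}
```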
