Post category - Big Data

Summary: Partitioner: Partitioning and Combining take place between the Map and Reduce phases. The Partitioner groups together the data that should go to the same reducer, based on …
posted @ 2019-06-09 11:19 Junfei_Wang Views (217) Comments (0) Recommendations (0)
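The sketch below is a minimal, single-process illustration of combining and partitioning. It makes two assumptions. First, a word-count style combiner (summing values per key) is configured. Second, the partition formula mirrors Hadoop's default HashPartitioner, which uses (hash & Integer.MAX_VALUE) % numReduceTasks. It substitutes zlib.crc32 for Java's hashCode(), so the reducer numbers will not match a real job. The names combine, partition and route are hypothetical.

```python
import zlib
from collections import defaultdict

def combine(pairs):
    """Map-side combiner (assumed word-count style): sum values per key locally."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return list(totals.items())

def partition(key, num_reducers):
    """Pick a reducer index in [0, num_reducers).

    Follows HashPartitioner's (hash & Integer.MAX_VALUE) % numReduceTasks shape,
    but zlib.crc32 is a stand-in for Java's hashCode().
    """
    return (zlib.crc32(key.encode("utf-8")) & 0x7FFFFFFF) % num_reducers

def route(mapper_outputs, num_reducers):
    """Combine each mapper's output, then bucket every pair by its reducer."""
    buckets = defaultdict(list)
    for pairs in mapper_outputs:
        for key, value in combine(pairs):
            buckets[partition(key, num_reducers)].append((key, value))
    return dict(buckets)

mapper_outputs = [[("a", 1), ("b", 1), ("a", 1)], [("a", 1), ("c", 1)]]
buckets = route(mapper_outputs, num_reducers=3)
```

Because the partition depends only on the key, every pair for "a" lands in the same bucket. This holds no matter which mapper emitted it, and it is what lets one reducer see all values for a key.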
Summary: In the previous post, we illustrated how Hadoop MapReduce prepares input for Mappers. Long story short, InputSplit converts physically stored data i…
posted @ 2019-05-29 22:31 Junfei_Wang Views (163) Comments (0) Recommendations (0)
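For plain text input, Hadoop's TextInputFormat (via LineRecordReader) hands each Mapper records whose key is the byte offset of a line and whose value is the line itself. The sketch below imitates that for one split. It assumes UTF-8 text and a split that starts and ends on line boundaries; a real reader must also handle lines that cross split edges. The function read_records is hypothetical.

```python
def read_records(split_bytes, start_offset=0):
    """Turn the raw bytes of one split into (byte offset, line) records.

    start_offset is the split's position within the whole file, so the keys
    are file offsets rather than offsets within the split.
    """
    records = []
    offset = start_offset
    for raw in split_bytes.splitlines(keepends=True):
        # Key: where the line starts; value: the line without its terminator.
        records.append((offset, raw.rstrip(b"\r\n").decode("utf-8")))
        offset += len(raw)
    return records

records = read_records(b"hello world\nfoo bar\n")
```

Here records is [(0, "hello world"), (12, "foo bar")]. The first line plus its newline is 12 bytes, so the second record starts at offset 12.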
Summary: According to the Wikipedia article on MapReduce, there are two ways to describe MapReduce. One has three steps: Map, Shuffle and Reduce; the other has 5 st…
posted @ 2019-05-23 23:02 Junfei_Wang Views (136) Comments (0) Recommendations (0)
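The three-step view (Map, Shuffle, Reduce) can be sketched in memory with word count as the running example. This is a single-process illustration, not how Hadoop executes a job. The function names are hypothetical.

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit (word, 1) for every whitespace-separated word."""
    return [(word.lower(), 1) for doc in documents for word in doc.split()]

def shuffle_phase(pairs):
    """Shuffle: group all values by key, with keys in sorted order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: groups[key] for key in sorted(groups)}

def reduce_phase(groups):
    """Reduce: collapse each key's list of values into a single result."""
    return {key: sum(values) for key, values in groups.items()}

docs = ["the cat sat", "The dog sat"]
counts = reduce_phase(shuffle_phase(map_phase(docs)))
```

Here counts is {"cat": 1, "dog": 1, "sat": 2, "the": 2}. The shuffle step is what turns two separate ("sat", 1) pairs into the single group "sat": [1, 1] that the reducer sums.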
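Summary: Azure gives student accounts a $260 free trial, so I rushed to create a Hadoop cluster! This example uses Hadoop MapReduce to count how many times each word appears in an e-book. Let's get our hands dirty! First, we created a cluster in Azure and accessed it over SSH with PuTTY…
posted @ 2019-03-24 10:23 Junfei_Wang Views (458) Comments (0) Recommendations (0)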
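The post's actual job code is not shown here. One common way to write such a word count is Hadoop Streaming, where the mapper and reducer exchange tab-separated "key<TAB>value" lines. The sketch below assumes that style. In a real Streaming job each function would read sys.stdin and print its output lines. The regex tokenizer and the names mapper and reducer are assumptions.

```python
import re
from itertools import groupby

WORD = re.compile(r"[a-z']+")  # assumed tokenizer: lowercase letters and apostrophes

def mapper(lines):
    """Emit one 'word<TAB>1' line per word."""
    for line in lines:
        for word in WORD.findall(line.lower()):
            yield f"{word}\t1"

def reducer(sorted_lines):
    """Sum counts for consecutive lines that share a word.

    Relies on the framework sorting mapper output by key, so all lines for
    one word arrive next to each other.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"

# Local simulation: sorted() stands in for Hadoop's shuffle-and-sort step.
book = ["It was the best of times,", "it was the worst of times."]
result = list(reducer(sorted(mapper(book))))
```

The reducer never holds the whole vocabulary in memory. It only needs the current word's group, which is why the sort between the two phases matters.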
Summary: 1. Replication: Because HDFS is deployed on low-cost commodity hardware, it stores multiple replicas of each block for better fault tolerance. The default replication factor is 3. Note: The NameNo…
posted @ 2019-03-03 12:14 Junfei_Wang Views (388) Comments (0) Recommendations (0)
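To see why a replication factor of 3 helps, the sketch places each block's replicas on three distinct DataNodes. It then checks which blocks stay readable after nodes fail. The round-robin placement is a hypothetical simplification; real HDFS uses a rack-aware policy decided by the NameNode. The functions place_replicas and readable_blocks are illustrative names.

```python
REPLICATION_FACTOR = 3  # HDFS default (dfs.replication)

def place_replicas(num_blocks, datanodes, replication=REPLICATION_FACTOR):
    """Put each block's replicas on `replication` distinct DataNodes.

    Hypothetical round-robin policy for illustration only.
    """
    if replication > len(datanodes):
        raise ValueError("need at least as many DataNodes as replicas")
    n = len(datanodes)
    return {
        block: [datanodes[(block + k) % n] for k in range(replication)]
        for block in range(num_blocks)
    }

def readable_blocks(placement, failed_nodes):
    """Blocks that still have at least one replica on a live DataNode."""
    return {
        block
        for block, nodes in placement.items()
        if any(node not in failed_nodes for node in nodes)
    }

nodes = ["dn1", "dn2", "dn3", "dn4", "dn5"]
placement = place_replicas(4, nodes)
```

With three replicas on distinct nodes, any two DataNode failures still leave every block readable. Losing dn1, dn2 and dn3 together would make block 0 unavailable, because all three of its replicas lived there.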
Summary: 1. What is HDFS? The Hadoop Distributed File System is a block-structured file system in which each file is divided into blocks of a predetermined size. Thes…
posted @ 2019-02-13 05:45 Junfei_Wang Views (219) Comments (0) Recommendations (0)
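The "blocks of a predetermined size" idea reduces to simple arithmetic. Every block is the configured size except possibly the last, which holds the remainder. The sketch assumes the 128 MB default of dfs.blocksize in Hadoop 2.x and later (Hadoop 1.x defaulted to 64 MB). The function split_into_blocks is hypothetical.

```python
MIB = 1024 * 1024
DEFAULT_BLOCK_SIZE = 128 * MIB  # assumed default dfs.blocksize (Hadoop 2.x+)

def split_into_blocks(file_size, block_size=DEFAULT_BLOCK_SIZE):
    """Return the size in bytes of each block a file is cut into.

    All blocks are block_size bytes except possibly the last, which holds
    only the remainder.
    """
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size > 0")
    full_blocks, remainder = divmod(file_size, block_size)
    return [block_size] * full_blocks + ([remainder] if remainder else [])

blocks = split_into_blocks(300 * MIB)
```

A 300 MB file therefore becomes three blocks of 128 MB, 128 MB and 44 MB. The last block occupies only 44 MB on disk rather than a full 128 MB.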