Post Category - Big Data
Summary: Partitioner: Partitioning and Combining take place between the Map and Reduce phases. The Partitioner groups together the data that should go to the same reducer based on…
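The summary says partitioning routes all data for one key to the same reducer. Below is a minimal sketch of hash partitioning that assumes string keys. The formula `(hashCode & Integer.MAX_VALUE) % numReduceTasks` is the one used by Hadoop's default `HashPartitioner`. The key hash here is a Python re-creation of Java's `String.hashCode()` and is only a stand-in: real Hadoop keys are Writable types with their own `hashCode()`. The helper names `java_string_hash`, `hash_partition` and `partition` are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def java_string_hash(s: str) -> int:
    """Mimic Java's String.hashCode() as a signed 32-bit int (BMP characters only)."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF  # keep 32 bits, like Java int overflow
    return h - (1 << 32) if h >= (1 << 31) else h  # reinterpret as signed


def hash_partition(key: str, num_reducers: int) -> int:
    """Reducer index via (hashCode & Integer.MAX_VALUE) % numReduceTasks."""
    return (java_string_hash(key) & 0x7FFFFFFF) % num_reducers


def partition(pairs: Iterable[Tuple[str, int]],
              num_reducers: int) -> Dict[int, List[Tuple[str, int]]]:
    """Route every (key, value) pair to the bucket of the reducer that owns its key."""
    buckets: Dict[int, List[Tuple[str, int]]] = {r: [] for r in range(num_reducers)}
    for key, value in pairs:
        buckets[hash_partition(key, num_reducers)].append((key, value))
    return buckets
```

The hash depends only on the key, so every pair with the same key lands in the same bucket. That is the guarantee a reducer relies on to see all values for its keys.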
Summary: In the previous post, we illustrated how Hadoop MapReduce prepares input for Mappers. Long story short, an InputSplit converts physically stored data i…
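Here is a simplified sketch of how a split of raw bytes can be turned into Mapper input records. It assumes line-oriented text input, where each record's key is the line's byte offset and its value is the line text, as with Hadoop's `TextInputFormat`. The boundary rule is modeled loosely on Hadoop's line reader and is not its actual code. A split that does not start at offset 0 skips its first (possibly partial) line. Every split keeps reading while the next line starts at or before its end offset. The function `read_split` is hypothetical.

```python
from typing import List, Tuple


def read_split(data: bytes, start: int, length: int) -> List[Tuple[int, str]]:
    """Return (byte_offset, line) records for the split [start, start + length)."""
    end = start + length
    pos = start
    if start != 0:
        # The previous split already read the line that crosses our start; skip it.
        nl = data.find(b"\n", start)
        pos = len(data) if nl == -1 else nl + 1
    records: List[Tuple[int, str]] = []
    # Read every line that *starts* at or before our end, even if it runs past it.
    while pos <= end and pos < len(data):
        nl = data.find(b"\n", pos)
        line_end = len(data) if nl == -1 else nl
        records.append((pos, data[pos:line_end].decode("utf-8")))
        pos = line_end + 1
    return records
```

With `data = b"hello world\nfoo bar\nbaz\n"` and 10-byte splits, each line is emitted by exactly one split, even though "hello world" physically spans the first split boundary.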
Summary: According to the Wikipedia article on MapReduce, there are two ways to illustrate MapReduce. One has three steps: Map, Shuffle and Reduce; the other has 5 st…
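The three-step view (Map, Shuffle, Reduce) can be sketched as a single-process pipeline. This is an in-memory illustration, not Hadoop's distributed implementation. The shuffle is modeled as "group values by key, then hand keys to the reducer in sorted order." The max-temperature mapper and reducer are a hypothetical example job.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List, Tuple

Pair = Tuple[Any, Any]


def run_mapreduce(records: Iterable[Any],
                  mapper: Callable[[Any], Iterable[Pair]],
                  reducer: Callable[[Any, List[Any]], Iterable[Pair]]) -> List[Pair]:
    # Map: each input record yields zero or more (key, value) pairs.
    intermediate: List[Pair] = []
    for record in records:
        intermediate.extend(mapper(record))
    # Shuffle: collect all values that share a key.
    groups: Dict[Any, List[Any]] = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)
    # Reduce: process keys in sorted order, one call per key.
    output: List[Pair] = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output


def max_temp_mapper(record: str) -> Iterable[Pair]:
    """Hypothetical input format: 'year,temperature'."""
    year, temp = record.split(",")
    yield year, int(temp)


def max_temp_reducer(year: str, temps: List[int]) -> Iterable[Pair]:
    yield year, max(temps)
```

For example, `run_mapreduce(["1950,0", "1950,22", "1949,111", "1949,78"], max_temp_mapper, max_temp_reducer)` returns `[("1949", 111), ("1950", 22)]`.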
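Summary: Azure gives student accounts a $260 free trial, so I rushed to create a Hadoop cluster! This example uses Hadoop MapReduce to count how many times each word occurs in an e-book. Let's get our hands dirty! First, we created a cluster in Azure and used PuTTY to SSH into…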
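One way to express the word-count job is as a mapper and reducer that exchange tab-separated text lines, the style used by Hadoop Streaming. The summary does not say which API the post uses, so this is only a sketch. The tokenizing regex (lowercase letters and apostrophes) is an assumption, and `wc_mapper`, `wc_reducer` and `simulate` are hypothetical names. Here, `sorted()` stands in for the framework's sort between the two phases.

```python
import re
from itertools import groupby
from typing import Iterable, Iterator, List

WORD = re.compile(r"[a-z']+")  # assumed tokenization rule


def wc_mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit 'word<TAB>1' for every word in every input line."""
    for line in lines:
        for word in WORD.findall(line.lower()):
            yield f"{word}\t1"


def wc_reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Sum the counts of consecutive lines that share a word (input must be sorted)."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def simulate(lines: Iterable[str]) -> List[str]:
    """Local stand-in for map -> sort -> reduce."""
    return list(wc_reducer(sorted(wc_mapper(lines))))
```

In a real Streaming job, each function would read from `sys.stdin` and print its output lines. The reducer only works because the framework delivers its input sorted by key.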
Summary: 1. Replication: Because HDFS is deployed on low-cost commodity hardware, it stores replicas of every block for better fault tolerance. The default replication factor is 3. Note: The NameNo…
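To see why replication gives fault tolerance, the sketch below places each block on `replication` distinct DataNodes. It then checks that every block survives any set of simultaneous node failures of a given size. The round-robin placement is a deliberate simplification and is not HDFS's rack-aware placement policy. The DataNode names and the functions `assign_replicas`, `surviving_blocks` and `tolerates_failures` are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set


def assign_replicas(num_blocks: int, datanodes: List[str],
                    replication: int = 3) -> Dict[int, List[str]]:
    """Place each block on `replication` distinct nodes, round-robin (not HDFS's policy)."""
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds the number of DataNodes")
    return {
        block: [datanodes[(block + i) % len(datanodes)] for i in range(replication)]
        for block in range(num_blocks)
    }


def surviving_blocks(placement: Dict[int, List[str]], failed: Set[str]) -> Set[int]:
    """Blocks that still have at least one replica on a live node."""
    return {b for b, nodes in placement.items() if any(n not in failed for n in nodes)}


def tolerates_failures(placement: Dict[int, List[str]], datanodes: List[str], k: int) -> bool:
    """True if every block survives every possible set of k failed DataNodes."""
    all_blocks = set(placement)
    return all(surviving_blocks(placement, set(failed)) == all_blocks
               for failed in combinations(datanodes, k))
```

With a replication factor of 3 and 5 nodes, any 2 nodes can fail without losing a block. Losing 3 nodes can wipe out every replica of some block.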
Summary: 1. What's HDFS? The Hadoop Distributed File System (HDFS) is a block-structured file system in which each file is divided into blocks of a predetermined size. Thes…
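The block arithmetic behind "each file is divided into blocks of a predetermined size" can be sketched as follows. The 128 MB default matches the `dfs.blocksize` default in Hadoop 2 and later. Treat it as an assumed parameter, since clusters can configure a different value. The function `split_into_blocks` is hypothetical.

```python
from typing import List, Tuple

MB = 1024 * 1024


def split_into_blocks(file_size: int, block_size: int = 128 * MB) -> List[Tuple[int, int]]:
    """Return (offset, length) for each block; only the last block may be shorter."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    blocks: List[Tuple[int, int]] = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks
```

For example, a 300 MB file becomes two full 128 MB blocks and one 44 MB block. With the default replication factor of 3, those blocks occupy about 900 MB of raw disk across the cluster.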