Big Data - Post Category - Junfei_Wang

MapReduce(3): Partitioner, Combiner and Shuffling

Summary: Partitioner: Partitioning and combining take place between the Map and Reduce phases. Partitioning groups together the data that should go to the same reducer based on ... Read more

posted @ 2019-06-09 11:19 Junfei_Wang Views (217) Comments (0) Recommendations (0)
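To make the partitioning idea in the summary above concrete, here is a minimal sketch of hash-based partitioning: every key is mapped to a reducer index, so all records sharing a key end up in the same partition. The helper names `partition_for` and `partition_records` are hypothetical, and CRC32 is only a stand-in hash chosen for this example; Hadoop's own default partitioner is written in Java and is based on the key's `hashCode()`.

```python
import zlib
from collections import defaultdict


def partition_for(key, num_reducers):
    """Return the reducer index for a key (hypothetical helper)."""
    # CRC32 is stable across runs, unlike Python's built-in hash() for str.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def partition_records(records, num_reducers):
    """Group (key, value) pairs by the reducer they would be sent to."""
    partitions = defaultdict(list)
    for key, value in records:
        partitions[partition_for(key, num_reducers)].append((key, value))
    return dict(partitions)


if __name__ == "__main__":
    mapped = [("apple", 1), ("pear", 1), ("apple", 1), ("fig", 1)]
    for reducer, pairs in sorted(partition_records(mapped, 3).items()):
        print(reducer, pairs)
```

Because the reducer index depends only on the key, both `("apple", 1)` records are guaranteed to reach the same reducer, whichever one that turns out to be.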

MapReduce(2): How does Mapper work

Summary: In the previous post, we illustrated how Hadoop MapReduce prepares input for Mappers. Long story short, InputSplit converts physically stored data i... Read more

posted @ 2019-05-29 22:31 Junfei_Wang Views (163) Comments (0) Recommendations (0)

MapReduce(1): Prepare input for Mappers

Summary: According to the Wikipedia article on MapReduce, there are two ways to describe MapReduce. One contains three steps: Map, Shuffle, and Reduce; the other has 5 st... Read more

posted @ 2019-05-23 23:02 Junfei_Wang Views (136) Comments (0) Recommendations (0)

Hadoop: Using MapReduce on an Azure Cluster

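Summary: Azure offers student accounts a $260 free trial, so I rushed to create a Hadoop cluster! This example uses Hadoop MapReduce to count the occurrences of each word in an e-book. Let's get our hands dirty! First, we created a cluster in Azure and accessed it over SSH using PuTTY ... Read more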

posted @ 2019-03-24 10:23 Junfei_Wang Views (458) Comments (0) Recommendations (0)
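The word-count job described in the summary above follows the usual map, shuffle, reduce pattern. The sketch below imitates that flow in a single Python process; it is not the code from the post, and the function names (`map_words`, `shuffle`, `reduce_counts`, `word_count`) and the simple tokenizing regex are assumptions made for illustration.

```python
import re
from collections import defaultdict


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in a line."""
    for word in re.findall(r"[a-z']+", line.lower()):
        yield word, 1


def shuffle(pairs):
    """Shuffle step: collect all values emitted for the same key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(word, counts):
    """Reduce step: sum the counts collected for one word."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle and reduce over an iterable of text lines."""
    pairs = (pair for line in lines for pair in map_words(line))
    return dict(reduce_counts(w, c) for w, c in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["the cat", "The dog"]))
```

On a real cluster the map and reduce steps run on different nodes and the framework performs the shuffle; here everything is in memory, which is only suitable for checking the logic on a small sample.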

Hadoop(2): Block Storage Management and Read/Write

Summary: 1. Replication: Because HDFS is deployed on low-cost commodity hardware, it stores replicated copies of each block for better fault tolerance. The default replication factor is 3. Note: The NameNo ... Read more

posted @ 2019-03-03 12:14 Junfei_Wang Views (388) Comments (0) Recommendations (0)

Hadoop(1): HDFS Basic Architecture

Summary: 1. What's HDFS? Hadoop Distributed File System is a block-structured file system where each file is divided into blocks of a pre-determined size. Thes... Read more

posted @ 2019-02-13 05:45 Junfei_Wang Views (219) Comments (0) Recommendations (0)
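The "blocks of a pre-determined size" idea in the summary above comes down to simple arithmetic, sketched below. The function name `split_into_blocks` is hypothetical, and the 128 MB default is an assumption that matches the default block size in Hadoop 2.x and later; the real value is a cluster configuration setting.

```python
DEFAULT_BLOCK_SIZE = 128 * 1024 * 1024  # assumed default, in bytes


def split_into_blocks(file_size, block_size=DEFAULT_BLOCK_SIZE):
    """Return the sizes of the blocks a file of file_size bytes would occupy."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    full_blocks, remainder = divmod(file_size, block_size)
    # The last block holds only the leftover bytes, so it may be smaller.
    return [block_size] * full_blocks + ([remainder] if remainder else [])


if __name__ == "__main__":
    mb = 1024 * 1024
    print([size // mb for size in split_into_blocks(300 * mb)])
```

A 300 MB file therefore needs two full 128 MB blocks plus one 44 MB block, and the final block takes up only the space it actually uses.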

Rhys_Wang
