Summary:
4.1 Introduction Consider a company such as Amazon, which has over 200 million users and may handle hundreds of millions of transactions per d…
Summary:
3.1 Introduction Given a set of (key as string, value as integer) pairs, finding a Top N (where N > 0) list is a "design pattern" (a "design patte…
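The excerpt above is cut off, so the following is only a minimal sketch of the general Top N idea, not the book's implementation. It assumes the input is already split into partitions of (key, value) pairs, computes a local Top N per partition (the role a mapper would play), and then merges the local lists into a final Top N (the role of a single reducer). The names `local_top_n`, `global_top_n`, and the sample data are hypothetical.

```python
import heapq

def local_top_n(pairs, n):
    """Return the n pairs with the largest values in one partition, largest first."""
    if n <= 0:
        raise ValueError("N must be greater than 0")
    return heapq.nlargest(n, pairs, key=lambda kv: kv[1])

def global_top_n(partitions, n):
    """Merge the per-partition Top N lists into one overall Top N list."""
    candidates = []
    for part in partitions:
        # Each partition contributes at most n candidates.
        candidates.extend(local_top_n(part, n))
    return local_top_n(candidates, n)

# Hypothetical sample data: two partitions of (key, value) pairs.
partitions = [
    [("cat1", 12), ("cat2", 5), ("cat3", 40)],
    [("cat4", 7), ("cat5", 33), ("cat6", 1)],
]
result = global_top_n(partitions, 2)
print(result)
```

Keeping only N candidates per partition bounds the amount of data that has to reach the final merge step, which is the main reason for the two-stage shape.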
Summary:
2.1 Introduction The MapReduce framework sorts input to reducers by key, but the values for each key arrive in arbitrary order. This means that if all mappers ge…
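To make the point about unordered values concrete, here is a small in-memory sketch, assuming a common approach to secondary sort: order records by a composite (key, value) pair and then group by the key alone, so each group's values come out sorted. It runs in plain Python rather than on Hadoop or Spark, and the function name `secondary_sort` and the sample records are hypothetical.

```python
from itertools import groupby
from operator import itemgetter

def secondary_sort(records):
    """Group (key, value) records by key, with each key's values in ascending order."""
    # Sorting on the composite (key, value) orders keys first, then values within a key.
    ordered = sorted(records, key=lambda kv: (kv[0], kv[1]))
    # Grouping on the key alone keeps the value order produced by the sort.
    return {key: [value for _, value in group]
            for key, group in groupby(ordered, key=itemgetter(0))}

# Hypothetical sample data: (year-month, reading) pairs in arbitrary order.
records = [("2012-01", 35), ("2012-02", 10), ("2012-01", 5), ("2012-01", 20)]
grouped = secondary_sort(records)
print(grouped)
```

In a real MapReduce job the same effect is usually obtained by letting the framework sort on the composite key while partitioning and grouping on the original key, so the reducer never has to buffer and sort the values itself.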
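Summary:
I have started studying Chapters 1-5 of Data Algorithms: Recipes for Scaling Up with Hadoop and Spark (《数据算法:Hadoop/Spark大数据处理技巧》). With some free time over the holiday I am copying out excerpts, since this is not a paper copy I can scribble on, and it seems to work better this way; of course, I am still skipping the complicated parts. Blogging feels more and more like taking notes. That is all. 1.1 What is a Secondary Sort Problem? MapReduce fr…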
Summary:
Resilient Distributed Datasets A Resilient Distributed Dataset (RDD) is a fundamental data structure of Spark. It is an immutable distributed collect…
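Summary:
After reading a little of Data Algorithms: Recipes for Scaling Up with Hadoop and Spark (《数据算法:Hadoop/Spark大数据处理技巧》), I felt I needed to learn about Spark. That is all. Spark was introduced by the Apache Software Foundation for speeding up the Hadoop computational compu…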