
MapReduce patterns

Having modified and run a job in the last post, we can now look at the patterns that come up most often in MapReduce programming.
Although there are many of them, I think the most important ones are:

  • Summarization
  • Filtering
  • Structural

Let's examine them in detail. 

By summarization we mean all the jobs that compute an aggregate (usually a numerical one) over a set of data, like:

  • indexing
  • mean (or other statistical functions) computation
  • min/max computation
  • count (we've seen the WordCount example)
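
To make the summarization idea concrete, here is a minimal sketch in plain Python that imitates the map, shuffle and reduce steps in memory. It is not Hadoop code: the function names and the (user, value) record layout are assumptions chosen only for illustration. Each record is mapped to a partial summary, and partial summaries with the same key are merged pairwise.

```python
def map_record(record):
    """Map step: turn one (user, value) record into a partial summary.

    The partial summary is (min, max, sum, count) for that single value.
    The record layout is an assumption made for this sketch.
    """
    user, value = record
    yield user, (value, value, value, 1)


def merge(a, b):
    """Reduce step: merge two partial summaries for the same key.

    The merge is associative and commutative, so the same function
    could also play the role of a combiner.
    """
    return (min(a[0], b[0]), max(a[1], b[1]), a[2] + b[2], a[3] + b[3])


def summarize(records):
    """Run map, group by key (the shuffle) and reduce, all in memory."""
    groups = {}
    for record in records:
        for key, partial in map_record(record):
            groups[key] = merge(groups[key], partial) if key in groups else partial
    # The mean is derived at the end from the sum and the count.
    return {
        key: {"min": lo, "max": hi, "count": n, "mean": total / n}
        for key, (lo, hi, total, n) in groups.items()
    }
```

Note that the mean is not merged directly: carrying the sum and the count and dividing at the end is what keeps the merge step correct when partial results are combined in any order.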

Filtering is the act of retrieving only a subset of a bigger dataset. The most common use cases are retrieving all the data belonging to a single user, or the top-N elements (by some criterion) of the dataset. Another frequent use of filtering is sampling: when we're dealing with a lot of data, it is usually a good idea to pick some elements of the original dataset at random and use that subset to verify the behaviour of our job.
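
The three filtering cases mentioned above can be sketched in a few lines of plain Python. Again this is only an in-memory illustration under assumed names and an assumed (user, score) record layout, not Hadoop code. The top-N version follows a common approach: every input split keeps its own N best candidates, and a single final step merges them.

```python
import heapq
import random


def filter_by_user(records, user):
    """Map-only filter: keep the records that belong to one user."""
    return [r for r in records if r[0] == user]


def local_top_n(records, n):
    """What one mapper would do: keep its n records with the highest score."""
    return heapq.nlargest(n, records, key=lambda r: r[1])


def top_n(splits, n):
    """What a single reducer would do: merge the per-split candidates."""
    candidates = [r for split in splits for r in local_top_n(split, n)]
    return heapq.nlargest(n, candidates, key=lambda r: r[1])


def sample(records, fraction, seed=None):
    """Random sampling: keep each record with probability `fraction`."""
    rng = random.Random(seed)
    return [r for r in records if rng.random() < fraction]
```

For example, with the splits [("a", 1), ("b", 9)] and [("c", 5), ("d", 7)], calling top_n(splits, 2) gives [("b", 9), ("d", 7)]. Filtering by user and sampling need no reduce step at all, since each record can be kept or dropped on its own.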

Structural patterns are the ones to use when you need to operate on the structure of the data; the most common case is a join between different datasets, like the ones we're used to in an RDBMS.
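
As an illustration of the join case, here is a minimal sketch of a reduce-side inner join, written in plain Python rather than Hadoop. The two datasets, users as (user_id, name) and orders as (user_id, item), are hypothetical and exist only for this example. Records from both sides are tagged with their source and grouped by the join key, and the reduce step pairs them up.

```python
from collections import defaultdict


def reduce_side_join(users, orders):
    """Inner join of users (user_id, name) with orders (user_id, item)."""
    # Map + shuffle: tag every record with its source ("U" or "O")
    # and group it under the join key.
    groups = defaultdict(lambda: {"U": [], "O": []})
    for user_id, name in users:
        groups[user_id]["U"].append(name)
    for user_id, item in orders:
        groups[user_id]["O"].append(item)

    # Reduce: for each key, pair every user record with every order record.
    # Keys present on only one side produce nothing (inner join).
    joined = []
    for user_id in sorted(groups):
        for name in groups[user_id]["U"]:
            for item in groups[user_id]["O"]:
                joined.append((user_id, name, item))
    return joined
```

With users [(1, "ann"), (2, "bob")] and orders [(1, "book"), (1, "pen"), (3, "cup")], the result is [(1, "ann", "book"), (1, "ann", "pen")]: user 2 has no orders and order 3 has no matching user, so both are dropped.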

In the next posts, we'll see in more detail how to deal with these patterns.

