Pentium.Labs

Systems series: https://zhuanlan.zhihu.com/c_1238468913098731520


MIT 6.824 Study Notes 4: Lab 1

Now we are ready to start the first assignment, Lab 1.

wjk is also working through 6.824, and those notes are worth consulting: https://github.com/zzzyyyxxxmmm/MIT6824_Distribute_System


Part I

The Map/Reduce implementation you are given is missing some pieces. Before you can write your first Map/Reduce function pair, you will need to fix the sequential implementation. In particular, the code we give you is missing two crucial pieces: the function that divides up the output of a map task, and the function that gathers all the inputs for a reduce task. These tasks are carried out by the doMap() function in common_map.go, and the doReduce() function in common_reduce.go respectively. The comments in those files should point you in the right direction.

 

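doMap() runs the map operation on one input file and produces nReduce intermediate files. Many map tasks will run (Part I is sequential, so there is no need to think about locking yet), and each one is identified by its jobName and mapTask number. To hide many of the details, the following functions are available inside doMap():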

  mapF(): takes the input file's name and contents and returns the data as a set of key/value pairs; doMap() itself is responsible for reading the file. mapF() is the map function provided by the application. The first argument should be the input file name, though the map function typically ignores it. The second argument should be the entire input file contents. mapF() returns a slice containing the key/value pairs for reduce; see common.go for the definition of KeyValue.

  reduceName(): generates the intermediate file names according to a fixed rule. The file numbered r stores the key/value pairs for which ihash(key) mod nReduce == r. There is one intermediate file per reduce task. The file name includes both the map task number and the reduce task number. Use the filename generated by reduceName(jobName, mapTask, r) as the intermediate file for reduce task r. Call ihash() (see below) on each key, mod nReduce, to pick r for a key/value pair.

So the job of doMap() is to read the file, iterate over every key/value pair, and send each pair to its corresponding intermediate file.
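The partitioning step described above can be sketched as follows. This is a minimal sketch rather than the lab's actual code: KeyValue, ihash() and reduceName() are local stand-ins whose exact definitions are assumptions, the dir parameter and the demo() helper are hypothetical additions so the example can run in a temporary directory, and JSON is just one reasonable choice of encoding for the intermediate files.

```go
package main

import (
	"encoding/json"
	"fmt"
	"hash/fnv"
	"os"
	"path/filepath"
	"strings"
)

// KeyValue stands in for the lab's KeyValue type (assumed shape).
type KeyValue struct{ Key, Value string }

// ihash is a stand-in hash: FNV-1a, masked so the result is non-negative.
func ihash(s string) int {
	h := fnv.New32a()
	h.Write([]byte(s))
	return int(h.Sum32() & 0x7fffffff)
}

// reduceName is a stand-in; the file-name format is an assumption.
func reduceName(jobName string, mapTask, reduceTask int) string {
	return fmt.Sprintf("mrtmp.%s-%d-%d", jobName, mapTask, reduceTask)
}

// doMapSketch reads inFile, applies mapF, and appends each resulting pair to
// the intermediate file selected by ihash(key) % nReduce.
func doMapSketch(dir, jobName string, mapTask int, inFile string, nReduce int,
	mapF func(file, contents string) []KeyValue) error {
	contents, err := os.ReadFile(inFile)
	if err != nil {
		return err
	}
	// One JSON encoder per reduce task; files are closed when the function returns.
	encoders := make([]*json.Encoder, nReduce)
	for r := range encoders {
		f, err := os.Create(filepath.Join(dir, reduceName(jobName, mapTask, r)))
		if err != nil {
			return err
		}
		defer f.Close()
		encoders[r] = json.NewEncoder(f)
	}
	for _, kv := range mapF(inFile, string(contents)) {
		if err := encoders[ihash(kv.Key)%nReduce].Encode(kv); err != nil {
			return err
		}
	}
	return nil
}

// demo (hypothetical helper) runs the sketch on a small temporary input and
// returns how many pairs can be read back from the intermediate files.
func demo() (int, error) {
	dir, err := os.MkdirTemp("", "domap")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)
	in := filepath.Join(dir, "input.txt")
	if err := os.WriteFile(in, []byte("a b a"), 0o644); err != nil {
		return 0, err
	}
	mapF := func(_, contents string) []KeyValue {
		var out []KeyValue
		for _, w := range strings.Fields(contents) {
			out = append(out, KeyValue{w, "1"})
		}
		return out
	}
	const nReduce = 2
	if err := doMapSketch(dir, "demo", 0, in, nReduce, mapF); err != nil {
		return 0, err
	}
	total := 0
	for r := 0; r < nReduce; r++ {
		f, err := os.Open(filepath.Join(dir, reduceName("demo", 0, r)))
		if err != nil {
			return 0, err
		}
		for dec := json.NewDecoder(f); dec.More(); total++ {
			var kv KeyValue
			if err := dec.Decode(&kv); err != nil {
				f.Close()
				return 0, err
			}
		}
		f.Close()
	}
	return total, nil
}

func main() { fmt.Println(demo()) }
```

Because the same key always hashes to the same r, every occurrence of a key from this map task ends up in one intermediate file, which is what lets a single reduce task later see all values for that key.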

 

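doReduce() is the mirror image: it reads the key/value pairs from nMap intermediate files, appends together the values that share a key, calls the user-specified reduce function on them, and writes the result to outFile. Many reduce tasks will run as well (again, Part I is sequential, so there is no need to think about locking yet), and each one is identified by its jobName and reduceTask number. doReduce() likewise has several functions available: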

  reduceName(): reduceName(jobName, m, reduceTask) yields the file name from map task m.

  reduceF(): reduceF() is the application's reduce function. You should call it once per distinct key, with a slice of all the values for that key. reduceF() returns the reduced value for that key.

The job of doReduce() is to merge all the key/value pairs from the map tasks' intermediate files into one large map and then write the results to the output file, sorted by key.
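The merge-then-sort flow above can be sketched as follows. Again this is only an illustration under stated assumptions: KeyValue and reduceName() are local stand-ins, readKVs(), writeKVs(), demo() and the dir parameter are hypothetical helpers added so the example is self-contained, and the intermediate and output files are assumed to be streams of JSON-encoded records.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
	"sort"
)

// KeyValue and reduceName are stand-ins for the lab's definitions (assumed shapes).
type KeyValue struct{ Key, Value string }

func reduceName(jobName string, mapTask, reduceTask int) string {
	return fmt.Sprintf("mrtmp.%s-%d-%d", jobName, mapTask, reduceTask)
}

// readKVs decodes a stream of JSON-encoded KeyValue records from path.
func readKVs(path string) ([]KeyValue, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	var kvs []KeyValue
	for dec := json.NewDecoder(f); dec.More(); {
		var kv KeyValue
		if err := dec.Decode(&kv); err != nil {
			return nil, err
		}
		kvs = append(kvs, kv)
	}
	return kvs, nil
}

// writeKVs encodes kvs to path as a stream of JSON records.
func writeKVs(path string, kvs []KeyValue) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()
	enc := json.NewEncoder(f)
	for _, kv := range kvs {
		if err := enc.Encode(kv); err != nil {
			return err
		}
	}
	return nil
}

// doReduceSketch groups values by key across nMap files, then reduces in key order.
func doReduceSketch(dir, jobName string, reduceTask int, outFile string, nMap int,
	reduceF func(key string, values []string) string) error {
	grouped := make(map[string][]string)
	for m := 0; m < nMap; m++ {
		kvs, err := readKVs(filepath.Join(dir, reduceName(jobName, m, reduceTask)))
		if err != nil {
			return err
		}
		for _, kv := range kvs {
			grouped[kv.Key] = append(grouped[kv.Key], kv.Value)
		}
	}
	keys := make([]string, 0, len(grouped))
	for k := range grouped {
		keys = append(keys, k)
	}
	sort.Strings(keys) // map iteration order is random, so sort explicitly
	out := make([]KeyValue, 0, len(keys))
	for _, k := range keys {
		out = append(out, KeyValue{k, reduceF(k, grouped[k])})
	}
	return writeKVs(outFile, out)
}

// demo (hypothetical helper) feeds two map outputs to reduce task 0 and reads the result.
func demo() ([]KeyValue, error) {
	dir, err := os.MkdirTemp("", "doreduce")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)
	inputs := [][]KeyValue{{{"b", "1"}, {"a", "2"}}, {{"a", "3"}}}
	for m, kvs := range inputs {
		if err := writeKVs(filepath.Join(dir, reduceName("demo", m, 0)), kvs); err != nil {
			return nil, err
		}
	}
	out := filepath.Join(dir, "out")
	count := func(_ string, values []string) string { return fmt.Sprint(len(values)) }
	if err := doReduceSketch(dir, "demo", 0, out, len(inputs), count); err != nil {
		return nil, err
	}
	return readKVs(out)
}

func main() { fmt.Println(demo()) }
```

Holding everything in one in-memory map is the simplest approach and matches the description above; it assumes the data for one reduce task fits in memory.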

 

Code: https://github.com/pentium3/mit6824/tree/master/src/mapreduce

The Part I code is in src/mapreduce/common_map.go and src/mapreduce/common_reduce.go


Part II

The task in this step is to build a word count. Now you will implement word count, a simple Map/Reduce example. Look in main/wc.go; you'll find empty mapF() and reduceF() functions. Your job is to insert code so that wc.go reports the number of occurrences of each word in its input. A word is any contiguous sequence of letters, as determined by unicode.IsLetter.

Review Section 2 of the MapReduce paper. Your mapF() and reduceF() functions will differ a bit from those in the paper's Section 2.1. Your mapF() will be passed the name of a file, as well as that file's contents; it should split the contents into words, and return a Go slice of mapreduce.KeyValue. While you can choose what to put in the keys and values for the mapF output, for word count it only makes sense to use words as the keys. Your reduceF() will be called once for each key, with a slice of all the values generated by mapF() for that key. It must return a string containing the total number of occurrences of the key.

mapF(): The map function is called once for each file of input. The first argument is the name of the input file, and the second is the file's complete contents. You should ignore the input file name, and look only at the contents argument. The return value is a slice of key/value pairs. In other words, it splits the contents into words and returns the word-count key/value pairs (the number of occurrences of each word in the current contents).

reduceF(): The reduce function is called once for each key generated by the map tasks, with a list of all the values created for that key by any map task.

(This is actually almost the same as Lab 1 of 5105...)
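One possible shape for the two word-count functions is sketched below. It is not necessarily the code in the linked repository: KeyValue is a local stand-in for mapreduce.KeyValue, and this variant emits a "1" for every occurrence and lets reduceF() do the summing, which also works if mapF() were to pre-aggregate counts per file instead.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"unicode"
)

// KeyValue stands in for mapreduce.KeyValue (assumed shape).
type KeyValue struct{ Key, Value string }

// mapF splits contents into words, where a word is a maximal run of letters
// according to unicode.IsLetter, and emits one {word, "1"} pair per occurrence.
// The file name is ignored, as the lab text suggests.
func mapF(filename string, contents string) []KeyValue {
	words := strings.FieldsFunc(contents, func(r rune) bool {
		return !unicode.IsLetter(r)
	})
	kvs := make([]KeyValue, 0, len(words))
	for _, w := range words {
		kvs = append(kvs, KeyValue{Key: w, Value: "1"})
	}
	return kvs
}

// reduceF sums the numeric values emitted for one key and returns the total
// as a string. Values that do not parse as integers are skipped.
func reduceF(key string, values []string) string {
	total := 0
	for _, v := range values {
		if n, err := strconv.Atoi(v); err == nil {
			total += n
		}
	}
	return strconv.Itoa(total)
}

func main() {
	kvs := mapF("ignored.txt", "the cat, the hat")
	fmt.Println(len(kvs))                           // number of words found
	fmt.Println(reduceF("the", []string{"1", "1"})) // total for one key
}
```

Note that splitting on non-letters means a token such as "it's" is counted as the two words "it" and "s", which follows from the lab's definition of a word.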

 

Code: https://github.com/pentium3/mit6824/tree/master/src/main

The Part II code is in src/main/wc.go


Part III

Everything so far has run serially on a single machine; this time we make it parallel.

 

 

 


A few notes on Go

The code makes heavy use of make to create slices. For how make is used, see https://www.cnblogs.com/pdev/p/10928735.html
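As a quick reminder of the two common forms of make for slices, here is a small sketch; the helper names zeroed() and withCapacity() are made up for illustration and do not come from the lab code.

```go
package main

import "fmt"

// zeroed returns a slice of n ints, all initialized to zero (len n, cap n).
func zeroed(n int) []int {
	return make([]int, n)
}

// withCapacity returns an empty slice with room reserved for n strings
// (len 0, cap n), so the first n appends do not reallocate.
func withCapacity(n int) []string {
	return make([]string, 0, n)
}

func main() {
	a := zeroed(3)
	b := withCapacity(4)
	b = append(b, "x")
	fmt.Println(len(a), cap(a), len(b), cap(b))
}
```

The first form suits cases where every slot is assigned by index (for example one entry per reduce task); the second suits building a result with append when the final size is known or can be estimated.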

 

 


 

Ref:

https://zhuanlan.zhihu.com/p/36158168

https://www.cnblogs.com/a1225234/p/10886410.html

http://nil.csail.mit.edu/6.824/2018/labs/lab-1.html

posted on 2019-08-01 12:50  Pentium.Labs



Pentium.Lab Since 1998