05.reduceByKey

reduceByKey -- collecting values into a list
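I'm kind of late to the conversation, but here's my suggestion: wrap each value in a one-element list, then let reduceByKey concatenate the lists that share a key.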

>>> foo = sc.parallelize([(1, ('a','b')), (2, ('c','d')), (1, ('x','y'))])
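>>> # Python 3 no longer allows tuple parameters such as `lambda (x,y): ...`, so index into the pair instead
>>> foo.map(lambda kv: (kv[0], [kv[1]])).reduceByKey(lambda p, q: p + q).collect()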
[(1, [('a', 'b'), ('x', 'y')]), (2, [('c', 'd')])]
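
If it helps to see why the wrap-then-concatenate trick works, here is a minimal pure-Python sketch of the same computation that runs without a Spark cluster. The helper `reduce_by_key` is a hypothetical local stand-in written for this illustration, not a PySpark API; it only mimics the per-key folding that `reduceByKey` does, and it ignores partitioning entirely.

```python
def reduce_by_key(pairs, func):
    """Hypothetical local stand-in for RDD.reduceByKey (not a PySpark API).

    Folds all values that share a key into one value using func.
    """
    merged = {}
    for key, value in pairs:
        # The first value seen for a key is stored as-is;
        # every later value is combined with the running result.
        merged[key] = func(merged[key], value) if key in merged else value
    return list(merged.items())


foo = [(1, ('a', 'b')), (2, ('c', 'd')), (1, ('x', 'y'))]

# Step 1 (the map step): wrap each value in a one-element list.
wrapped = [(k, [v]) for k, v in foo]

# Step 2 (the reduceByKey step): concatenate the lists that share a key.
result = reduce_by_key(wrapped, lambda p, q: p + q)

print(sorted(result))
```

Wrapping first matters because the reducing function must take and return the same type: list + list gives a list, so the result of one merge can be fed into the next.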

  

posted @ 2018-06-22 22:00  桃源仙居