Summary: example.groupByKey().mapValues(list) Read more
posted @ 2017-07-12 16:28 bonelee Views (9317) Comments (0) Recommendations (1)
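To illustrate what `groupByKey().mapValues(list)` yields, here is a minimal plain-Python sketch of the semantics. It is not PySpark code: `group_by_key` and the sample `example` data are hypothetical names made up for illustration, and a real RDD would distribute the work across partitions and would not guarantee key order.

```python
from collections import defaultdict


def group_by_key(pairs):
    """Collect (key, value) pairs into (key, [values]) tuples.

    Hypothetical helper that mimics the result shape of
    rdd.groupByKey().mapValues(list) on a small in-memory list.
    """
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return list(groups.items())


# Assumed sample data, not taken from the original post.
example = [("a", 1), ("b", 2), ("a", 3)]
grouped = group_by_key(example)
print(sorted(grouped))  # [('a', [1, 3]), ('b', [2])]
```

Converting each group to a list matters in PySpark because `groupByKey()` alone returns an iterable per key, which does not print its contents directly.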
Summary: distinct(numPartitions=None) Return a new RDD containing the distinct elements in this RDD. >>> sorted(sc.parallelize([1, 1, 2, 3]).distinct().collect ... Read more
posted @ 2017-07-12 14:07 bonelee Views (2855) Comments (0) Recommendations (0)
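The effect of `distinct()` on the input `[1, 1, 2, 3]` from the excerpt can be sketched in plain Python as follows. This is an illustration of the semantics only, under the assumption of a small in-memory list; the `distinct` function below is a hypothetical stand-in, not the RDD method.

```python
def distinct(elements):
    """Return the unique elements, keeping the first occurrence of each.

    Hypothetical stand-in for RDD.distinct(); a real RDD gives no
    ordering guarantee, which is why the excerpt sorts the result.
    """
    seen = set()
    result = []
    for element in elements:
        if element not in seen:
            seen.add(element)
            result.append(element)
    return result


print(sorted(distinct([1, 1, 2, 3])))  # [1, 2, 3]
```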
Summary: lookup(key) Return the list of values in the RDD for key key. This operation is done efficiently if the RDD has a known partitioner by only searching ... Read more
posted @ 2017-07-12 10:47 bonelee Views (3206) Comments (0) Recommendations (0)
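The idea behind an efficient `lookup(key)` with a known partitioner can be sketched in plain Python: if pairs were placed into partitions by hashing the key, only the one partition that the key hashes to needs to be scanned. The helpers `partition_pairs` and `lookup` below are hypothetical and assume simple modulo hash partitioning; they are not Spark's implementation.

```python
def partition_pairs(pairs, num_partitions):
    """Place each (key, value) pair into a bucket chosen by hash(key)."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        partitions[hash(key) % num_partitions].append((key, value))
    return partitions


def lookup(partitions, key):
    """Return all values for key, scanning only the partition it maps to."""
    target = partitions[hash(key) % len(partitions)]
    return [value for k, value in target if k == key]


# Assumed sample data: integer keys 0..4, each with four values.
pairs = [(i % 5, i) for i in range(20)]
partitions = partition_pairs(pairs, num_partitions=4)
print(lookup(partitions, 3))  # [3, 8, 13, 18]
```

Without a known partitioner, every partition would have to be scanned to be sure no matching pair is missed.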
Summary: rdd = sc.parallelizeDoubles(testData); Now we'll calculate the mean of o ... Read more
posted @ 2017-07-12 10:15 bonelee Views (595) Comments (0) Recommendations (0)
Summary: The above is the brute-force approach. The simple approach: Read more
posted @ 2017-07-12 09:50 bonelee Views (1286) Comments (0) Recommendations (0)