February 2018 Post Archive - oftenlin

Spark Study Notes: Common APIs

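Summary: I. Creating an RDD. 1) From an in-memory collection: sc.parallelize(List(1,2,3),2). The second argument splits the whole dataset into 2 partitions; by default the data is divided evenly across them. Note that these are partitions, not two replicas. 2) By reading from a file: sc.textFile("file.tx Read more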

posted @ 2018-02-27 13:20 oftenlin Views (288) Comments (0) Recommendations (0)
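
The summary above says that the second argument of sc.parallelize controls how many partitions the data is split into, and that the split is even by default. The following is a minimal plain-Python sketch of that idea and does not need Spark. The helper name `split_evenly` is hypothetical, and the index arithmetic is an assumption chosen to produce contiguous, near-equal slices; it is not presented as Spark's own implementation.

```python
def split_evenly(items, num_partitions):
    """Split items into num_partitions contiguous, near-equal slices.

    Hypothetical helper for illustration only: slice i covers the index
    range [i * n // num_partitions, (i + 1) * n // num_partitions).
    """
    n = len(items)
    return [
        items[(i * n) // num_partitions:((i + 1) * n) // num_partitions]
        for i in range(num_partitions)
    ]


# Three elements, two partitions: each element lands in exactly one slice,
# so the partitions are disjoint pieces of the data, not copies of it.
partitions = split_evenly([1, 2, 3], 2)
print(partitions)  # [[1], [2, 3]]
```

Every element appears in exactly one slice, which is the point the summary makes when it distinguishes partitions from replicas.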

Common Python List Operations (Part 2)

Summary: 1. Tuple unpacking: a,b = t 2. Formatted output: print('Your input: {}, value: {}'.format(a,b)) 3. Date calculation: import datetime as dt import time as tm print("time={}".format(tm.time())) 4. Python Read more

posted @ 2018-02-26 10:10 oftenlin Views (158) Comments (0) Recommendations (0)
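
The three snippets listed in the summary above can be combined into one short runnable sketch. The sample tuple and the dates are assumptions chosen for illustration; only the `dt` and `tm` aliases come from the original post.

```python
import datetime as dt
import time as tm

# 1. Tuple unpacking: each element is bound to its own name.
t = (3, 4)  # sample values, assumed for illustration
a, b = t

# 2. Formatted output: format is a method of the string, called with a dot.
message = 'Your input: {}, value: {}'.format(a, b)
print(message)

# 3. Date calculation: timestamps come from time, date arithmetic from datetime.
print("time={}".format(tm.time()))  # seconds since the epoch, varies per run
start = dt.date(2018, 2, 26)
later = start + dt.timedelta(days=3)
days_between = (later - start).days
print(later, days_between)
```

Calling format as a method matters: passing it as a second argument to print would print the unformatted template string.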

Spark FAQ Collection

Summary: I. Why is Spark more efficient than MapReduce? An example: select a.state,count(*),AVG(c.price) from a join b on (a.id=b.id) join c on (a.itemId=c.itemId) group by a.sta Read more

posted @ 2018-02-24 17:36 oftenlin Views (167) Comments (0) Recommendations (0)
