Post category - spark

Summary: PLSA.py
# coding:utf8
from pyspark import SparkContext
from pyspark import RDD
import numpy as np
from numpy.random import RandomState
...
posted @ 2015-10-23 17:24 porco Views(1037) Comments(0) Recommendations(0)
Summary: Setting up a standalone Spark environment on Windows 7: follow this link, "how to run apache spark on windows7". Accessing the local Spark from PyCharm: install py4j; configure PyCharm; under PYTHON_HOME\lib\site packa...
posted @ 2015-10-20 17:35 porco Views(3172) Comments(0) Recommendations(0)
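Summary: The main practical content of the course: 1. setting up the Spark lab environment; 2. the four labs; 3. commonly used functions; 4. shared variables.
1. Setting up the Spark lab environment (Windows)
a. Download and install VirtualBox, and run it as administrator. The course requires the latest version, 4.3.28; if the virtual machine fails to start in step c, version 4.2.12 can be used instead with no ill effect.
b. Download and install vagrant...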
posted @ 2015-07-13 11:57 porco Views(439) Comments(0) Recommendations(0)
Summary: The official API description of this function is not very clear: aggregate(zeroValue, seqOp, combOp): Aggregate the elements of each partition, and then the results for all the partitions, using a given combine functions and a neutral "zero value." T...
posted @ 2015-07-13 11:30 porco Views(839) Comments(0) Recommendations(0)
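To make the aggregate(zeroValue, seqOp, combOp) description above more concrete, here is a minimal pure-Python sketch of the two-stage behaviour it describes. It is not PySpark's implementation: the helper `aggregate`, the hand-written `partitions` list, and the sum-and-count example are assumptions made for illustration, and no Spark installation is needed to run it.

```python
from functools import reduce


def aggregate(partitions, zero_value, seq_op, comb_op):
    """Hypothetical stand-in that mimics the semantics of RDD.aggregate."""
    # Stage 1: fold the elements of each partition with seq_op,
    # starting from zero_value in every partition.
    partials = [reduce(seq_op, part, zero_value) for part in partitions]
    # Stage 2: merge the per-partition results with comb_op,
    # again starting from zero_value.
    return reduce(comb_op, partials, zero_value)


# Assumed example: compute (sum, count) over the numbers 1..4,
# split by hand into two "partitions".
partitions = [[1, 2], [3, 4]]


def seq_op(acc, x):
    # acc is a (running_sum, running_count) pair; x is one element.
    return (acc[0] + x, acc[1] + 1)


def comb_op(a, b):
    # a and b are both (sum, count) pairs from different partitions.
    return (a[0] + b[0], a[1] + b[1])


total, count = aggregate(partitions, (0, 0), seq_op, comb_op)
print(total, count)  # 10 4

# With a real SparkContext `sc`, the corresponding call would be:
#   sc.parallelize([1, 2, 3, 4], 2).aggregate((0, 0), seq_op, comb_op)
```

Note that seqOp takes an accumulator and an element, while combOp takes two accumulators, so the accumulator type (here a pair) can differ from the element type (here an int). Because the zero value is applied once per partition and once more in the final merge, it should be neutral for both functions.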
Summary:
import random as rd
import math

class LogisticRegressionPySpark:
    def __init__(self, MaxItr=100, eps=0.01, c=0.1):
        self.max_itr = MaxItr
        se...
posted @ 2015-07-03 19:43 porco Views(833) Comments(0) Recommendations(0)