SparkR-Install
1. Download R
https://cran.r-project.org/src/base/R-3/

1.2 Configure the environment variables:

1.3 Test the installation:

2. Download Rtools33
https://cran.r-project.org/bin/windows/Rtools/

2.1 Configure the environment variables

2.2 Test:
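One way to run this test from inside R is to ask which build tools are visible on the PATH. The snippet below is a minimal sketch using base R only; the tool names `gcc` and `make` are an assumption about what Rtools33 places on the PATH, not something the original steps state.

```r
# Look up build tools on the PATH; Sys.which returns "" for anything not found.
# Assumption: Rtools exposes gcc and make once its directories are on the PATH.
tools <- Sys.which(c("gcc", "make"))
print(tools)

missing_tools <- names(tools)[!nzchar(tools)]
if (length(missing_tools) > 0) {
  message("Not found on PATH: ", paste(missing_tools, collapse = ", "))
} else {
  message("Rtools build tools are visible to R.")
}
```

If a tool is reported as missing, the environment variables from step 2.1 are the first thing to re-check.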

3. Install RStudio
https://www.rstudio.com/products/rstudio/download/ Click Next through the installer to complete the installation.

4. Install the JDK and set the environment variables
4.1 Configure the environment variables:



4.2 Test:
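The same check can be made from an R session, which is useful because SparkR will later look for Java from within R. This is a sketch under the assumption that the JDK is exposed through `JAVA_HOME` and that `java` is on the PATH.

```r
# Report JAVA_HOME as seen by R; an empty string means it is not set.
java_home <- Sys.getenv("JAVA_HOME")
cat("JAVA_HOME =", if (nzchar(java_home)) java_home else "<not set>", "\n")

# Run `java -version` only when a java executable is found on the PATH.
java_exe <- Sys.which("java")
if (nzchar(java_exe)) {
  system2(java_exe, "-version")
} else {
  message("java was not found on the PATH.")
}
```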


5. Download the Spark distribution
5.1 URL: http://spark.apache.org/downloads.html

5.2 Extract it to a directory on the local disk

6. Install Spark and set the environment variables
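To confirm that the variable points at the extracted directory, a small check in R can help. This is a sketch that assumes the variable is named `SPARK_HOME` (the name the example script later reads); the helper `looks_like_spark_home` is hypothetical and only tests for the `bin` folder and the `R/lib/SparkR` folder used in step 12.

```r
# Return TRUE when `spark_home` looks like an extracted Spark distribution:
# it must contain bin/ and R/lib/SparkR (the folder used in step 12).
looks_like_spark_home <- function(spark_home) {
  nzchar(spark_home) &&
    dir.exists(file.path(spark_home, "bin")) &&
    dir.exists(file.path(spark_home, "R", "lib", "SparkR"))
}

spark_home <- Sys.getenv("SPARK_HOME")
if (looks_like_spark_home(spark_home)) {
  message("SPARK_HOME looks valid: ", spark_home)
} else {
  message("SPARK_HOME is unset or does not point at a Spark directory.")
}
```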


7. Test SparkR


Note: if you see the message "WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable", you need to install the native Hadoop library.
8. Download and install the Hadoop library
http://hadoop.apache.org/releases.html


9. Set the Hadoop environment variables


10. Re-test SparkR
10.1 If the following output appears during the test, change INFO to WARN in the log4j file located under \spark\conf

10.2 Edit the log4j file in conf:
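The edit can also be scripted. The sketch below assumes the file is `conf/log4j.properties`, that it is created from `log4j.properties.template` when missing, and that the root logger line starts with `log4j.rootCategory=INFO`; these match the Spark 1.x/2.x layout but should be verified against your own installation. The helper `set_root_level_warn` is hypothetical.

```r
# Replace the INFO level with WARN on the root logger line only.
# Assumption: the root logger line starts with "log4j.rootCategory=INFO".
set_root_level_warn <- function(lines) {
  sub("^(log4j\\.rootCategory=)INFO", "\\1WARN", lines)
}

conf_dir <- file.path(Sys.getenv("SPARK_HOME"), "conf")
target   <- file.path(conf_dir, "log4j.properties")
template <- file.path(conf_dir, "log4j.properties.template")

# Start from the template when no log4j.properties exists yet.
if (!file.exists(target) && file.exists(template)) {
  file.copy(template, target)
}
# Rewrite the file in place if it is present.
if (file.exists(target)) {
  writeLines(set_root_level_warn(readLines(target)), target)
}
```

Only the root logger line is touched, so any per-package log levels in the file keep their original settings.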


10.3 Run SparkR again

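11. Run SparkR code
Spark 2.0 adds RSparkSql for running SQL queries
dataframe covers data frame operations
data-manipulation covers data transformation
ml covers machine learning

11.1 Use Ctrl+Alt+left mouse click to open a console in this folder

11.2 Run spark-submit xxx.R to execute the file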
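The same launch can be driven from an R session instead of a console. The function below is a hypothetical helper, not part of Spark; it assumes `SPARK_HOME` is set and that the launcher in `bin` is named `spark-submit.cmd` on Windows and `spark-submit` elsewhere.

```r
# Hypothetical helper: run an R script through spark-submit.
# Assumption: the launcher lives in <SPARK_HOME>/bin.
submit_r_script <- function(script, spark_home = Sys.getenv("SPARK_HOME")) {
  launcher <- if (.Platform$OS.type == "windows") "spark-submit.cmd" else "spark-submit"
  exe <- file.path(spark_home, "bin", launcher)
  if (!file.exists(exe)) {
    stop("spark-submit not found at ", exe)
  }
  # shQuote protects script paths that contain spaces.
  system2(exe, args = shQuote(script))
}

# Usage (xxx.R stands for your own script, as in step 11.2):
# submit_r_script("xxx.R")
```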

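12. Install the SparkR package
12.1 Copy the SparkR folder from R/lib under the Spark installation directory into ..\R-3.3.2\library. Note that you must copy the whole SparkR folder, not the individual files inside it.
Source folder:
Destination folder: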
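As an alternative to copying the folder, R can load the package directly from the Spark directory through the `lib.loc` argument of `library`. This is a sketch that assumes `SPARK_HOME` is set; it does not replace the copy step above, it only avoids it.

```r
# Location of the SparkR package inside the Spark installation.
spark_lib <- file.path(Sys.getenv("SPARK_HOME"), "R", "lib")

# Load SparkR from there only if the package folder is actually present.
if (dir.exists(file.path(spark_lib, "SparkR"))) {
  library(SparkR, lib.loc = spark_lib)
} else {
  message("SparkR not found under ", spark_lib)
}
```

Loading from the Spark directory keeps the R package in step with the Spark version it ships with, which copying does not guarantee after an upgrade.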

12.2 Open the SparkR file dataframe.R in RStudio and run it, executing the code line by line with Ctrl+Enter
The source of the SparkR dataframe.R example is as follows:
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
library(SparkR)
# Initialize SparkContext and SQLContext
sc <- sparkR.init(appName="SparkR-DataFrame-example")
sqlContext <- sparkRSQL.init(sc)
# Create a simple local data.frame
localDF <- data.frame(name=c("John", "Smith", "Sarah"), age=c(19, 23, 18))
# Convert local data frame to a SparkR DataFrame
df <- createDataFrame(sqlContext, localDF)
# Print its schema
printSchema(df)
# root
# |-- name: string (nullable = true)
# |-- age: double (nullable = true)
# Create a DataFrame from a JSON file
path <- file.path(Sys.getenv("SPARK_HOME"), "examples/src/main/resources/people.json")
peopleDF <- read.json(sqlContext, path)
printSchema(peopleDF)
# Register this DataFrame as a table.
registerTempTable(peopleDF, "people")
# SQL statements can be run by using the sql methods provided by sqlContext
teenagers <- sql(sqlContext, "SELECT name FROM people WHERE age >= 13 AND age <= 19")
# Call collect to get a local data.frame
teenagersLocalDF <- collect(teenagers)
# Print the teenagers in our dataset
print(teenagersLocalDF)
# Stop the SparkContext now
sparkR.stop()
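The listing above uses the Spark 1.x entry points (`sparkR.init`, `sparkRSQL.init`, `registerTempTable`), which Spark 2.0 marks as deprecated. For readers on Spark 2.x, the same steps can be sketched with the session-based API. The wrapper function `run_dataframe_example` is hypothetical and exists only so the code does nothing until it is called; it assumes SparkR is installed and `SPARK_HOME` is set.

```r
# Hypothetical wrapper around the Spark 2.x version of the example above.
run_dataframe_example <- function() {
  library(SparkR)

  # A single session replaces the separate SparkContext and SQLContext.
  sparkR.session(appName = "SparkR-DataFrame-example")

  # Convert a local data.frame to a SparkDataFrame and print its schema.
  localDF <- data.frame(name = c("John", "Smith", "Sarah"), age = c(19, 23, 18))
  df <- createDataFrame(localDF)
  printSchema(df)

  # Create a SparkDataFrame from the JSON file shipped with Spark.
  path <- file.path(Sys.getenv("SPARK_HOME"), "examples/src/main/resources/people.json")
  peopleDF <- read.json(path)
  printSchema(peopleDF)

  # Register a temporary view and query it with SQL.
  createOrReplaceTempView(peopleDF, "people")
  teenagers <- sql("SELECT name FROM people WHERE age >= 13 AND age <= 19")

  # Collect the result into a local data.frame, then stop the session.
  teenagersLocalDF <- collect(teenagers)
  sparkR.session.stop()
  teenagersLocalDF
}

# Usage:
# print(run_dataframe_example())
```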
13. RStudio results

END~
