Riordon


Building a Spark Development Environment in IDEA

This document shows how to build a Spark Maven application in IDEA.
date: 2016/8/1
author: wangxl

1. Download IDEA

https://www.jetbrains.com/idea/

2. Install the Scala plugin

Plugins-->Scala-->Install Plugin

3. Generate the project skeleton

3.1 Generate the skeleton with Maven

mvn archetype:generate -DarchetypeGroupId=net.alchim31.maven -DarchetypeArtifactId=scala-archetype-simple -DarchetypeVersion=1.5 -DgroupId=com.glsx -DartifactId=spark-demo -Dversion=1.0 -Dpackage=com.glsx
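Notes:
(1) Generating this archetype relies on the official Maven repository. The old http://scala-tools.org/repo-releases repository no longer exists, so do not generate the project through IDEA's default wizard.
(2) Use -DarchetypeGroupId=net.alchim31.maven, not the default org.scala-tools.archetypes.
(3) Use archetype version 1.5 for Scala 2.10.x and version 1.6 for Scala 2.11.x.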
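The rules in notes (2) and (3) can be captured in a small Scala sketch that assembles the archetype:generate command for a given Scala version. ArchetypeCommand and its methods are hypothetical names used only for illustration. Scala versions other than 2.10.x and 2.11.x are deliberately left unhandled, because the notes do not say which archetype version they need.

```scala
// Hypothetical helper (not part of the original post): encodes notes (2) and (3).
object ArchetypeCommand {
  // Note (3): archetype 1.5 for Scala 2.10.x, 1.6 for Scala 2.11.x.
  def archetypeVersion(scalaVersion: String): Option[String] =
    if (scalaVersion.startsWith("2.10.")) Some("1.5")
    else if (scalaVersion.startsWith("2.11.")) Some("1.6")
    else None // not covered by the notes

  // Note (2): always use the net.alchim31.maven groupId, never org.scala-tools.archetypes.
  // The package name is set equal to the groupId, as in the command in 3.1.
  def command(scalaVersion: String, groupId: String, artifactId: String,
              version: String): Option[Seq[String]] =
    archetypeVersion(scalaVersion).map { av =>
      Seq(
        "mvn", "archetype:generate",
        "-DarchetypeGroupId=net.alchim31.maven",
        "-DarchetypeArtifactId=scala-archetype-simple",
        s"-DarchetypeVersion=$av",
        s"-DgroupId=$groupId",
        s"-DartifactId=$artifactId",
        s"-Dversion=$version",
        s"-Dpackage=$groupId")
    }

  def main(args: Array[String]): Unit =
    command("2.10.5", "com.glsx", "spark-demo", "1.0") match {
      case Some(cmd) => println(cmd.mkString(" "))
      case None      => println("unsupported Scala version")
    }
}
```

For Scala 2.10.5 the sketch prints the same command as in 3.1.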

3.2 Edit the pom file and add the Spark dependencies

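<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.glsx</groupId>
  <artifactId>spark-demo</artifactId>
  <version>1.0</version>
  <name>${project.artifactId}</name>
  <description>My wonderful scala app</description>
  <inceptionYear>2010</inceptionYear>
  <licenses>
    <license>
      <name>My License</name>
      <url>http://....</url>
      <distribution>repo</distribution>
    </license>
  </licenses>

  <properties>
    <maven.compiler.source>1.6</maven.compiler.source>
    <maven.compiler.target>1.6</maven.compiler.target>
    <encoding>UTF-8</encoding>
    <scala.tools.version>2.10</scala.tools.version>
    <scala.version>2.10.5</scala.version>
    <spark.version>1.6.2</spark.version>
    <hadoop.version>2.3.0-cdh5.0.2</hadoop.version>
  </properties>

  <repositories>
    <repository>
      <id>cloudera-repo</id>
      <name>Cloudera Repository</name>
      <url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
      <releases>
        <enabled>true</enabled>
      </releases>
      <snapshots>
        <enabled>false</enabled>
      </snapshots>
    </repository>
  </repositories>

  <dependencies>
    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
      <version>${scala.version}</version>
    </dependency>

    <!-- Test -->
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.11</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.specs2</groupId>
      <artifactId>specs2_${scala.tools.version}</artifactId>
      <version>1.13</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.scalatest</groupId>
      <artifactId>scalatest_${scala.tools.version}</artifactId>
      <version>2.0.M6-SNAP8</version>
      <scope>test</scope>
    </dependency>

    <!-- Spark -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>${spark.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_2.10</artifactId>
      <version>${spark.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-hive_2.10</artifactId>
      <version>${spark.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-streaming_2.10</artifactId>
      <version>${spark.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-mllib_2.10</artifactId>
      <version>${spark.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>${hadoop.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-streaming-kafka_2.10</artifactId>
      <version>${spark.version}</version>
    </dependency>
    <dependency>
      <groupId>mysql</groupId>
      <artifactId>mysql-connector-java</artifactId>
      <version>5.1.6</version>
    </dependency>
  </dependencies>

  <build>
    <sourceDirectory>src/main/scala</sourceDirectory>
    <testSourceDirectory>src/test/scala</testSourceDirectory>
    <plugins>
      <plugin>
        <groupId>net.alchim31.maven</groupId>
        <artifactId>scala-maven-plugin</artifactId>
        <version>3.1.3</version>
        <executions>
          <execution>
            <goals>
              <goal>compile</goal>
              <goal>testCompile</goal>
            </goals>
            <configuration>
              <args>
                <arg>-make:transitive</arg>
                <arg>-dependencyfile</arg>
                <arg>${project.build.directory}/.scala_dependencies</arg>
              </args>
            </configuration>
          </execution>
        </executions>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-surefire-plugin</artifactId>
        <version>2.13</version>
        <configuration>
          <useFile>false</useFile>
          <disableXmlReport>true</disableXmlReport>
          <includes>
            <include>**/*Test.*</include>
            <include>**/*Suite.*</include>
          </includes>
        </configuration>
      </plugin>
    </plugins>
  </build>
</project>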
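The Spark artifactIds above carry a hard-coded _2.10 suffix that has to agree with scala.version (2.10.5). The test dependencies already derive their suffix from ${scala.tools.version}. Artifacts built for different Scala binary versions, such as 2.10 and 2.11, are not binary compatible, so mixing them usually breaks at compile or run time.

The sketch below shows how the suffix follows from the full Scala version. ScalaSuffix is a hypothetical helper for illustration only; Maven itself does no such derivation for the hard-coded artifactIds.

```scala
// Hypothetical helper: derives the Scala binary version used in artifact
// suffixes such as spark-core_2.10.
object ScalaSuffix {
  // "2.10.5" -> "2.10": keep only major.minor
  def binaryVersion(scalaVersion: String): String =
    scalaVersion.split('.').take(2).mkString(".")

  // Builds names like "spark-core_2.10" from a module name and Scala version.
  def sparkArtifact(module: String, scalaVersion: String): String =
    s"spark-${module}_${binaryVersion(scalaVersion)}"

  def main(args: Array[String]): Unit = {
    // The Spark modules declared in the pom above.
    val modules = Seq("core", "sql", "hive", "streaming", "mllib", "streaming-kafka")
    modules.foreach(m => println(sparkArtifact(m, "2.10.5")))
  }
}
```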

3.3 Run the package command

mvn clean package -DskipTests
This step takes a long time, so be patient. When it succeeds, Maven reports BUILD SUCCESS.
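With the default jar packaging, Maven writes the build output to target/<artifactId>-<version>.jar, which here is target/spark-demo-1.0.jar. Before uploading the jar, you can check that it actually contains the class later passed to spark-submit --class.

The sketch below uses java.util.jar.JarFile to do that check. JarCheck and its methods are hypothetical names for illustration.

```scala
import java.io.File
import java.util.jar.JarFile

// Hypothetical pre-upload check: does the packaged jar contain the main class?
object JarCheck {
  // Maven's default output location for jar packaging.
  def jarPath(artifactId: String, version: String): String =
    s"target/$artifactId-$version.jar"

  // "com.glsx.main.SparkPi" -> "com/glsx/main/SparkPi.class"
  def classEntry(mainClass: String): String =
    mainClass.replace('.', '/') + ".class"

  // Opens the jar and looks up the class file entry; always closes the jar.
  def containsClass(jar: File, mainClass: String): Boolean = {
    val jf = new JarFile(jar)
    try jf.getEntry(classEntry(mainClass)) != null
    finally jf.close()
  }

  def main(args: Array[String]): Unit = {
    val jar = new File(jarPath("spark-demo", "1.0"))
    if (!jar.isFile) println(s"$jar not found; run mvn clean package first")
    else {
      val found = containsClass(jar, "com.glsx.main.SparkPi")
      println(s"com.glsx.main.SparkPi present in $jar: $found")
    }
  }
}
```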

3.4 Import into IDEA

Open the generated spark-demo directory in IDEA and import it as a Maven project.

4. Write a sample program

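package com.glsx.main // must match the --class name passed to spark-submit in step 5

import scala.math.random
import org.apache.spark.{SparkConf, SparkContext}

// Estimates pi by Monte Carlo sampling: draw random points in the square
// [-1, 1] x [-1, 1] and count how many fall inside the unit circle.
object SparkPi {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("Spark Pi")
    val spark = new SparkContext(conf)
    // Number of partitions (slices), taken from the optional first argument.
    val slices = if (args.length > 0) args(0).toInt else 2
    val n = math.min(100000L * slices, Int.MaxValue).toInt // avoid overflow
    val count = spark.parallelize(1 until n, slices).map { _ =>
      val x = random * 2 - 1
      val y = random * 2 - 1
      if (x * x + y * y < 1) 1 else 0
    }.reduce(_ + _)
    // The fraction of points inside the circle approximates pi / 4.
    println("Pi is roughly " + 4.0 * count / n)
    spark.stop()
  }
}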
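The example rests on a simple area argument. A circle of radius 1 has area π, and the square [-1, 1] x [-1, 1] has area 4. The fraction of uniformly random points that land inside the circle therefore approaches π/4, which is why the program prints 4.0 * count / n.

The sketch below runs the same estimate with plain Scala collections and no Spark. This can be handy for checking the logic in IDEA before submitting to a cluster. LocalPi and its fixed seed are assumptions added for illustration; they are not part of the original program.

```scala
import scala.util.Random

// Spark-free version of the SparkPi estimate, for local sanity checks.
object LocalPi {
  def estimate(n: Int, seed: Long): Double = {
    val rng = new Random(seed) // fixed seed makes runs reproducible
    // Count the sampled points that fall inside the unit circle.
    val count = (0 until n).count { _ =>
      val x = rng.nextDouble() * 2 - 1
      val y = rng.nextDouble() * 2 - 1
      x * x + y * y < 1
    }
    4.0 * count / n // hit ratio approximates pi / 4
  }

  def main(args: Array[String]): Unit =
    println("Pi is roughly " + estimate(200000, 42L))
}
```

With 200,000 samples the standard error of the estimate is roughly 0.004, so results should usually agree with π to about two decimal places.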

5. Package and submit the job

Package the project with Maven, upload the resulting jar (target/spark-demo-1.0.jar) to the server, and submit it:
bin/spark-submit --master yarn --class com.glsx.main.SparkPi spark-demo-1.0.jar
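spark-submit passes every argument after the application jar to the program's main method. Appending a number after spark-demo-1.0.jar therefore sets the slices value that SparkPi reads from args(0).

The sketch below only assembles that command line as a Seq[String]. SubmitCommand is a hypothetical helper, and the 10 slices in main is an arbitrary example value.

```scala
// Hypothetical helper: builds the spark-submit command line shown above.
// Anything in appArgs goes after the jar and reaches main(args).
object SubmitCommand {
  def build(master: String, mainClass: String, jar: String,
            appArgs: Seq[String] = Nil): Seq[String] =
    Seq("bin/spark-submit", "--master", master, "--class", mainClass, jar) ++ appArgs

  def main(args: Array[String]): Unit = {
    val cmd = build("yarn", "com.glsx.main.SparkPi", "spark-demo-1.0.jar", Seq("10"))
    println(cmd.mkString(" "))
    // With `import scala.sys.process._`, `cmd.!` would run it from the Spark home directory.
  }
}
```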

posted on 2016-08-01 12:32  Riordon