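Writing a Simple Spark Program
I. Single-machine Spark program
1. Installing and using sbt: see reference [1]. sbt-launch.jar is included in the sbt folder of the downloaded Spark distribution.
2. Run the "A Standalone App in Scala" program from reference [2].
Problem encountered:
sbt package fails with the following message: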
[warn] ::::::::::::::::::::::::::::::::::::::::::::::
[warn] :: FAILED DOWNLOADS ::
[warn] :: ^ see resolution messages for details ^ ::
[warn] ::::::::::::::::::::::::::::::::::::::::::::::
[warn] :: org.eclipse.jetty.orbit#javax.servlet;2.5.0.v201103041518!javax.servlet.orbit
[warn] ::::::::::::::::::::::::::::::::::::::::::::::
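Solution: see reference [3].
Note: sbt has to download the libraries the program depends on, so the first build may take a long time.
II. Running a Spark program on a cluster
1. Start the Spark cluster:
[hadoop@Master spark-0.8.1]$ ./bin/start-all.sh
2. Write the program, build it with sbt package, then run it with sbt run.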
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._

object SimpleApp {
  def main(args: Array[String]) {
    // Arguments: master URL, application name, Spark home, jars to ship to the workers
    val sc = new SparkContext("spark://192.168.178.92:7077", "HdfsTest", "/home/hadoop/spark-0.8.1", List("target/scala-2.9.3/simple-project_2.9.3-1.0.jar"))
    // Read the input from HDFS
    val file = sc.textFile("hdfs://192.168.178.92:9000/user/hadoop/input/txt")
    // Split each line on spaces, pair every word with 1, then sum the counts per word
    val count = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_+_)
    // Write the (word, count) pairs back to HDFS
    count.saveAsTextFile("hdfs://192.168.178.92:9000/user/hadoop/output/txt/sparkWordCount")
    println("hello !!!")
    System.exit(0)
  }
}
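Note: take care when initializing sc. The four arguments are the master URL, the application name, the Spark installation directory, and the list of jars to ship to the workers; the jar path must match the file that sbt package produces.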
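To make the word-count pipeline easier to follow, here is a minimal sketch that applies the same flatMap / map steps to an ordinary Scala collection, with no Spark or HDFS involved. The object name WordCountSketch and the method countWords are hypothetical names chosen for illustration, and groupBy followed by a per-key reduceLeft stands in for Spark's reduceByKey; it is not how Spark implements it.

```scala
object WordCountSketch {
  // Local stand-in for the Spark pipeline: split lines into words,
  // pair each word with 1, then add up the 1s for each distinct word.
  def countWords(lines: Seq[String]): Map[String, Int] = {
    val pairs = lines.flatMap(line => line.split(" ")).map(word => (word, 1))
    // groupBy + reduceLeft plays the role of reduceByKey(_+_) here
    pairs.groupBy(_._1).map { case (word, ones) =>
      (word, ones.map(_._2).reduceLeft(_ + _))
    }
  }
}
```

For example, WordCountSketch.countWords(Seq("to be or", "not to be")) yields a map in which "to" and "be" have count 2 and "or" and "not" have count 1. As in the Spark version, splitting on a single space means consecutive spaces produce empty-string "words".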
build.sbt is as follows:
name := "Simple Project"

version := "1.0"

scalaVersion := "2.9.3"

libraryDependencies += "org.apache.spark" %% "spark-core" % "0.8.1-incubating"

resolvers += "Akka Repository" at "http://repo.akka.io/releases/"

libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "1.1.2"

ivyXML :=
  <dependency org="org.eclipse.jetty.orbit" name="javax.servlet" rev="3.0.0.v201112011016">
    <artifact name="javax.servlet" type="orbit" ext="jar"/>
  </dependency>
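The jar path passed to SparkContext in the program has to match the file that sbt package writes, and that file name is derived from the name, version, and scalaVersion settings in build.sbt. The sketch below shows the relationship as I understand it; JarPathSketch, normalize, and packagedJar are hypothetical helpers, not part of sbt, and the normalization rule (lowercase, runs of other characters replaced by a hyphen) is an approximation of sbt's behavior. It also assumes the full Scala version appears in the path, which holds for 2.9.3; with Scala 2.10 and later, sbt uses the binary version (for example 2.10) instead.

```scala
object JarPathSketch {
  // Hypothetical helper approximating sbt's normalized project name:
  // lowercase, with each run of non-alphanumeric characters turned into "-".
  def normalize(name: String): String =
    name.toLowerCase(java.util.Locale.ENGLISH).replaceAll("[^a-z0-9]+", "-")

  // Assumed layout: target/scala-<scalaVersion>/<name>_<scalaVersion>-<version>.jar
  def packagedJar(name: String, version: String, scalaVersion: String): String =
    "target/scala-" + scalaVersion + "/" +
      normalize(name) + "_" + scalaVersion + "-" + version + ".jar"
}
```

With the settings from this build.sbt, JarPathSketch.packagedJar("Simple Project", "1.0", "2.9.3") gives target/scala-2.9.3/simple-project_2.9.3-1.0.jar, which is the path used in the SparkContext constructor. If any of the three settings changes, the jar path in the program needs to change with it.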
3. For learning Scala programming, see reference [4].
References
[1] sbt installation and usage: http://www.scala-sbt.org/release/docs/Getting-Started/Setup.html
[2] Spark Quick Start: http://spark.incubator.apache.org/docs/latest/quick-start.html#a-standalone-job-in-scala
[3] SBT, Jetty and Servlet 3.0: http://stackoverflow.com/questions/9889674/sbt-jetty-and-servlet-3-0
[4] Scala tutorial: http://scalachina.com/node/16