Spark Scala Program Development

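Everything I had read about developing Spark programs says to package them as a jar and then run it. Today I found that a standalone program can actually be run directly.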

Take the WordCount program below: set the master in the code, then choose Run As -> Scala Application.

Running it this way saves the time spent on packaging.
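
A side note on setting the master: as far as I know, a value set in code with setMaster takes precedence over the --master flag of spark-submit, so a hard-coded master is convenient in the IDE but awkward later. Here is a minimal sketch of one way around that, assuming the master URL is passed as the first command-line argument. MasterChoice and resolveMaster are hypothetical names of my own, not part of Spark.

```scala
// Hypothetical helper (not part of Spark): pick the master URL to hand to SparkConf.setMaster.
object MasterChoice {

  // Fallback for IDE runs; "local[2]" is the value used in this post.
  val DefaultMaster = "local[2]"

  // Use the first command-line argument when it is present and non-blank,
  // otherwise fall back to the local default.
  def resolveMaster(args: Array[String]): String =
    args.headOption.map(_.trim).filter(_.nonEmpty).getOrElse(DefaultMaster)
}
```

In a driver program this would be used as .setMaster(MasterChoice.resolveMaster(args)): with no arguments (the Run As case) it stays on local[2], and a different master URL can be supplied as the first argument.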

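import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._ // implicit conversions for pair RDD operations such as reduceByKey

object WordCount {

  def main(args: Array[String]): Unit = {

    val logFile = "README.md" // Should be some file on your system
    val conf = new SparkConf()
      .setAppName("Simple Application")
      .setMaster("local[2]") // run inside this JVM with two worker threads
    val sc = new SparkContext(conf)

    val file = sc.textFile(logFile, 2).cache() // read with at least 2 partitions and cache
    val counts = file.flatMap(line => line.split(" ")) // split each line on spaces
      .map(word => (word, 1)) // pair each word with a count of 1
      .reduceByKey(_ + _) // sum the counts per word

    counts.saveAsTextFile("result") // write (word,count) pairs to the "result" directory
  }

}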
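
To see what the flatMap / map / reduceByKey chain computes without starting Spark at all, the same logic can be sketched on plain Scala collections. This is only an illustration: LocalWordCount and countWords are hypothetical names, and groupBy plus a per-key sum stands in for reduceByKey here.

```scala
// Plain-Scala sketch of the word-count logic; no Spark dependency.
// LocalWordCount and countWords are hypothetical names used for illustration.
object LocalWordCount {

  // Mirrors flatMap(_.split(" ")) -> map((_, 1)) -> reduceByKey(_ + _)
  def countWords(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(line => line.split(" "))  // one element per space-separated token
      .map(word => (word, 1))            // pair each token with a count of 1
      .groupBy(_._1)                     // gather the pairs that share a word
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) } // sum per word
}
```

For example, countWords(Seq("a b a", "b c")) gives a -> 2, b -> 2, c -> 1. Note that splitting on a single space turns consecutive spaces into empty-string tokens, so the empty string can show up as a "word" in the counts.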

 

The console output is as follows.

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
14/11/06 00:40:28 INFO SecurityManager: Changing view acls to: hduser,
14/11/06 00:40:28 INFO SecurityManager: Changing modify acls to: hduser,
14/11/06 00:40:28 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hduser, ); users with modify permissions: Set(hduser, )
14/11/06 00:40:30 INFO Slf4jLogger: Slf4jLogger started
14/11/06 00:40:30 INFO Remoting: Starting remoting
14/11/06 00:40:31 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@namenode1:36164]
14/11/06 00:40:31 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriver@namenode1:36164]
14/11/06 00:40:31 INFO Utils: Successfully started service 'sparkDriver' on port 36164.
14/11/06 00:40:31 INFO SparkEnv: Registering MapOutputTracker
14/11/06 00:40:31 INFO SparkEnv: Registering BlockManagerMaster
14/11/06 00:40:31 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20141106004031-6c1f
14/11/06 00:40:31 INFO Utils: Successfully started service 'Connection manager for block manager' on port 43311.
14/11/06 00:40:31 INFO ConnectionManager: Bound socket to port 43311 with id = ConnectionManagerId(namenode1,43311)
14/11/06 00:40:31 INFO MemoryStore: MemoryStore started with capacity 515.8 MB
14/11/06 00:40:31 INFO BlockManagerMaster: Trying to register BlockManager
14/11/06 00:40:31 INFO BlockManagerMasterActor: Registering block manager namenode1:43311 with 515.8 MB RAM
14/11/06 00:40:31 INFO BlockManagerMaster: Registered BlockManager
14/11/06 00:40:32 INFO HttpFileServer: HTTP File server directory is /tmp/spark-885d61b7-801f-4701-9676-abfe34983844
14/11/06 00:40:32 INFO HttpServer: Starting HTTP Server
14/11/06 00:40:32 INFO Utils: Successfully started service 'HTTP file server' on port 50899.
14/11/06 00:40:34 INFO Utils: Successfully started service 'SparkUI' on port 4040.
14/11/06 00:40:34 INFO SparkUI: Started SparkUI at http://namenode1:4040
14/11/06 00:40:35 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/11/06 00:40:36 INFO AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@namenode1:36164/user/HeartbeatReceiver
14/11/06 00:40:38 INFO MemoryStore: ensureFreeSpace(159118) called with curMem=0, maxMem=540821422
14/11/06 00:40:38 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 155.4 KB, free 515.6 MB)
14/11/06 00:40:38 INFO FileInputFormat: Total input paths to process : 1
14/11/06 00:40:39 INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
14/11/06 00:40:39 INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
14/11/06 00:40:39 INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
14/11/06 00:40:39 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
14/11/06 00:40:39 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
14/11/06 00:40:39 INFO SparkContext: Starting job: saveAsTextFile at WordCount.scala:21
14/11/06 00:40:39 INFO DAGScheduler: Registering RDD 3 (map at WordCount.scala:18)
14/11/06 00:40:39 INFO DAGScheduler: Got job 0 (saveAsTextFile at WordCount.scala:21) with 2 output partitions (allowLocal=false)
14/11/06 00:40:39 INFO DAGScheduler: Final stage: Stage 0(saveAsTextFile at WordCount.scala:21)
14/11/06 00:40:39 INFO DAGScheduler: Parents of final stage: List(Stage 1)
14/11/06 00:40:39 INFO DAGScheduler: Missing parents: List(Stage 1)
14/11/06 00:40:40 INFO DAGScheduler: Submitting Stage 1 (MappedRDD[3] at map at WordCount.scala:18), which has no missing parents
14/11/06 00:40:40 INFO MemoryStore: ensureFreeSpace(3360) called with curMem=159118, maxMem=540821422
14/11/06 00:40:40 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 3.3 KB, free 515.6 MB)
14/11/06 00:40:40 INFO DAGScheduler: Submitting 2 missing tasks from Stage 1 (MappedRDD[3] at map at WordCount.scala:18)
14/11/06 00:40:40 INFO TaskSchedulerImpl: Adding task set 1.0 with 2 tasks
14/11/06 00:40:40 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 0, localhost, PROCESS_LOCAL, 1192 bytes)
14/11/06 00:40:40 INFO TaskSetManager: Starting task 1.0 in stage 1.0 (TID 1, localhost, PROCESS_LOCAL, 1192 bytes)
14/11/06 00:40:40 INFO Executor: Running task 0.0 in stage 1.0 (TID 0)
14/11/06 00:40:40 INFO Executor: Running task 1.0 in stage 1.0 (TID 1)
14/11/06 00:40:41 INFO CacheManager: Partition rdd_1_1 not found, computing it
14/11/06 00:40:41 INFO CacheManager: Partition rdd_1_0 not found, computing it
14/11/06 00:40:41 INFO HadoopRDD: Input split: file:/home/hduser/workspace/TestScala/README.md:0+2405
14/11/06 00:40:41 INFO HadoopRDD: Input split: file:/home/hduser/workspace/TestScala/README.md:2405+2406
14/11/06 00:40:41 INFO MemoryStore: ensureFreeSpace(7512) called with curMem=162478, maxMem=540821422
14/11/06 00:40:41 INFO MemoryStore: Block rdd_1_1 stored as values in memory (estimated size 7.3 KB, free 515.6 MB)
14/11/06 00:40:41 INFO MemoryStore: ensureFreeSpace(8352) called with curMem=169990, maxMem=540821422
14/11/06 00:40:41 INFO MemoryStore: Block rdd_1_0 stored as values in memory (estimated size 8.2 KB, free 515.6 MB)
14/11/06 00:40:41 INFO BlockManagerInfo: Added rdd_1_1 in memory on namenode1:43311 (size: 7.3 KB, free: 515.8 MB)
14/11/06 00:40:41 INFO BlockManagerMaster: Updated info of block rdd_1_1
14/11/06 00:40:41 INFO BlockManagerInfo: Added rdd_1_0 in memory on namenode1:43311 (size: 8.2 KB, free: 515.8 MB)
14/11/06 00:40:41 INFO BlockManagerMaster: Updated info of block rdd_1_0
14/11/06 00:40:41 INFO Executor: Finished task 0.0 in stage 1.0 (TID 0). 2433 bytes result sent to driver
14/11/06 00:40:41 INFO Executor: Finished task 1.0 in stage 1.0 (TID 1). 2433 bytes result sent to driver
14/11/06 00:40:41 INFO TaskSetManager: Finished task 1.0 in stage 1.0 (TID 1) in 1112 ms on localhost (1/2)
14/11/06 00:40:41 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 0) in 1175 ms on localhost (2/2)
14/11/06 00:40:41 INFO DAGScheduler: Stage 1 (map at WordCount.scala:18) finished in 1.204 s
14/11/06 00:40:41 INFO DAGScheduler: looking for newly runnable stages
14/11/06 00:40:41 INFO DAGScheduler: running: Set()
14/11/06 00:40:41 INFO DAGScheduler: waiting: Set(Stage 0)
14/11/06 00:40:41 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool 
14/11/06 00:40:41 INFO DAGScheduler: failed: Set()
14/11/06 00:40:41 INFO DAGScheduler: Missing parents for Stage 0: List()
14/11/06 00:40:41 INFO DAGScheduler: Submitting Stage 0 (MappedRDD[5] at saveAsTextFile at WordCount.scala:21), which is now runnable
14/11/06 00:40:42 INFO MemoryStore: ensureFreeSpace(57496) called with curMem=178342, maxMem=540821422
14/11/06 00:40:42 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 56.1 KB, free 515.5 MB)
14/11/06 00:40:42 INFO DAGScheduler: Submitting 2 missing tasks from Stage 0 (MappedRDD[5] at saveAsTextFile at WordCount.scala:21)
14/11/06 00:40:42 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
14/11/06 00:40:42 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 2, localhost, PROCESS_LOCAL, 948 bytes)
14/11/06 00:40:42 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 3, localhost, PROCESS_LOCAL, 948 bytes)
14/11/06 00:40:42 INFO Executor: Running task 0.0 in stage 0.0 (TID 2)
14/11/06 00:40:42 INFO Executor: Running task 1.0 in stage 0.0 (TID 3)
14/11/06 00:40:42 INFO BlockManager: Removing broadcast 1
14/11/06 00:40:42 INFO BlockManager: Removing block broadcast_1
14/11/06 00:40:42 INFO MemoryStore: Block broadcast_1 of size 3360 dropped from memory (free 540588944)
14/11/06 00:40:42 INFO ContextCleaner: Cleaned broadcast 1
14/11/06 00:40:42 INFO BlockFetcherIterator$BasicBlockFetcherIterator: maxBytesInFlight: 50331648, targetRequestSize: 10066329
14/11/06 00:40:42 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks
14/11/06 00:40:42 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Started 0 remote fetches in 15 ms
14/11/06 00:40:42 INFO BlockFetcherIterator$BasicBlockFetcherIterator: maxBytesInFlight: 50331648, targetRequestSize: 10066329
14/11/06 00:40:42 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks
14/11/06 00:40:42 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Started 0 remote fetches in 17 ms
14/11/06 00:40:42 INFO FileOutputCommitter: Saved output of task 'attempt_201411060040_0000_m_000001_3' to file:/home/hduser/workspace/TestScala/result/_temporary/0/task_201411060040_0000_m_000001
14/11/06 00:40:42 INFO FileOutputCommitter: Saved output of task 'attempt_201411060040_0000_m_000000_2' to file:/home/hduser/workspace/TestScala/result/_temporary/0/task_201411060040_0000_m_000000
14/11/06 00:40:42 INFO SparkHadoopWriter: attempt_201411060040_0000_m_000000_2: Committed
14/11/06 00:40:42 INFO SparkHadoopWriter: attempt_201411060040_0000_m_000001_3: Committed
14/11/06 00:40:42 INFO Executor: Finished task 1.0 in stage 0.0 (TID 3). 826 bytes result sent to driver
14/11/06 00:40:42 INFO Executor: Finished task 0.0 in stage 0.0 (TID 2). 826 bytes result sent to driver
14/11/06 00:40:42 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 2) in 395 ms on localhost (1/2)
14/11/06 00:40:42 INFO DAGScheduler: Stage 0 (saveAsTextFile at WordCount.scala:21) finished in 0.387 s
14/11/06 00:40:42 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 3) in 395 ms on localhost (2/2)
14/11/06 00:40:42 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
14/11/06 00:40:42 INFO SparkContext: Job finished: saveAsTextFile at WordCount.scala:21, took 2.649520816 s
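
The log shows the two tasks committing their output under the result directory in the project folder. To check the counts afterwards without Spark, something like the following can read them back. It is a rough sketch under two assumptions: that the output files use the usual part-* naming, and that each line is the text form of a tuple, (word,count). ResultReader, parseLine and readCounts are hypothetical names.

```scala
import java.io.File
import scala.io.Source
import scala.util.Try

// Hypothetical helper for inspecting saveAsTextFile output without Spark.
object ResultReader {

  // Parse one "(word,count)" line. Split on the LAST comma, since the word may contain commas.
  def parseLine(line: String): Option[(String, Int)] = {
    val body = line.stripPrefix("(").stripSuffix(")")
    val idx = body.lastIndexOf(',')
    if (idx < 0) None
    else Try(body.substring(idx + 1).trim.toInt).toOption
      .map(n => (body.substring(0, idx), n))
  }

  // Read every part-* file in the directory; _SUCCESS and .crc files are skipped
  // because their names do not start with "part-".
  def readCounts(dir: String): Map[String, Int] = {
    val parts = Option(new File(dir).listFiles()).getOrElse(Array.empty[File])
      .filter(f => f.isFile && f.getName.startsWith("part-"))
    parts.flatMap { f =>
      val src = Source.fromFile(f, "UTF-8")
      try src.getLines().toList.flatMap(line => parseLine(line).toList)
      finally src.close()
    }.toMap
  }
}
```

ResultReader.readCounts("result") then returns the counts as a Map. One thing to watch when rerunning from the IDE: saveAsTextFile normally refuses to write into an output directory that already exists, so the result directory has to be removed before the next run.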

  

 

posted @ 2014-11-06 17:04  自由行走