Spark

Install Scala and Hadoop first; those steps are omitted here.

Download the Spark package:
http://apache.communilink.net/spark/spark-2.4.0/spark-2.4.0-bin-hadoop2.7.tgz
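
For example, fetch it into ~/soft with wget:

hadoop@muhe221:~/soft$ wget http://apache.communilink.net/spark/spark-2.4.0/spark-2.4.0-bin-hadoop2.7.tgz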

hadoop@muhe221:~/soft$ tar -zxvf spark-2.4.0-bin-hadoop2.7.tgz
hadoop@muhe221:~/soft$ mv spark-2.4.0-bin-hadoop2.7  spark-2.4.0-hadoop2.7
hadoop@muhe221:~/soft/spark-2.4.0-hadoop2.7/conf$ cp spark-env.sh.template spark-env.sh
hadoop@muhe221:~/soft/spark-2.4.0-hadoop2.7/conf$ cp slaves.template slaves

Edit conf/spark-env.sh:

export SCALA_HOME=~/soft/scala-2.12.8
export JAVA_HOME=/home/muhe221/soft/jdk1.8.0_121   # adjust to your own JDK path
export SPARK_MASTER_IP=muhe221
export SPARK_WORKER_MEMORY=1g
export HADOOP_CONF_DIR=~/soft/hadoop-2.7.7/etc/hadoop
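
Note: in Spark 2.x the master address is normally set with SPARK_MASTER_HOST; SPARK_MASTER_IP is the legacy name and may be reported as deprecated by the start scripts, although it still works here. The equivalent setting would be:

export SPARK_MASTER_HOST=muhe221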

Edit ~/.bashrc:

export SPARK_HOME=~/soft/spark-2.4.0-hadoop2.7
export PATH=$PATH:$SPARK_HOME/bin
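
Reload the shell configuration so the new PATH takes effect, then verify that the Spark binaries are found:

hadoop@muhe221:~$ source ~/.bashrc
hadoop@muhe221:~$ spark-submit --version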

Edit conf/slaves, listing one worker hostname per line (here the master muhe221 also runs a worker):

muhe221
caoming
muhe222

Copy the configured directory to the other nodes:

hadoop@muhe221:~/soft$ scp -r spark-2.4.0-hadoop2.7   hadoop@caoming:~/soft/
hadoop@muhe221:~/soft$ scp -r spark-2.4.0-hadoop2.7   hadoop@muhe222:~/soft/
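
Note that scp only copies the Spark directory itself; to run spark-shell or spark-submit directly on caoming and muhe222, append the same two lines to ~/.bashrc on those nodes as well (assuming the same ~/soft layout):

export SPARK_HOME=~/soft/spark-2.4.0-hadoop2.7
export PATH=$PATH:$SPARK_HOME/bin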

Start Spark:

hadoop@muhe221:~/soft/spark-2.4.0-hadoop2.7/sbin$ ./start-all.sh
starting org.apache.spark.deploy.master.Master, logging to /home/hadoop/soft/spark-2.4.0-hadoop2.7/logs/spark-hadoop-org.apache.spark.deploy.master.Master-1-muhe221.out
muhe221: starting org.apache.spark.deploy.worker.Worker, logging to /home/hadoop/soft/spark-2.4.0-hadoop2.7/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-muhe221.out
muhe222: starting org.apache.spark.deploy.worker.Worker, logging to /home/hadoop/soft/spark-2.4.0-hadoop2.7/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-muhe222.out
caoming: starting org.apache.spark.deploy.worker.Worker, logging to /home/hadoop/soft/spark-2.4.0-hadoop2.7/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-caoming.out
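
If the daemons started cleanly, the standalone master also serves a web UI, by default on port 8080 (override with SPARK_MASTER_WEBUI_PORT in spark-env.sh), so http://muhe221:8080 should list the three workers as ALIVE.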

Process status on each node (jps):

hadoop@muhe221:~/soft/spark-2.4.0-hadoop2.7/sbin$ jps
9712 DataNode
10853 QuorumPeerMain
10085 ResourceManager
32055 Worker
9913 SecondaryNameNode
10217 NodeManager
32122 Jps
31946 Master
9550 NameNode
11231 HMaster


hadoop@muhe222:~/soft$ jps
3664 HRegionServer
27977 Worker
28138 Jps
3226 DataNode
3531 QuorumPeerMain


hadoop@caoming:~$ jps
6773 QuorumPeerMain
1639 Worker
3321 Jps
6170 DataNode
7198 HRegionServer

Run the wordcount example. First put an input file into HDFS:

hadoop@muhe222:~/soft$ hadoop fs -mkdir /test
hadoop@muhe222:~/soft$ hadoop fs -put wordcount.txt /test/
hadoop@muhe222:~/soft$ hadoop fs -cat /test/wordcount.txt
Hello hadoop
hello spark
hello bigdata

Source code:

val file=sc.textFile("hdfs://muhe221:9000/test/wordcount.txt")
val rdd = file.flatMap(line => line.split(" ")).map(word => (word,1)).reduceByKey(_+_)
rdd.collect()
rdd.foreach(println)

Run it in spark-shell:

hadoop@muhe221:~/soft/spark-2.4.0-hadoop2.7/sbin$ spark-shell
......
scala>
scala> val file=sc.textFile("hdfs://muhe221:9000/test/wordcount.txt")
file: org.apache.spark.rdd.RDD[String] = hdfs://muhe221:9000/test/wordcount.txt MapPartitionsRDD[1] at textFile at <console>:24
scala> val rdd = file.flatMap(line => line.split(" ")).map(word => (word,1)).reduceByKey(_+_)
rdd: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[4] at reduceByKey at <console>:25
scala> rdd.collect()
res0: Array[(String, Int)] = Array((Hello,1), (hello,2), (bigdata,1), (spark,1), (hadoop,1))
scala> rdd.foreach(println)
(spark,1)
(hadoop,1)
(Hello,1)
(hello,2)
(bigdata,1)
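
The same job can also be packaged as a standalone application and launched with spark-submit instead of the shell. Below is a minimal sketch; the object name WordCount, the jar name, and the master URL spark://muhe221:7077 are illustrative assumptions, not taken from the original run:

import org.apache.spark.{SparkConf, SparkContext}

// Standalone version of the word count above; launched via spark-submit.
object WordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("WordCount")
    val sc = new SparkContext(conf)

    val counts = sc.textFile("hdfs://muhe221:9000/test/wordcount.txt")
      .flatMap(line => line.split(" "))   // split each line into words
      .map(word => (word, 1))             // pair each word with a count of 1
      .reduceByKey(_ + _)                 // sum the counts per word

    counts.collect().foreach(println)     // bring the results to the driver and print them
    sc.stop()
  }
}

After packaging it into a jar (for example with sbt), it would be submitted roughly like this:

spark-submit --class WordCount --master spark://muhe221:7077 wordcount.jar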

