Spark
Install Scala and Hadoop first; those steps are omitted here.
Download the Spark package:
http://apache.communilink.net/spark/spark-2.4.0/spark-2.4.0-bin-hadoop2.7.tgz
hadoop@muhe221:~/soft$ tar -zxvf spark-2.4.0-bin-hadoop2.7.tgz
hadoop@muhe221:~/soft$ mv spark-2.4.0-bin-hadoop2.7 spark-2.4.0-hadoop2.7
hadoop@muhe221:~/soft/spark-2.4.0-hadoop2.7/conf$ cp spark-env.sh.template spark-env.sh
hadoop@muhe221:~/soft/spark-2.4.0-hadoop2.7/conf$ cp slaves.template slaves
Edit spark-env.sh:
export SCALA_HOME=~/soft/scala-2.12.8
export JAVA_HOME=/home/muhe221/soft/jdk1.8.0_121   # note: change this to your own JDK path
export SPARK_MASTER_IP=muhe221
export SPARK_WORKER_MEMORY=1g
export HADOOP_CONF_DIR=~/soft/hadoop-2.7.7/etc/hadoop
Edit ~/.bashrc:
export SPARK_HOME=~/soft/spark-2.4.0-hadoop2.7
export PATH=$PATH:$SPARK_HOME/bin
Edit slaves (one worker hostname per line):
muhe221
caoming
muhe222
Copy the configured directory to the other nodes:
hadoop@muhe221:~/soft$ scp -r spark-2.4.0-hadoop2.7 hadoop@caoming:~/soft/
hadoop@muhe221:~/soft$ scp -r spark-2.4.0-hadoop2.7 hadoop@muhe222:~/soft/
Start Spark:
hadoop@muhe221:~/soft/spark-2.4.0-hadoop2.7/sbin$ ./start-all.sh
starting org.apache.spark.deploy.master.Master, logging to /home/hadoop/soft/spark-2.4.0-hadoop2.7/logs/spark-hadoop-org.apache.spark.deploy.master.Master-1-muhe221.out
muhe221: starting org.apache.spark.deploy.worker.Worker, logging to /home/hadoop/soft/spark-2.4.0-hadoop2.7/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-muhe221.out
muhe222: starting org.apache.spark.deploy.worker.Worker, logging to /home/hadoop/soft/spark-2.4.0-hadoop2.7/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-muhe222.out
caoming: starting org.apache.spark.deploy.worker.Worker, logging to /home/hadoop/soft/spark-2.4.0-hadoop2.7/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-caoming.out
Status of each node:
hadoop@muhe221:~/soft/spark-2.4.0-hadoop2.7/sbin$ jps
9712 DataNode
10853 QuorumPeerMain
10085 ResourceManager
32055 Worker
9913 SecondaryNameNode
10217 NodeManager
32122 Jps
31946 Master
9550 NameNode
11231 HMaster

hadoop@muhe222:~/soft$ jps
3664 HRegionServer
27977 Worker
28138 Jps
3226 DataNode
3531 QuorumPeerMain

hadoop@caoming:~$ jps
6773 QuorumPeerMain
1639 Worker
3321 Jps
6170 DataNode
7198 HRegionServer
Run the WordCount example. First put a test file into HDFS:
hadoop@muhe222:~/soft$ hadoop fs -mkdir /test
hadoop@muhe222:~/soft$ hadoop fs -put wordcount.txt /test/
hadoop@muhe222:~/soft$ hadoop fs -cat /test/wordcount.txt
Hello hadoop hello spark hello bigdata
Source code:
val file=sc.textFile("hdfs://muhe221:9000/test/wordcount.txt")
val rdd = file.flatMap(line => line.split(" ")).map(word => (word,1)).reduceByKey(_+_)
rdd.collect()
rdd.foreach(println)
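Optionally, as a small sketch that is not part of the original walkthrough (the output path /test/wordcount-result is illustrative), the counts can also be sorted by frequency and written back to HDFS:
val sorted = rdd.sortBy(_._2, ascending = false)                     // sort the (word, count) pairs by count, descending
sorted.saveAsTextFile("hdfs://muhe221:9000/test/wordcount-result")   // writes one part file per partition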
Run it in spark-shell:
hadoop@muhe221:~/soft/spark-2.4.0-hadoop2.7/sbin$ spark-shell
......
scala>
scala> val file=sc.textFile("hdfs://muhe221:9000/test/wordcount.txt")
file: org.apache.spark.rdd.RDD[String] = hdfs://muhe221:9000/test/wordcount.txt MapPartitionsRDD[1] at textFile at <console>:24

scala> val rdd = file.flatMap(line => line.split(" ")).map(word => (word,1)).reduceByKey(_+_)
rdd: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[4] at reduceByKey at <console>:25

scala> rdd.collect()
res0: Array[(String, Int)] = Array((Hello,1), (hello,2), (bigdata,1), (spark,1), (hadoop,1))

scala> rdd.foreach(println)
(spark,1)
(hadoop,1)
(Hello,1)
(hello,2)
(bigdata,1)
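The same job can also be packaged as a standalone application and submitted with spark-submit instead of being typed into spark-shell. The sketch below is only illustrative: the object name WordCount, the use of SparkSession, and the jar name are assumptions, not part of the original walkthrough.

// WordCount.scala -- a hypothetical standalone version of the job above
import org.apache.spark.sql.SparkSession

object WordCount {
  def main(args: Array[String]): Unit = {
    // the master URL is supplied by spark-submit, so it is not hard-coded here
    val spark = SparkSession.builder().appName("WordCount").getOrCreate()
    val sc = spark.sparkContext

    // same logic as the spark-shell session: split into words, pair with 1, sum per word
    val file = sc.textFile("hdfs://muhe221:9000/test/wordcount.txt")
    val counts = file.flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
    counts.collect().foreach(println)

    spark.stop()
  }
}

After packaging it into a jar (for example with sbt package), it could be submitted to the standalone master, which listens on port 7077 by default, along the lines of: spark-submit --master spark://muhe221:7077 --class WordCount wordcount.jar (the jar name is assumed).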
