Installing a Spark Cluster on Linux (CentOS 7 + Spark 2.1.1 + Hadoop 2.8.0)
Note: Spark can be installed standalone on a machine that has only the JDK and Scala, but such a setup can only run code in local mode, without distributed computation or distributed storage; for example, a single-machine installation can run the bundled Pi-estimation program locally. Here, however, we want a Spark cluster that uses Hadoop's distributed file system, so please install Hadoop first. For installing a Hadoop cluster, see this post:
http://blog.csdn.net/pucao_cug/article/details/71698903
For installing standalone (single-machine) Spark, see this post:
http://blog.csdn.net/pucao_cug/article/details/72377219
A minimal Spark cluster installation needs only these components: JDK, Scala, Hadoop, and Spark.
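Before continuing, it is worth confirming that the JDK and Hadoop from the posts above are already in place on each node. A quick check (the versions in the comments are the ones used throughout this guide):
java -version      # expect 1.8.0_121, installed under /opt/java
hadoop version     # expect Hadoop 2.8.0, installed under /opt/hadoop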
1 Install Scala, which Spark depends on
For installing Hadoop, see the post mentioned above. Because Spark depends on Scala, install Scala before installing Spark, and do so on every node.
1.1 Download and extract Scala
Open the site: http://www.scala-lang.org/
At the time of writing the latest version is 2.12.2, which is the version installed here.
You can also open the following page directly:
http://www.scala-lang.org/download/2.12.2.html
Or download the tgz package directly from this address:
https://downloads.lightbend.com/scala/2.12.2/scala-2.12.2.tgz
On the Linux server, create a directory named scala under /opt and upload the downloaded archive into it.
Change into that directory:
cd /opt/scala
Extract the archive:
tar -zxvf scala-2.12.2.tgz
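Since Scala has to be installed on every node, one convenient option is to copy the archive from this machine to the other nodes instead of downloading it again. A sketch, assuming the hostnames hserver2 and hserver3 used later in this guide, the root account, and that /opt/scala already exists on those nodes:
scp /opt/scala/scala-2.12.2.tgz root@hserver2:/opt/scala/
scp /opt/scala/scala-2.12.2.tgz root@hserver3:/opt/scala/
Then repeat the extraction and the environment-variable steps below on each node.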
1.2 Configure environment variables
Edit /etc/profile and add this line to the file:
export SCALA_HOME=/opt/scala/scala-2.12.2
Then add the following to the PATH variable in the same file:
${SCALA_HOME}/bin
After the additions, my /etc/profile looks like this:
export JAVA_HOME=/opt/java/jdk1.8.0_121
export HADOOP_HOME=/opt/hadoop/hadoop-2.8.0
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_HOME}/lib/native
export HADOOP_OPTS="-Djava.library.path=${HADOOP_HOME}/lib"
export HIVE_HOME=/opt/hive/apache-hive-2.1.1-bin
export HIVE_CONF_DIR=${HIVE_HOME}/conf
export SQOOP_HOME=/opt/sqoop/sqoop-1.4.6.bin__hadoop-2.0.4-alpha
export HBASE_HOME=/opt/hbase/hbase-1.2.5
export ZK_HOME=/opt/zookeeper/zookeeper-3.4.10
export SCALA_HOME=/opt/scala/scala-2.12.2
export CLASS_PATH=.:${JAVA_HOME}/lib:${HIVE_HOME}/lib:$CLASS_PATH
export PATH=.:${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:${SPARK_HOME}/bin:${ZK_HOME}/bin:${HIVE_HOME}/bin:${SQOOP_HOME}/bin:${HBASE_HOME}/bin:${SCALA_HOME}/bin:$PATH
Note: you only need to pay attention to the JDK, Scala, Hadoop, and Spark environment variables mentioned at the start; the others, such as Zookeeper, HBase, Hive, and Sqoop, can be ignored.
After configuring the environment variables, run:
source /etc/profile
1.3 Verify Scala
Run the command:
scala -version
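For Scala 2.12.2 the output should look roughly like this (the exact wording can differ slightly between builds):
Scala code runner version 2.12.2 -- Copyright 2002-2017, LAMP/EPFL and Lightbend, Inc.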
2 Download and extract Spark
Install Spark on every node, that is, repeat the steps below on each node.
2.1 Download the Spark package
Open the download page:
http://spark.apache.org/downloads.html
Clicking Download Spark on that page is equivalent to opening this address directly:
https://www.apache.org/dyn/closer.lua/spark/spark-2.1.1/spark-2.1.1-bin-hadoop2.7.tgz
The download is a file of roughly 200 MB: spark-2.1.1-bin-hadoop2.7.tgz
You can also download it directly from this mirror:
http://mirrors.hust.edu.cn/apache/spark/spark-2.1.1/spark-2.1.1-bin-hadoop2.7.tgz
2.2 Extract Spark
After the download completes, create a directory named spark under /opt on the Linux server and upload the archive into it.
Change into that directory:
cd /opt/spark
Run the extraction command:
tar -zxvf spark-2.1.1-bin-hadoop2.7.tgz
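After extraction you should see the Spark distribution directory; a quick check (the listing in the comment is abbreviated):
ls /opt/spark/spark-2.1.1-bin-hadoop2.7
# bin  conf  examples  jars  sbin  ...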
3 Spark configuration
Note: because this Spark cluster is built on top of the Hadoop cluster, I installed Spark on every Hadoop node, and every node needs the configuration steps below. Starting the cluster, however, only has to be done on the Spark master machine; here that is hserver1.
3.1 Configure environment variables
Edit /etc/profile and add:
export SPARK_HOME=/opt/spark/spark-2.1.1-bin-hadoop2.7
After adding the variable above, edit the PATH variable in the same file and add:
${SPARK_HOME}/bin
Note: some script names under $SPARK_HOME/sbin are identical to script names under $HADOOP_HOME/sbin. To avoid clashes between these same-named files, $SPARK_HOME/sbin is not added to PATH; only $SPARK_HOME/bin is.
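As a consequence, the names on PATH resolve to Hadoop's scripts, while Spark's cluster scripts are invoked with an explicit path. A small illustration, assuming the profile above has been sourced:
which start-all.sh                    # resolves to Hadoop's script under ${HADOOP_HOME}/sbin
ls ${SPARK_HOME}/sbin/start-all.sh    # Spark's same-named script, later run by its full path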
After the edits, my /etc/profile is:
export JAVA_HOME=/opt/java/jdk1.8.0_121
export HADOOP_HOME=/opt/hadoop/hadoop-2.8.0
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_HOME}/lib/native
export HADOOP_OPTS="-Djava.library.path=${HADOOP_HOME}/lib"
export HIVE_HOME=/opt/hive/apache-hive-2.1.1-bin
export HIVE_CONF_DIR=${HIVE_HOME}/conf
export SQOOP_HOME=/opt/sqoop/sqoop-1.4.6.bin__hadoop-2.0.4-alpha
export HBASE_HOME=/opt/hbase/hbase-1.2.5
export ZK_HOME=/opt/zookeeper/zookeeper-3.4.10
export SCALA_HOME=/opt/scala/scala-2.12.2
export SPARK_HOME=/opt/spark/spark-2.1.1-bin-hadoop2.7
export CLASS_PATH=.:${JAVA_HOME}/lib:${HIVE_HOME}/lib:$CLASS_PATH
export PATH=.:${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:${SPARK_HOME}/bin:${ZK_HOME}/bin:${HIVE_HOME}/bin:${SQOOP_HOME}/bin:${HBASE_HOME}/bin:${SCALA_HOME}/bin:$PATH
Note: you only need to pay attention to the JDK, Scala, Hadoop, and Spark environment variables mentioned at the start; the others, such as Zookeeper, HBase, Hive, and Sqoop, can be ignored.
When you are done editing, run:
source /etc/profile
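To confirm the new variables are visible in the current shell, you can check, for example:
echo $SPARK_HOME     # should print /opt/spark/spark-2.1.1-bin-hadoop2.7
echo $SCALA_HOME     # should print /opt/scala/scala-2.12.2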
3.2 Configure the files in the conf directory
Configure the files under /opt/spark/spark-2.1.1-bin-hadoop2.7/conf.
3.2.1 Create the spark-env.sh file
Change into /opt/spark/spark-2.1.1-bin-hadoop2.7/conf:
cd /opt/spark/spark-2.1.1-bin-hadoop2.7/conf
Create a spark-env.sh file from the template that Spark ships with:
cp spark-env.sh.template spark-env.sh
Edit spark-env.sh and add the following configuration (use your own paths where they differ):
export SCALA_HOME=/opt/scala/scala-2.12.2
export JAVA_HOME=/opt/java/jdk1.8.0_121
export HADOOP_HOME=/opt/hadoop/hadoop-2.8.0
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SPARK_HOME=/opt/spark/spark-2.1.1-bin-hadoop2.7
export SPARK_MASTER_IP=hserver1
export SPARK_EXECUTOR_MEMORY=1G
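SPARK_MASTER_IP is set to the hostname hserver1 here, so every node must be able to resolve that name, typically via /etc/hosts as in the Hadoop setup. A quick check on any node:
grep hserver /etc/hosts    # the master and worker hostnames should map to their IPs
ping -c 1 hserver1         # the master must be reachable from every worker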
3.2.2 Create the slaves file
Change into /opt/spark/spark-2.1.1-bin-hadoop2.7/conf:
cd /opt/spark/spark-2.1.1-bin-hadoop2.7/conf
Create a slaves file from the template that Spark ships with:
cp slaves.template slaves
Edit the slaves file so that its content is:
hserver2
hserver3
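Spark's start-all.sh reaches each host listed in slaves over SSH, so passwordless SSH from the master to the workers should already be in place (it normally is if the Hadoop cluster was set up as in the referenced post). A quick check from hserver1, assuming the root account used elsewhere in this guide:
ssh hserver2 hostname    # should print hserver2 without prompting for a password
ssh hserver3 hostname    # should print hserver3 without prompting for a password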
4 Start and test the Spark cluster
4.1 Start Spark
Because Spark relies on the distributed file system provided by Hadoop, make sure Hadoop is running normally before starting Spark. For installing and starting Hadoop 2.8.0, see this post:
http://blog.csdn.net/pucao_cug/article/details/71698903
With Hadoop running, execute the following on hserver1 (the Hadoop namenode and the Spark master node):
cd /opt/spark/spark-2.1.1-bin-hadoop2.7/sbin
Run the startup script:
./start-all.sh
The full output is:
[root@hserver1 sbin]# cd /opt/spark/spark-2.1.1-bin-hadoop2.7/sbin
[root@hserver1 sbin]# ./start-all.sh
starting org.apache.spark.deploy.master.Master, logging to /opt/spark/spark-2.1.1-bin-hadoop2.7/logs/spark-root-org.apache.spark.deploy.master.Master-1-hserver1.out
hserver2: starting org.apache.spark.deploy.worker.Worker, logging to /opt/spark/spark-2.1.1-bin-hadoop2.7/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-hserver2.out
hserver3: starting org.apache.spark.deploy.worker.Worker, logging to /opt/spark/spark-2.1.1-bin-hadoop2.7/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-hserver3.out
[root@hserver1 sbin]#
Note: the ./ in the command above is required; it means "run the start-all.sh script located in the current directory."
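As a quick sanity check, the JDK's jps tool (on PATH because ${JAVA_HOME}/bin is) can confirm the daemons are up on each machine; the process list will also include the Hadoop daemons:
jps    # on hserver1, expect a Master process
jps    # on hserver2 and hserver3, expect a Worker process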
4.2 Test and use the Spark cluster
4.2.1 Visit the URL exposed by the Spark cluster
In a browser, visit the master machine. In my cluster the master is hserver1, whose IP address is 192.168.27.143, so the web UI on port 8080 is at: http://192.168.27.143:8080/
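If you are working from a terminal without a browser, a rough command-line check of the same page also works; this only confirms that the master UI is serving, while the browser view additionally lists the registered workers:
curl -s http://192.168.27.143:8080/ | head -n 5    # should return the HTML of the master UI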
4.2.2 Run the Pi-estimation example that ships with Spark
Here we simply run the bundled Pi-estimation demo in local mode. Follow the steps below.
Step 1: change into the Spark root directory, that is, run:
cd /opt/spark/spark-2.1.1-bin-hadoop2.7
Step 2: invoke the Pi demo that ships with Spark:
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master local examples/jars/spark-examples_2.11-2.1.1.jar
Once the command is issued, the Spark example program starts running.
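The demo above runs in local mode and does not exercise the cluster. To submit the same job to the standalone cluster we just started, point --master at the master's URL instead; this assumes the default standalone master port 7077:
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://hserver1:7077 examples/jars/spark-examples_2.11-2.1.1.jar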
The result comes back very quickly.
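The key line to look for in the console output resembles the following; the exact digits vary from run to run because SparkPi estimates Pi by random sampling:
Pi is roughly 3.14...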
