Spark 2.0 Installation Test
Related links:
Compiling Hadoop 2.7 on Linux (Ubuntu): http://www.cnblogs.com/JustSunh/articles/5818007.html
Spark installation - installing SSH: http://www.cnblogs.com/JustSunh/articles/5817843.html
Spark installation steps - installing Hadoop: http://www.cnblogs.com/JustSunh/articles/5817911.html
Spark 2.0 installation and configuration: http://www.cnblogs.com/JustSunh/articles/5817917.html
Spark 2.0 testing: http://www.cnblogs.com/JustSunh/articles/5818020.html
Standalone mode
Step 1: Upload a file to HDFS
1. Enter the Hadoop directory: cd /app/hadoop-2.7.2/
2. Stop Hadoop (so the data directories can be cleaned up):
sbin/stop-all.sh
(or separately:)
sbin/stop-dfs.sh
sbin/stop-yarn.sh
3. Remove the old temporary directories and logs:
rm -rf /app/hadoop-2.7.2/tmp
rm -rf /app/hadoop-2.7.2/logs/*
rm -rf /app/hadoop-2.7.2/dfs/name/*
rm -rf /app/hadoop-2.7.2/dfs/data/*
rm -rf /app/spark-2.0.0-bin-hadoop2.7/logs/*
4. Format the NameNode:
bin/hdfs namenode -format
5. Start Hadoop:
sbin/start-all.sh
(or separately:)
sbin/start-dfs.sh
sbin/start-yarn.sh
6. Create the upload directory:
bin/hdfs dfs -mkdir /input
bin/hdfs dfs -ls /
7. Upload the file:
bin/hdfs dfs -put /app/hadoop-2.7.2/README.txt /input/
bin/hdfs dfs -ls /input/
# Optionally, verify the cluster with the bundled MapReduce word-count example:
#hadoop jar /app/hadoop-2.7.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount /input /output
Step 2: Run the word count with Spark
Enter the Spark directory:
cd /app/spark-2.0.0-bin-hadoop2.7/
Launch the shell against the standalone master:
MASTER=spark://172.21.30.22:7077 /app/spark-2.0.0-bin-hadoop2.7/bin/spark-shell
Run the Scala statements:
scala> val file = sc.textFile("hdfs://172.21.30.22:9009/input/README.txt")
scala> val count = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
scala> count.collect()
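The flatMap → map → reduceByKey pipeline above is the classic word count. As a rough sketch of what it computes, here is the same dataflow in plain Python (no Spark; the sample lines are made up for illustration):

```python
from collections import defaultdict

# Stand-in for the RDD of lines read from HDFS (made-up sample text).
lines = ["Apache Spark is fast", "Spark runs on YARN", "fast and general"]

# flatMap(line => line.split(" ")): flatten all lines into one stream of words.
words = [word for line in lines for word in line.split(" ")]

# map(word => (word, 1)): pair each word with an initial count of 1.
pairs = [(word, 1) for word in words]

# reduceByKey(_ + _): sum the counts for each distinct word.
counts = defaultdict(int)
for word, n in pairs:
    counts[word] += n

print(dict(counts))
```

In Spark the same three steps run partition-by-partition across the cluster, with a shuffle grouping equal keys before the reduce.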
YARN mode
Overwrite the bundled 32-bit native libraries with 64-bit builds (use the tarball matching your Hadoop version):
tar -xvf hadoop-native-64-2.7.0.tar -C /app/hadoop-2.7.2/lib/native/
tar -xvf hadoop-native-64-2.6.0.tar -C /app/hadoop-2.7.2/lib/native/
In /app/spark-2.0.0-bin-hadoop2.7/bin/, run:
# --master yarn-cluster is deprecated in Spark 2.0; use --master yarn (add --deploy-mode cluster for cluster mode):
#spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster /app/spark-2.0.0-bin-hadoop2.7/examples/jars/spark-examples*.jar 10
spark-submit --class org.apache.spark.examples.SparkPi --master yarn /app/spark-2.0.0-bin-hadoop2.7/examples/jars/spark-examples*.jar 10
2. Verifying the Spark environment
First, run the bundled Pi-estimation example from the official distribution:
[myspark@SZB-L0029554 ~]$ spark-submit --master spark://SZB-L0029554:7077 --class org.apache.spark.examples.SparkPi --name SparkPI /var/lib/myspark/spark/examples/jars/spark-examples_2.11-2.0.0.jar
The output is:
Pi is roughly 3.1383756918784593
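SparkPi estimates π by Monte Carlo sampling: it scatters random points in the unit square and counts the fraction that land inside the quarter circle, which is why each run prints a slightly different value near 3.14. A small plain-Python sketch of the same estimate (no Spark; the sample count and seed are arbitrary):

```python
import random

def estimate_pi(num_samples, seed=42):
    """Monte Carlo estimate of pi: 4 times the fraction of random
    points in the unit square that fall inside the unit circle."""
    rng = random.Random(seed)  # fixed seed for a repeatable sketch
    inside = 0
    for _ in range(num_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / num_samples

print(estimate_pi(100_000))
```

The Spark version distributes the sampling loop over the cluster and sums the per-partition counts, so the answer sharpens as you raise the slice count argument.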
Next, run the same kind of job through the YARN resource scheduler:
[myspark@SZB-L0029554 ~]$ spark-shell --master yarn
scala> val textFile = sc.textFile("hdfs://172.21.30.22:9009/input/README.txt")
scala> textFile.filter(line => line.contains("Spark")).count()
This counts the number of lines containing "Spark" in the HDFS file /input/README.txt.
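As with the word count, the filter-then-count above is easy to mirror in plain Python to see exactly what it computes (the sample lines are made up):

```python
# Stand-in for the lines of README.txt on HDFS (made-up sample).
lines = [
    "# Apache Spark",
    "Spark is a fast and general cluster computing system.",
    "It provides high-level APIs in Scala, Java, and Python.",
    "This file only contains basic setup instructions.",
]

# Equivalent of: textFile.filter(line => line.contains("Spark")).count()
spark_lines = sum(1 for line in lines if "Spark" in line)
print(spark_lines)
```

Note that contains() is case-sensitive, so lines with only "spark" would not be counted.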
