Hadoop + Tachyon + Spark: comprehensive configuration of the Zybo cluster

1. Zybo cluster architecture overview

1.1 The Zybo cluster consists of 5 Zybo boards. Each board boots from the boot files of the Digilent Zybo reference design and runs ARM Ubuntu as its filesystem. IP addresses run 192.168.1.1~5 from top to bottom, with hostnames spark1~5. Because the SD card writes at only about 2.3 MB/s, each board additionally gets a SanDisk Cruzer Blade 32GB USB stick (about 4 MB/s write) as extended storage; the programs and the JDK live on the stick, and each node also keeps a 2 GB swap file there to speed up the cluster. All nodes hang off a single gigabit switch, forming a pure ARM compute cluster.

1.2 Hadoop is version 2.4.0, the official binary release. spark1 runs the namenode and also acts as a datanode, while spark2~5 serve only as datanodes; with the limited U-disk capacity, the 5 nodes together hold roughly 150 GB. Hadoop is configured with Tachyon support (the Tachyon jar-with-dependencies library is added to its classpath).


1.3 For Tachyon, spark1 is the master and the other 4 nodes are slaves; each slave has a 1 GB ramdisk buffer, and the underlying filesystem (UnderFS) is HDFS. Tachyon's main role is to act as a data caching layer that avoids the network overhead of reading HDFS directly, speeding up big-data computation.


1.4 For Spark, spark1 serves only as the Spark master, while the remaining 4 nodes run as workers. Spark is configured with Tachyon support (the Tachyon jar-with-dependencies library is added to its classpath), and an environment variable pointing at the Tachyon master's IP is set.

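Taken together, 1.2~1.4 give the following per-node role assignment (a summary table inferred from the sections above):

node     ip            hdfs                 tachyon   spark
spark1   192.168.1.1   namenode + datanode  master    master
spark2   192.168.1.2   datanode             worker    worker
spark3   192.168.1.3   datanode             worker    worker
spark4   192.168.1.4   datanode             worker    worker
spark5   192.168.1.5   datanode             worker    worker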

 

2. Powering up the cluster for testing

2.1 Power on all 5 Zybos; they configure their MAC and IP addresses automatically (via the hooks in section 6.4). Log into the spark1 node over the serial port, cd to root's home directory, and run:

./gohadoop.sh

./gotachyon.sh

./gospark.sh
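
The go*.sh scripts themselves are not listed in this post. A minimal gohadoop.sh, assuming it simply wraps the cluster startup from section 3.2, might look like:

#!/bin/sh
# hypothetical wrapper around section 3.2; the real script is not shown here
cd /mnt/hadoop-2.4.0
sbin/start-dfs.sh    # namenode, datanodes, secondarynamenode
sbin/start-yarn.sh   # resourcemanager, nodemanagers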

2.2 To run multiple Tachyon masters without them competing with each other, i.e. a fault-tolerant setup, start ZooKeeper first:

./gozookeeper.sh

2.3 Once startup is reported successful, go to the Spark installation directory and open the Python shell:

MASTER=spark://192.168.1.1:7077 ./bin/pyspark

2.4 If anything goes wrong during startup, stop all Spark nodes and restart them with:

./sbin/stop-all.sh

SPARK_MASTER_IP=192.168.1.1 ./sbin/start-all.sh

Test scripts are given in section 5 (Spark tests).

 

3. Hadoop tests

3.1 Start the hadoop demo daemons one at a time:

cd /mnt/hadoop-2.4.0/
sbin/hadoop-daemon.sh start namenode
sbin/hadoop-daemon.sh start datanode
sbin/hadoop-daemon.sh start secondarynamenode
sbin/yarn-daemon.sh start resourcemanager
sbin/yarn-daemon.sh start nodemanager
sbin/mr-jobhistory-daemon.sh start historyserver

3.2 Normal cluster-wide startup:

cd /mnt/hadoop-2.4.0/
sbin/start-dfs.sh
sbin/start-yarn.sh

3.3 Watch the running java processes with jps:

jps -l | sort -k 2

3.4 Use netstat to watch for the port to open. The namenode RPC port here is 9000 (set in core-site.xml); the namenode web UI is at http://192.168.1.1:50070.

while ! netstat -ntlp | grep -q 9000
do
sleep 1
done
netstat -ntlp | grep 9000
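
The same busy-wait pattern recurs in 4.2 and 5.2; it can be factored into a small helper if desired (a sketch; wait_for_port is our name, not something from the original scripts):

# usage: wait_for_port 9000
wait_for_port() {
    until netstat -ntlp 2>/dev/null | grep -q ":$1 "; do
        sleep 1
    done
    netstat -ntlp | grep ":$1 "
}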

3.5 Once all datanodes are up, list the HDFS root directory:

bin/hadoop dfs -ls /

3.6 Copy the local /mnt/in folder into HDFS as /in:

bin/hadoop dfs -copyFromLocal /mnt/in /in

3.7 Run the wordcount example (the HDFS directory /in holds the input files; /out is the output directory):

bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.0.jar wordcount /in /out

3.8 Read the wordcount output files:

bin/hadoop fs -ls /out

bin/hadoop dfs -cat /out/part-r-00000
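
To spot-check the result, e.g. the ten most frequent words (assuming the usual word<TAB>count output format):

bin/hadoop dfs -cat /out/part-r-00000 | sort -k2 -nr | head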

3.9 To add a new node to the hadoop cluster, the namenode has to be re-formatted, which also invalidates all datanodes. On spark1:

rm -rf /mnt/namenode
rm -rf /mnt/datanode
rm -rf /mnt/hadoop/tmp/*

Then run the following on all of spark2~5 to remove the stored namespace and datanode IDs:

rm -rf /mnt/datanode

Then format the namenode (datanodes have no format command; deleting /mnt/datanode above is what resets them):

cd /mnt/hadoop-2.4.0/
bin/hadoop namenode -format

Finally, start hadoop as in 3.2.
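
With passwordless SSH in place (section 6.5), the whole of 3.9 can be driven from spark1; a hedged consolidation of the steps above:

# wipe HDFS state on every node, then re-format the namenode
for h in spark1 spark2 spark3 spark4 spark5; do
    ssh root@$h 'rm -rf /mnt/namenode /mnt/datanode /mnt/hadoop/tmp/*'
done
cd /mnt/hadoop-2.4.0
bin/hadoop namenode -format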

3.10 Stop hadoop (demo, single node):

sbin/hadoop-daemon.sh stop namenode

sbin/hadoop-daemon.sh stop datanode

sbin/hadoop-daemon.sh stop secondarynamenode

sbin/yarn-daemon.sh stop resourcemanager

sbin/yarn-daemon.sh stop nodemanager

sbin/mr-jobhistory-daemon.sh stop historyserver

3.11 Stop hadoop (cluster-wide; presumably mirroring the startup in 3.2):

cd /mnt/hadoop-2.4.0/
sbin/stop-dfs.sh
sbin/stop-yarn.sh

 

4. Tachyon tests

4.1 First format the tachyon cache layer, then start all nodes, mounting the ramdisks:

cd /mnt/tachyon-0.4.1

./bin/tachyon format
./bin/tachyon-stop.sh
./bin/tachyon-start.sh all Mount

4.2 Wait for the tachyon master and slaves to become ready; the web UI at http://192.168.1.1:19999 also shows whether all nodes are up.

while ! netstat -ntlp | grep -q 19998
do
sleep 1
done

jps -l | sort -k 2

4.3 Load the under file system into tachyon so it learns the directories and files that already exist in HDFS. If this command is skipped, you may hit an "Unknown under file system scheme" error (java.lang.IllegalArgumentException):

./bin/tachyon loadufs tachyon://192.168.1.1:19998 hdfs://192.168.1.1:9000 /
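
To confirm the load worked, list the tachyon namespace (tfs is the tachyon shell; syntax assumed from the 0.4.x release):

./bin/tachyon tfs ls /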

4.4 Test tachyon:

Quick test:

./bin/tachyon runTest Basic CACHE_THROUGH

Full test suite:

./bin/tachyon runTests

4.5 Test wordcount through the tachyon layer plus hadoop; from the hadoop installation directory run:

./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.0.jar \
wordcount -libjars /root/tachyon-0.4.1/target/tachyon-0.4.1-jar-with-dependencies.jar \
tachyon://192.168.1.1:19998/in/file /out/file
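
Note that the unqualified /out/file output path resolves against fs.defaultFS, i.e. the result lands on HDFS. To read it back (assuming the usual reducer output name):

bin/hadoop fs -cat /out/file/part-r-00000 | head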

4.6 Stop tachyon:

./bin/tachyon-stop.sh

 

5. Spark tests

5.1 Start the spark cluster:

cd /mnt/spark-0.9.1-bin-hadoop2

SPARK_MASTER_IP=192.168.1.1 ./sbin/start-all.sh

5.2 Check startup status with the commands below, or via the web UI at http://192.168.1.1:8080:

jps -l | sort -k 2

echo "please wait..."
while ! netstat -ntlp | grep -q 7077
do
sleep 1
done
netstat -ntlp | grep 7077

5.3 Open the python spark shell:

cd /mnt/spark-0.9.1-bin-hadoop2

MASTER=spark://192.168.1.1:7077 ./bin/pyspark

5.4 Pi test script (1000 is the number of samples):

# sc (the SparkContext) is provided automatically by the pyspark shell
from random import random
def sample(p):
    x, y = random(), random()
    return 1 if x*x + y*y < 1 else 0

count = sc.parallelize(xrange(0, 1000)).map(sample) \
             .reduce(lambda a, b: a + b)
print "Pi is roughly %f" % (4.0 * count / 1000)

5.5 Wordcount, HDFS version (here and in 5.6, pick one input size; each textFile line overrides the previous):

SPARK_MASTER_IP=192.168.1.1 ./sbin/start-all.sh
MASTER=spark://192.168.1.1:7077 ./bin/pyspark
file = sc.textFile("hdfs://192.168.1.1:9000/test/file1M")
file = sc.textFile("hdfs://192.168.1.1:9000/test/file10M")
file = sc.textFile("hdfs://192.168.1.1:9000/test/file100M")
file = sc.textFile("hdfs://192.168.1.1:9000/test/file1G")
file = sc.textFile("hdfs://192.168.1.1:9000/test/file10G")
file = sc.textFile("hdfs://192.168.1.1:9000/test/file100G")
counts = file.flatMap(lambda line: line.split(" ")) \
             .map(lambda word: (word, 1)) \
             .reduceByKey(lambda a, b: a + b)
counts.collect()
counts.saveAsTextFile("hdfs://192.168.1.1:9000/out/outfile") 

5.6 Wordcount, Tachyon version:

SPARK_MASTER_IP=192.168.1.1 ./sbin/start-all.sh
MASTER=spark://192.168.1.1:7077 ./bin/pyspark
file = sc.textFile("tachyon://192.168.1.1:19998/test/file1M")
file = sc.textFile("tachyon://192.168.1.1:19998/test/file10M")
file = sc.textFile("tachyon://192.168.1.1:19998/test/file100M")
file = sc.textFile("tachyon://192.168.1.1:19998/test/file1G")
file = sc.textFile("tachyon://192.168.1.1:19998/test/file10G")
file = sc.textFile("tachyon://192.168.1.1:19998/test/file100G")
counts = file.flatMap(lambda line: line.split(" ")) \
             .map(lambda word: (word, 1)) \
             .reduceByKey(lambda a, b: a + b)
counts.collect()
counts.saveAsTextFile("tachyon://192.168.1.1:19998/out/outfile") 

For other tests, see reference 5.

5.7 Stop spark:

./sbin/stop-all.sh

 

6. Common node configuration

6.1 Set the hostname (the second node, for example, becomes spark2):

vi /etc/hostname

spark2

6.2 Configure the hosts file (spark1 shown):

vi /etc/hosts

#127.0.0.1      localhost       zynq
192.168.1.1     spark1          localhost
192.168.1.2     spark2
192.168.1.3     spark3
192.168.1.4     spark4
192.168.1.5     spark5
#::1            localhost ip6-localhost ip6-loopback

6.3 Disable ipv6 (takes effect after reboot):

vi /etc/sysctl.conf

net.ipv6.conf.all.disable_ipv6 = 1

net.ipv6.conf.default.disable_ipv6 = 1

net.ipv6.conf.lo.disable_ipv6 = 1
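
These settings load at boot; to apply them immediately without rebooting, reload the file:

sysctl -p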

6.4 Path environment variables and boot-time setup (spark1 shown; each node substitutes its own MAC and IP in the ifconfig lines):

vi /etc/profile

export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:$PATH
export JAVA_HOME=/mnt/jdk1.7.0_55
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:$JAVA_HOME/lib/tools.jar
export PATH=$JAVA_HOME/bin:$PATH
export HADOOP_HOME=/mnt/hadoop-2.4.0

export PATH=$PATH:$HADOOP_HOME/bin
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop

ifconfig eth2 hw ether 00:0a:35:00:01:01
ifconfig eth2 192.168.1.1/24 up

6.5 SSH configuration:

Generate the id_rsa.pub public key (press Enter through every prompt):

ssh-keygen -t rsa

Authorize the local key:

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

Distribute the public key to every node:

ssh-copy-id -i ~/.ssh/id_rsa.pub root@spark1

ssh-copy-id -i ~/.ssh/id_rsa.pub root@spark2

ssh-copy-id -i ~/.ssh/id_rsa.pub root@spark3

ssh-copy-id -i ~/.ssh/id_rsa.pub root@spark4

ssh-copy-id -i ~/.ssh/id_rsa.pub root@spark5
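
A quick check that passwordless login now works to every node (BatchMode makes ssh fail instead of prompting for a password):

for h in spark1 spark2 spark3 spark4 spark5; do
    ssh -o BatchMode=yes root@$h hostname
done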

6.6 Set up java:

cd /usr/bin/

ln -s /usr/lib/jdk1.7.0_55/bin/java java

ln -s /usr/lib/jdk1.7.0_55/bin/javac javac

ln -s /usr/lib/jdk1.7.0_55/bin/jar jar

6.7 Configure swap

Show the current memory situation:

free -m

Create a swap file (bs=1024 with count=1000000 is roughly 1 GB; for the 2 GB per-node swap described in 1.1, use count=2000000):

cd /mnt
mkdir swap
cd swap/

dd if=/dev/zero of=swapfile bs=1024 count=1000000

Turn the file into swap space:

mkswap swapfile

Activate the swap file and re-check:

swapon swapfile
free -m
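
swapon does not survive a reboot; to make the swap file permanent, one option (assuming a standard /etc/fstab) is:

echo '/mnt/swap/swapfile none swap sw 0 0' >> /etc/fstab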

 

7. Hadoop configuration

cd /mnt/hadoop-2.4.0

7.1 Configure the hadoop runtime environment:

vi etc/hadoop/hadoop-env.sh

export JAVA_HOME=/mnt/jdk1.7.0_55
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/mnt/tachyon-0.4.1/target/tachyon-0.4.1-jar-with-dependencies.jar

7.2 Configure yarn-site:

vi etc/hadoop/yarn-site.xml

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>

  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>

</configuration>

7.3 Configure core-site

First create the /mnt/hadoop/tmp directory (mkdir -p /mnt/hadoop/tmp), then:

vi etc/hadoop/core-site.xml

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://192.168.1.1:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/mnt/hadoop/tmp</value>
    </property>
    <property>
        <name>fs.tachyon.impl</name>
        <value>tachyon.hadoop.TFS</value>
    </property>
</configuration>

7.4 Configure hdfs-site

vi etc/hadoop/hdfs-site.xml

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.permissions</name>
        <value>false</value>
    </property>

    <property>
        <name>dfs.namenode.rpc-address</name>
        <value>192.168.1.1:9000</value>
    </property>

    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/mnt/datanode</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/mnt/namenode</value>
    </property>
</configuration>

7.5 Configure mapred-site:

vi etc/hadoop/mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

7.6 Set the masters file to 192.168.1.1 and list the 5 node IPs in the slaves file, as sketched below.
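
Hedged example contents (in Hadoop 2.x the slaves file lists the datanode hosts; the masters file holds the secondary namenode host):

# etc/hadoop/masters
192.168.1.1
# etc/hadoop/slaves
192.168.1.1
192.168.1.2
192.168.1.3
192.168.1.4
192.168.1.5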

 

8. Tachyon configuration

cd /mnt/tachyon-0.4.1

8.1 Configure the tachyon environment:

vi conf/tachyon-env.sh

if [[ `uname -a` == Darwin* ]]; then
  # Assuming Mac OS X
  export JAVA_HOME=$(/usr/libexec/java_home)
  export TACHYON_RAM_FOLDER=/Volumes/ramdisk
  export TACHYON_JAVA_OPTS="-Djava.security.krb5.realm= -Djava.security.krb5.kdc="
else
  # Assuming Linux
  if [ -z "$JAVA_HOME" ]; then
    export JAVA_HOME=/mnt/jdk1.7.0_55
  fi
  export TACHYON_RAM_FOLDER=/mnt/ramdisk
fi

export JAVA="$JAVA_HOME/bin/java"
export TACHYON_MASTER_ADDRESS=192.168.1.1
#export TACHYON_UNDERFS_ADDRESS=/mnt/underfs
export TACHYON_UNDERFS_ADDRESS=hdfs://192.168.1.1:9000
export TACHYON_WORKER_MEMORY_SIZE=1GB
export TACHYON_UNDERFS_HDFS_IMPL=org.apache.hadoop.hdfs.DistributedFileSystem

CONF_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"

export TACHYON_JAVA_OPTS+="
  -Dlog4j.configuration=file:$CONF_DIR/log4j.properties
  -Dtachyon.debug=false
  -Dtachyon.underfs.address=$TACHYON_UNDERFS_ADDRESS
  -Dtachyon.underfs.hdfs.impl=$TACHYON_UNDERFS_HDFS_IMPL
  -Dtachyon.data.folder=$TACHYON_UNDERFS_ADDRESS/tmp/tachyon/data
  -Dtachyon.workers.folder=$TACHYON_UNDERFS_ADDRESS/tmp/tachyon/workers
  -Dtachyon.worker.memory.size=$TACHYON_WORKER_MEMORY_SIZE
  -Dtachyon.worker.data.folder=$TACHYON_RAM_FOLDER/tachyonworker/
  -Dtachyon.master.worker.timeout.ms=60000
  -Dtachyon.master.hostname=$TACHYON_MASTER_ADDRESS
  -Dtachyon.master.journal.folder=/mnt/journal/
  -Dtachyon.master.pinlist=/pinfiles;/pindata
  -Dorg.apache.jasper.compiler.disablejsr199=true
"

8.2 When using zookeeper, the configuration becomes:

if [[ `uname -a` == Darwin* ]]; then
  # Assuming Mac OS X
  export JAVA_HOME=$(/usr/libexec/java_home)
  export TACHYON_RAM_FOLDER=/Volumes/ramdisk
  export TACHYON_JAVA_OPTS="-Djava.security.krb5.realm= -Djava.security.krb5.kdc="
else
  # Assuming Linux
  if [ -z "$JAVA_HOME" ]; then
    export JAVA_HOME=/usr/lib/jdk1.7.0_55
  fi
  export TACHYON_RAM_FOLDER=/mnt/ramdisk
fi

export JAVA="$JAVA_HOME/bin/java"
export TACHYON_MASTER_ADDRESS=192.168.1.1
#export TACHYON_UNDERFS_ADDRESS=$TACHYON_HOME/underfs
#export TACHYON_UNDERFS_ADDRESS=/mnt/underfs
export TACHYON_UNDERFS_ADDRESS=hdfs://192.168.1.1:9000
export TACHYON_WORKER_MEMORY_SIZE=1GB
export TACHYON_UNDERFS_HDFS_IMPL=org.apache.hadoop.hdfs.DistributedFileSystem
#export TACHYON_UNDERFS_HDFS_IMPL=fs.defaultFS
export TACHYON_ZOOKEEPER_ADDRESS=192.168.1.1:2181

CONF_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"

export TACHYON_JAVA_OPTS+="
  -Dlog4j.configuration=file:$CONF_DIR/log4j.properties
  -Dtachyon.debug=false
  -Dtachyon.underfs.address=$TACHYON_UNDERFS_ADDRESS
  -Dtachyon.usezookeeper=true
  -Dtachyon.zookeeper.address=$TACHYON_ZOOKEEPER_ADDRESS
  -Dtachyon.underfs.hdfs.impl=$TACHYON_UNDERFS_HDFS_IMPL
  -Dtachyon.data.folder=$TACHYON_UNDERFS_ADDRESS/tmp/tachyon/data
  -Dtachyon.workers.folder=$TACHYON_UNDERFS_ADDRESS/tmp/tachyon/workers
  -Dtachyon.worker.memory.size=$TACHYON_WORKER_MEMORY_SIZE
  -Dtachyon.worker.data.folder=$TACHYON_RAM_FOLDER/tachyonworker/
  -Dtachyon.master.worker.timeout.ms=60000
  -Dtachyon.master.hostname=$TACHYON_MASTER_ADDRESS
  -Dtachyon.master.journal.folder=hdfs://192.168.1.1:9000/tachyon/journal/
  -Dtachyon.master.pinlist=/pinfiles;/pindata
  -Dorg.apache.jasper.compiler.disablejsr199=true
"

8.3 Set conf/slaves to 192.168.1.2~5.


9. Spark configuration

cd /mnt/spark-0.9.1-bin-hadoop2/

9.1 Configure core-site:

vi conf/core-site.xml

<configuration>
  <property>
    <name>fs.tachyon.impl</name>
    <value>tachyon.hadoop.TFS</value>
  </property>
</configuration>

9.2 Configure spark-env:

vi conf/spark-env.sh

JAVA_HOME=/mnt/jdk1.7.0_55
SPARK_MASTER_IP=192.168.1.1
SPARK_CLASSPATH=/mnt/tachyon-0.4.1/target/tachyon-0.4.1-jar-with-dependencies.jar:$SPARK_CLASSPATH
export SPARK_CLASSPATH

9.3 Set conf/slaves to 192.168.1.2~5.


10. Configure zookeeper:

cd /mnt/zookeeper-3.3.6

vi conf/zoo.cfg

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
dataDir=/mnt/zookeeper
# the port at which the clients will connect
clientPort=2181
#server.1=192.168.1.1:2888:3888
#server.2=192.168.1.2:2888:3888
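
With the server.* lines commented out, this runs ZooKeeper standalone on spark1. A minimal start/verify sequence, assuming the stock 3.3.6 scripts:

mkdir -p /mnt/zookeeper    # the dataDir from zoo.cfg
bin/zkServer.sh start
bin/zkServer.sh status     # check that the server is up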

 

11. Known issue

(How hadoop identifies nodes): when adding a new datanode by cloning an existing node's SD card, the cloned node and the original end up competing on hadoop's datanode monitoring page, because hadoop does not identify a node by IP, MAC address, or hostname. Instead, the namespaceID is the unique identifier of a hadoop cluster, and it is by this ID that the namenode recognizes the datanodes belonging to its cluster.

Reference: http://blog.csdn.net/xiaojiafei/article/details/10152395

Fix: on the new node, clear the storage folders defined in etc/hadoop/hdfs-site.xml (for a cloned datanode, the dfs.datanode.data.dir folder is where the stored IDs live).
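
Concretely, on the cloned node only (directories taken from section 7.4; the VERSION file holding the stored IDs lives under the datanode directory):

rm -rf /mnt/datanode /mnt/hadoop/tmp/*
cd /mnt/hadoop-2.4.0
sbin/hadoop-daemon.sh start datanode    # rejoins the cluster with fresh IDs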

 

12. References:

1. Digilent Zybo Reference Design

http://www.digilentinc.com/Products/Detail.cfm?NavPath=2,400,1198&Prod=ZYBO

2. Oracle JDK7 for ARM

http://www.oracle.com/technetwork/java/javase/downloads/jdk7-arm-downloads-2187468.html

3. What is Hadoop:

http://hadoop.apache.org/

4. What is Spark:

http://spark.apache.org/

5. Spark example code:

http://spark.apache.org/examples.html

6. What is HDFS:

http://hadoop.apache.org/docs/r1.2.1/hdfs_design.html

7. What is Tachyon:

http://tachyon-project.org/

8. Tachyon on GitHub:

https://github.com/amplab/tachyon/releases

9. What is ZooKeeper:

http://zookeeper.apache.org/
