Tachyon on a Zybo cluster

Add a Tachyon layer between Spark and Hadoop to speed up the cluster.

Official site: http://tachyon-project.org/

GitHub: https://github.com/amplab/tachyon/releases

(1) Preparation:

wget http://tachyon-project.org/downloads/tachyon-0.4.1-bin.tar.gz
tar xvfz tachyon-0.4.1-bin.tar.gz
cd tachyon-0.4.1

cp conf/tachyon-env.sh.template conf/tachyon-env.sh

(2) Local test:

vi conf/tachyon-env.sh

./bin/tachyon format
./bin/tachyon-start.sh local
./bin/tachyon runTest Basic CACHE_THROUGH

(3) Integrating with Hadoop: set HDFS as Tachyon's under filesystem

Tachyon has to be recompiled to work with Hadoop 2.4.0. Installing Maven on the ARM platform fails, so the build was done on an x64 PC instead:

apt-get install maven

vi pom.xml

mvn -Dhadoop.version=2.4.0 clean package

cp -r /root/tachyon-0.4.1 /media/fs/root/

cd /root/tachyon-0.4.1

cd ..

cd hadoop-2.4.0/

vi etc/hadoop/core-site.xml

<property>
  <name>fs.tachyon.impl</name>
  <value>tachyon.hadoop.TFS</value>
</property>
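
If you would rather script this edit than make it by hand in vi, here is a minimal Python sketch. The helper name ensure_property is made up for illustration and is not part of Hadoop or Tachyon. It works on the XML text as a string and uses only the standard library.

```python
import xml.etree.ElementTree as ET


def ensure_property(xml_text, name, value):
    """Return Hadoop-style configuration XML in which `name` is set to `value`.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            # The property already exists: overwrite its value.
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            break
    else:
        # The property is missing: append a new <property> element.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(ensure_property("<configuration></configuration>",
                          "fs.tachyon.impl", "tachyon.hadoop.TFS"))
```

Note that the standard-library parser drops XML comments when it re-serializes, so back up the real core-site.xml before writing any result over it.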

vi etc/hadoop/hadoop-env.sh

Add this line:

export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/root/tachyon-0.4.1/target/tachyon-0.4.1-jar-with-dependencies.jar

cd /root

./gohadoop.sh

cd tachyon-0.4.1

./bin/tachyon format

./bin/tachyon-start.sh local
./bin/tachyon runTest Basic CACHE_THROUGH

cd $HADOOP_HOME
Run the following command:
./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.0.jar \
wordcount -libjars /root/tachyon-0.4.1/target/tachyon-0.4.1-jar-with-dependencies.jar \
tachyon://192.168.1.1:19998/in/file /out/file
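
The input path in the command above is a full tachyon:// URI, while the output path has no scheme and so goes to Hadoop's default filesystem. As a small illustration of how such a URI breaks down, the Python sketch below splits it into master host, port and file path. The function name split_tachyon_uri is made up for this example.

```python
from urllib.parse import urlparse


def split_tachyon_uri(uri):
    """Split a tachyon:// URI into (host, port, path). Hypothetical helper."""
    parts = urlparse(uri)
    if parts.scheme != "tachyon":
        raise ValueError("not a tachyon:// URI: %r" % uri)
    return parts.hostname, parts.port, parts.path


host, port, path = split_tachyon_uri("tachyon://192.168.1.1:19998/in/file")
print(host, port, path)  # 192.168.1.1 19998 /in/file
```

Here 192.168.1.1:19998 is the Tachyon master address used throughout this post, and /in/file is the path inside Tachyon.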

(4) Integrating with Spark: running Spark on Tachyon

cd spark-0.9.1-bin-hadoop2

vi conf/spark-env.sh

SPARK_CLASSPATH=/root/tachyon-0.4.1/target/tachyon-0.4.1-jar-with-dependencies.jar:$SPARK_CLASSPATH
export SPARK_CLASSPATH

export TACHYON_MASTER="192.168.1.1:19998"

Create a new configuration file:

vi conf/core-site.xml

<configuration>
  <property>
    <name>fs.tachyon.impl</name>
    <value>tachyon.hadoop.TFS</value>
  </property>
</configuration>

 

Run:

MASTER=spark://192.168.1.1:7077 ./bin/pyspark
file = sc.textFile("tachyon://192.168.1.1:19998/in/file")
counts = file.flatMap(lambda line: line.split(" ")) \
             .map(lambda word: (word, 1)) \
             .reduceByKey(lambda a, b: a + b)
counts.collect()

counts.saveAsTextFile("tachyon://192.168.1.1:19998/out/mycount")

counts.saveAsTextFile("hdfs://192.168.1.1:9000/out/mycount1")
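
To make clear what the flatMap / map / reduceByKey chain in the session above computes, here is a plain-Python sketch of the same word count that runs without Spark or Tachyon. The function name word_count and the sample lines are made up for illustration. Only the split(" ") rule is taken from the PySpark code.

```python
from collections import Counter


def word_count(lines):
    """Count words the same way as the PySpark pipeline, but locally."""
    counts = Counter()
    for line in lines:
        # Same tokenisation as the Spark lambda: split on single spaces.
        for word in line.split(" "):
            counts[word] += 1
    return dict(counts)


sample = ["a b a", "b c"]
print(sorted(word_count(sample).items()))  # [('a', 2), ('b', 2), ('c', 1)]
```

One detail carries over to the Spark version: split(" ") turns an empty line into one empty string, so empty lines show up in the result as a count for the key ''.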

collect() ran correctly, and saving to HDFS also succeeded, but saving to Tachyon failed with an error.

Reference: http://tachyon-project.org/Syncing-the-Underlying-Filesystem.html

Not yet resolved.

For now, only reading through Tachyon was tested, using a 1 GB text file:

Reading it with Hadoop took 16 minutes.
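
To put the 16-minute figure in perspective, here is a quick back-of-the-envelope calculation in Python. It assumes that "1 GB" means 1024 MB, since the post does not say which unit was meant. The function name is made up for illustration.

```python
def throughput_mb_per_s(size_mb, minutes):
    """Average read throughput in MB/s for a file of size_mb read in `minutes`."""
    return size_mb / (minutes * 60.0)


# Assumption: 1 GB = 1024 MB.
rate = throughput_mb_per_s(1024, 16)
print("%.2f MB/s" % rate)  # 1.07 MB/s
```

So the Hadoop read averaged roughly 1 MB/s on this cluster.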

 

 

scp tachyon-0.4.1.bak2.tar.gz root@spark4:/root/

Posted 2014-07-10 18:00 by Ю詺菛╀時代