Tachyon on a Zybo cluster

Add a Tachyon layer between Spark and Hadoop to speed up the cluster.

Official site: http://tachyon-project.org/

GitHub: https://github.com/amplab/tachyon/releases

(1) Preparation:

wget http://tachyon-project.org/downloads/tachyon-0.4.1-bin.tar.gz
tar xvfz tachyon-0.4.1-bin.tar.gz
cd tachyon-0.4.1

cp conf/tachyon-env.sh.template conf/tachyon-env.sh

(2) Test locally:

vi conf/tachyon-env.sh

./bin/tachyon format
./bin/tachyon-start.sh local
./bin/tachyon runTest Basic CACHE_THROUGH
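
Before running the test it can help to confirm that the Tachyon master is actually accepting connections. The following is only a minimal Python sketch, assuming the master listens on port 19998 (the port that appears in the tachyon:// URIs in this post) and, in local mode, on localhost. The helper name `port_is_open` is made up for illustration and is not part of Tachyon.

```python
import socket

def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except socket.error:
        # Refused, unreachable, or timed out: treat all as "not open".
        return False
    conn.close()
    return True

if __name__ == "__main__":
    # Assumption: local mode, master on localhost:19998.
    print("master reachable:", port_is_open("localhost", 19998))
```

If this prints False right after tachyon-start.sh, the logs under the Tachyon directory are the next place to look.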

(3) Integrating with Hadoop: Set HDFS as Tachyon's under filesystem

Tachyon has to be recompiled for Hadoop 2.4.0, and installing Maven on the ARM platform fails, so the build was moved to an x64 PC:

apt-get install maven

vi pom.xml

mvn -Dhadoop.version=2.4.0 clean package

cp -r /root/tachyon-0.4.1 /media/fs/root/

cd /root/tachyon-0.4.1

cd ..

cd hadoop-2.4.0/

vi etc/hadoop/core-site.xml

<property>
  <name>fs.tachyon.impl</name>
  <value>tachyon.hadoop.TFS</value>
</property>

vi etc/hadoop/hadoop-env.sh

Add one line:

export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/root/tachyon-0.4.1/target/tachyon-0.4.1-jar-with-dependencies.jar

cd /root

./gohadoop.sh

cd tachyon-0.4.1

./bin/tachyon format

./bin/tachyon-start.sh local
./bin/tachyon runTest Basic CACHE_THROUGH

cd $HADOOP_HOME

Run the following command:

./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.0.jar \
wordcount -libjars /root/tachyon-0.4.1/target/tachyon-0.4.1-jar-with-dependencies.jar \
tachyon://192.168.1.1:19998/in/file /out/file
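
The wordcount command is long enough that a typo in the jar path or the URI is easy to make. As a rough sketch (not something from the Tachyon docs), the same argument list can be assembled in Python. The helper names `tachyon_uri` and `wordcount_command` are hypothetical; the paths, host, and port are the ones used in this post.

```python
def tachyon_uri(host, port, path):
    """Build a tachyon:// URI; the path is normalized to start with one '/'."""
    return "tachyon://%s:%d/%s" % (host, port, path.lstrip("/"))

def wordcount_command(examples_jar, tachyon_jar, input_uri, output_path):
    """Return the argument list for the Hadoop wordcount example job."""
    return [
        "./bin/hadoop", "jar", examples_jar,
        "wordcount", "-libjars", tachyon_jar,
        input_uri, output_path,
    ]

cmd = wordcount_command(
    "share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.0.jar",
    "/root/tachyon-0.4.1/target/tachyon-0.4.1-jar-with-dependencies.jar",
    tachyon_uri("192.168.1.1", 19998, "/in/file"),
    "/out/file",
)
print(" ".join(cmd))
# To launch it for real, run from $HADOOP_HOME with subprocess.check_call(cmd).
```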

(4) Integrating with Spark: Running Spark on Tachyon

cd spark-0.9.1-bin-hadoop2

vi conf/spark-env.sh

SPARK_CLASSPATH=/root/tachyon-0.4.1/target/tachyon-0.4.1-jar-with-dependencies.jar:$SPARK_CLASSPATH
export SPARK_CLASSPATH

export TACHYON_MASTER="192.168.1.1:19998"

Create a new configuration file:

vi conf/core-site.xml

<configuration>
  <property>
    <name>fs.tachyon.impl</name>
    <value>tachyon.hadoop.TFS</value>
  </property>
</configuration>
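
A missing or misspelled fs.tachyon.impl entry is an easy mistake to make here, so it may be worth checking the file after editing. The sketch below uses only Python's standard library to parse Hadoop-style configuration XML into a dict. The function name `read_hadoop_conf` is an illustrative choice, and the sample text is simply the file shown above.

```python
import xml.etree.ElementTree as ET

def read_hadoop_conf(xml_text):
    """Parse Hadoop-style configuration XML into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    conf = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name is not None:
            conf[name.strip()] = (prop.findtext("value") or "").strip()
    return conf

SAMPLE = """<configuration>
  <property>
    <name>fs.tachyon.impl</name>
    <value>tachyon.hadoop.TFS</value>
  </property>
</configuration>"""

conf = read_hadoop_conf(SAMPLE)
print(conf.get("fs.tachyon.impl"))  # tachyon.hadoop.TFS
```

To check the real file, read conf/core-site.xml into a string and pass it in instead of SAMPLE.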

Run:

MASTER=spark://192.168.1.1:7077 ./bin/pyspark
file = sc.textFile("tachyon://192.168.1.1:19998/in/file")
counts = file.flatMap(lambda line: line.split(" ")) \
             .map(lambda word: (word, 1)) \
             .reduceByKey(lambda a, b: a + b)
counts.collect()

counts.saveAsTextFile("tachyon://192.168.1.1:19998/out/mycount")

counts.saveAsTextFile("hdfs://192.168.1.1:9000/out/mycount1")
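
For readers new to Spark, the flatMap / map / reduceByKey chain in the session above can be mirrored in plain Python on a small list of lines. This is only an illustration of what each stage computes, not how Spark executes it. The function name `word_count` and the sample input are made up.

```python
from collections import defaultdict

def word_count(lines):
    """Count words the same way the pyspark pipeline does, but in memory."""
    # flatMap: split every line on single spaces and flatten the results.
    words = [word for line in lines for word in line.split(" ")]
    # map: pair each word with the count 1.
    pairs = [(word, 1) for word in words]
    # reduceByKey: add up the counts for each distinct word.
    totals = defaultdict(int)
    for word, n in pairs:
        totals[word] += n
    return dict(totals)

print(word_count(["a b a", "b c"]))
```

Note that split(" ") produces empty strings when a line contains consecutive spaces, so those get counted as a "word" too; line.split() with no argument avoids that.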

collect() runs correctly,

saving to Hadoop runs correctly,

but saving to Tachyon fails with an error.

Reference: http://tachyon-project.org/Syncing-the-Underlying-Filesystem.html

Not yet resolved.

For now, test only reading a 1 GB text file through Tachyon:

Reading it with Hadoop took 16 minutes.
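
To compare read times between HDFS and Tachyon in a repeatable way, a small timing wrapper is enough. The sketch below is a generic helper, and `timed` is a made-up name. The commented pyspark line assumes `sc` is the shell's SparkContext and reuses the input URI from earlier in this post; it is not a measurement that was actually run here.

```python
import time

def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.time()
    result = fn(*args, **kwargs)
    return result, time.time() - start

# Standalone demonstration with a cheap function.
total, seconds = timed(sum, range(1000))
print(total, "computed in", seconds, "seconds")

# In the pyspark shell (assumption: sc exists), count() forces a full read:
# n, secs = timed(lambda: sc.textFile("tachyon://192.168.1.1:19998/in/file").count())
```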

scp tachyon-0.4.1.bak2.tar.gz root@spark4:/root/

posted @ 2014-07-10 18:00 Ю詺菛╀時代
