Integrating Hadoop 1.1.2 with Ceph 0.94

Environment: a Hadoop 1.1.2 cluster

Ceph 0.94 (which I deployed on CentOS 6.5 as a single-node cluster). According to the Ceph documentation, replacing HDFS with CephFS is only supported for the Hadoop 1.1 series. For setting up the Hadoop cluster environment, see my other article: http://www.178pt.com/archives/160.html. For combining Hadoop 2.7.3 with Ceph 0.94, see: http://www.178pt.com/175.html. Parts of this procedure are adapted from articles found online.

1. Install the required packages on the Hadoop nodes

The packages that need to be installed on each Hadoop node are: libcephfs-java and libcephfs-jni.

2. Add the Ceph repository and install the two packages

wget -q -O- 'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc' | sudo apt-key add -
echo deb http://ceph.com/debian-dumpling/ $(lsb_release -sc) main | sudo tee /etc/apt/sources.list.d/ceph.list
sudo apt-get update
sudo apt-get install libcephfs-jni libcephfs-java
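To confirm the packages ended up where the later steps expect them (exact paths can vary by distribution), a quick check along these lines is enough:

dpkg -l | grep libcephfs            # libcephfs-java and libcephfs-jni should both be listed
ls /usr/share/java/libcephfs.jar    # Java bindings referenced later in HADOOP_CLASSPATH
ls /usr/lib/jni/libcephfs_jni.so    # JNI library that gets symlinked in the next step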

3. Create a symlink

Add libcephfs_jni.so to the Hadoop native library directory:

cd $HADOOP_HOME/lib/native/Linux-amd64-64    # or Linux-i386-32 on a 32-bit platform
ln -s /usr/lib/jni/libcephfs_jni.so .
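A quick sanity check of the link (shown here for the 64-bit directory used above):

ls -l $HADOOP_HOME/lib/native/Linux-amd64-64/libcephfs_jni.so    # should point at /usr/lib/jni/libcephfs_jni.so
file /usr/lib/jni/libcephfs_jni.so                               # the shared object should exist and match the host architecture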

4. Download the CephFS Hadoop plugin

Hadoop needs a CephFS plugin, which can be downloaded from http://ceph.com/download/hadoop-cephfs.jar. Place the jar in the /usr/local/hadoop/lib directory on every Hadoop node.
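One way to fetch the plugin and push it out to the nodes (master and slave are simply the hostnames used later in this post) is roughly:

wget http://ceph.com/download/hadoop-cephfs.jar
cp hadoop-cephfs.jar /usr/local/hadoop/lib/
scp hadoop-cephfs.jar hadoop@slave:/usr/local/hadoop/lib/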

5. Create a pool

On the Ceph admin node, create the pool that Hadoop will use:

ceph osd pool create hadoop1 64
ceph osd pool set hadoop1 size 1
ceph mds add_data_pool hadoop1
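To double-check the pool before pointing Hadoop at it, the usual Ceph status commands should be enough (output of course depends on your cluster):

ceph osd lspools                   # hadoop1 should now be listed
ceph osd pool get hadoop1 size     # should report size: 1, matching the setting above
ceph mds dump | grep data_pools    # hadoop1's pool id should appear among the MDS data pools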

6. Grant the Hadoop nodes access

On each Hadoop node, create a directory to hold the Ceph configuration and keyring; /etc/ceph is used below. Make sure these files are readable by the Hadoop user.

On the node where ceph-deploy is installed:

ceph-deploy admin ${Hadoop_user}@${Hadoop_node_ip}

I simply copied everything under /etc/ceph on the Ceph node to the same directory on the Hadoop node. If the directory does not exist, create it first:

mkdir /etc/ceph
chown -R hadoop:hadoop /etc/ceph

On the Hadoop node:

sudo chown -R hduser:hadoop /etc/ceph/
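If you prefer copying the files by hand instead of using ceph-deploy, a sketch of the same thing (ceph151 being the Ceph node seen later in this post) is:

mkdir -p /etc/ceph
scp root@ceph151:/etc/ceph/ceph.conf /etc/ceph/
scp root@ceph151:/etc/ceph/ceph.client.admin.keyring /etc/ceph/
sudo chown -R hduser:hadoop /etc/ceph/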

7. Update the configuration

Stop the cluster in the existing Hadoop environment, then open conf/core-site.xml and make the following changes:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>ceph://192.168.0.151:6789/</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/tmp</value>
  </property>
  <property>
    <name>ceph.conf.file</name>
    <value>/etc/ceph/ceph.conf</value>
  </property>
  <property>
    <name>ceph.root.dir</name>
    <value>/</value>
  </property>
  <property>
    <name>ceph.auth.id</name>
    <value>admin</value>
  </property>
  <property>
    <name>ceph.auth.keyring</name>
    <value>/etc/ceph/ceph.client.admin.keyring</value>
  </property>
  <property>
    <name>ceph.data.pools</name>
    <value>hadoop1</value>
  </property>
  <property>
    <name>fs.ceph.impl</name>
    <value>org.apache.hadoop.fs.ceph.CephFileSystem</value>
  </property>
</configuration>

Next, edit hadoop-env.sh: open $HADOOP_HOME/conf/hadoop-env.sh and add the location of libcephfs.jar, which is usually /usr/share/java/libcephfs.jar:

export JAVA_HOME=/home/hadoop/jdk1.7.0_79/
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:/usr/local/hadoop/bin
export HADOOP_CLASSPATH=/usr/share/java/libcephfs.jar

Finally, copy the configuration to the other nodes:

hadoop@master:/usr/local/hadoop/conf$ scp * hadoop@slave:/usr/local/hadoop/conf/
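Before restarting the cluster, it is worth confirming that both the CephFS plugin and the libcephfs Java bindings are actually visible to Hadoop; a rough check (the grep pattern is just illustrative) is:

hadoop classpath | tr ':' '\n' | grep -i ceph    # should show hadoop-cephfs.jar and libcephfs.jar
ls /usr/local/hadoop/lib/hadoop-cephfs.jar       # the plugin copied in step 4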

8. Start the cluster

hadoop@master:/usr/local/hadoop$ bin/start-all.sh
starting namenode, logging to /usr/local/hadoop/libexec/../logs/hadoop-hadoop-namenode-master.out
192.168.0.222: starting datanode, logging to /usr/local/hadoop/libexec/../logs/hadoop-hadoop-datanode-slave.out
192.168.0.221: starting secondarynamenode, logging to /usr/local/hadoop/libexec/../logs/hadoop-hadoop-secondarynamenode-master.out
starting jobtracker, logging to /usr/local/hadoop/libexec/../logs/hadoop-hadoop-jobtracker-master.out
192.168.0.222: starting tasktracker, logging to /usr/local/hadoop/libexec/../logs/hadoop-hadoop-tasktracker-slave.out

9. Verify the cluster

hadoop@master:/usr/local/hadoop/logs$ jps
23867 JobTracker
23786 SecondaryNameNode
24012 Jps

hadoop@slave:/usr/local/hadoop$ jps
15050 TaskTracker
15134 Jps
14927 DataNode

Verify on the Ceph node:

[root@ceph151 ~]# netstat -anltp | grep 6789
tcp    0    0 192.168.0.151:6789     0.0.0.0:*              LISTEN       1772/ceph-mon
tcp    0    0 192.168.0.151:37433    192.168.0.151:6789     ESTABLISHED  2701/ceph-osd
tcp    0    0 192.168.0.151:6789     192.168.0.151:37434    ESTABLISHED  1772/ceph-mon
tcp    0    0 192.168.0.151:6789     192.168.0.222:56117    ESTABLISHED  1772/ceph-mon
tcp    0    0 192.168.0.151:37434    192.168.0.151:6789     ESTABLISHED  3096/ceph-mds
tcp    0    0 192.168.0.151:6789     192.168.0.221:35089    ESTABLISHED  1772/ceph-mon
tcp    0    0 192.168.0.151:6789     192.168.0.151:37433    ESTABLISHED  1772/ceph-mon
[root@ceph151 ~]#

You can see that 192.168.0.221 and 192.168.0.222 have established connections to the cluster on port 6789.
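A quick functional check from the Hadoop side is to list the filesystem root through the Ceph URI; if the plugin, configuration, and keyring are all in place, this should succeed without touching HDFS:

hadoop dfs -ls /                              # listing is served by CephFS
hadoop dfs -ls ceph://192.168.0.151:6789/     # same check with the URI spelled out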

10. Test CephFS as the HDFS replacement

First delete the input and output directories created earlier:

hadoop@master:/usr/local/hadoop$ hadoop dfs -rmr input
Deleted ceph://192.168.0.151:6789/user/hadoop/input
hadoop@master:/usr/local/hadoop$ hadoop dfs -rmr output
Deleted ceph://192.168.0.151:6789/user/hadoop/output
hadoop@master:/usr/local/hadoop$

Then recreate the input directory and copy in a test file:

hadoop@master:/usr/local/hadoop$ hadoop dfs -mkdir input
hadoop@master:/usr/local/hadoop$ hadoop dfs -copyFromLocal conf/core-site.xml input
16/12/09 12:02:57 INFO ceph.CephFileSystem: selectDataPool path=ceph://192.168.0.151:6789/user/hadoop/input/core-site.xml pool:repl=hadoop1:1 wanted=3
hadoop@master:/usr/local/hadoop$

Run the wordcount test:

hadoop@master:/usr/local/hadoop$ hadoop jar hadoop-examples-1.1.2.jar wordcount input output
16/12/09 12:03:53 INFO ceph.CephFileSystem: selectDataPool path=ceph://192.168.0.151:6789/usr/local/hadoop/tmp/mapred/staging/hadoop/.staging/job_201612091155_0001/job.jar pool:repl=hadoop1:1 wanted=3
16/12/09 12:03:53 INFO input.FileInputFormat: Total input paths to process : 1
16/12/09 12:03:53 INFO util.NativeCodeLoader: Loaded the native-hadoop library
16/12/09 12:03:53 WARN snappy.LoadSnappy: Snappy native library not loaded
16/12/09 12:03:53 INFO ceph.CephFileSystem: selectDataPool path=ceph://192.168.0.151:6789/usr/local/hadoop/tmp/mapred/staging/hadoop/.staging/job_201612091155_0001/job.split pool:repl=hadoop1:1 wanted=3
16/12/09 12:03:53 INFO ceph.CephFileSystem: selectDataPool path=ceph://192.168.0.151:6789/usr/local/hadoop/tmp/mapred/staging/hadoop/.staging/job_201612091155_0001/job.splitmetainfo pool:repl=hadoop1:1 wanted=3
16/12/09 12:03:53 INFO ceph.CephFileSystem: selectDataPool path=ceph://192.168.0.151:6789/usr/local/hadoop/tmp/mapred/staging/hadoop/.staging/job_201612091155_0001/job.xml pool:repl=hadoop1:1 wanted=3
16/12/09 12:03:54 INFO mapred.JobClient: Running job: job_201612091155_0001
16/12/09 12:03:55 INFO mapred.JobClient:  map 0% reduce 0%
16/12/09 12:04:02 INFO mapred.JobClient:  map 100% reduce 0%
16/12/09 12:04:10 INFO mapred.JobClient:  map 100% reduce 33%
16/12/09 12:04:12 INFO mapred.JobClient:  map 100% reduce 100%
16/12/09 12:04:13 INFO mapred.JobClient: Job complete: job_201612091155_0001
16/12/09 12:04:13 INFO mapred.JobClient: Counters: 27
16/12/09 12:04:13 INFO mapred.JobClient:   Job Counters
16/12/09 12:04:13 INFO mapred.JobClient:     Launched reduce tasks=1
16/12/09 12:04:13 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=7130
16/12/09 12:04:13 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
16/12/09 12:04:13 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
16/12/09 12:04:13 INFO mapred.JobClient:     Launched map tasks=1
16/12/09 12:04:13 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=9229
16/12/09 12:04:13 INFO mapred.JobClient:   File Output Format Counters
16/12/09 12:04:13 INFO mapred.JobClient:     Bytes Written=772
16/12/09 12:04:13 INFO mapred.JobClient:   FileSystemCounters
16/12/09 12:04:13 INFO mapred.JobClient:     FILE_BYTES_READ=914
16/12/09 12:04:13 INFO mapred.JobClient:     CEPH_BYTES_WRITTEN=772
16/12/09 12:04:13 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=85838
16/12/09 12:04:13 INFO mapred.JobClient:   File Input Format Counters
16/12/09 12:04:13 INFO mapred.JobClient:     Bytes Read=0
16/12/09 12:04:13 INFO mapred.JobClient:   Map-Reduce Framework
16/12/09 12:04:13 INFO mapred.JobClient:     Map output materialized bytes=914
16/12/09 12:04:13 INFO mapred.JobClient:     Map input records=40
16/12/09 12:04:13 INFO mapred.JobClient:     Reduce shuffle bytes=914
16/12/09 12:04:13 INFO mapred.JobClient:     Spilled Records=68
16/12/09 12:04:13 INFO mapred.JobClient:     Map output bytes=1057
16/12/09 12:04:13 INFO mapred.JobClient:     Total committed heap usage (bytes)=131010560
16/12/09 12:04:13 INFO mapred.JobClient:     CPU time spent (ms)=1230
16/12/09 12:04:13 INFO mapred.JobClient:     Combine input records=48
16/12/09 12:04:13 INFO mapred.JobClient:     SPLIT_RAW_BYTES=122
16/12/09 12:04:13 INFO mapred.JobClient:     Reduce input records=34
16/12/09 12:04:13 INFO mapred.JobClient:     Reduce input groups=34
16/12/09 12:04:13 INFO mapred.JobClient:     Combine output records=34
16/12/09 12:04:13 INFO mapred.JobClient:     Physical memory (bytes) snapshot=250052608
16/12/09 12:04:13 INFO mapred.JobClient:     Reduce output records=34
16/12/09 12:04:13 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1637552128
16/12/09 12:04:13 INFO mapred.JobClient:     Map output records=48

hadoop@master:/usr/local/hadoop$ hadoop dfs -ls output
Found 3 items
-rwxrwxrwx   1 hadoop          0 2016-12-09 12:04 /user/hadoop/output/_SUCCESS
drwxrwxrwx   - hadoop      49220 2016-12-09 12:03 /user/hadoop/output/_logs
-rwxrwxrwx   1 hadoop        772 2016-12-09 12:04 /user/hadoop/output/part-r-00000

At the same time, check the Ceph cluster status (see the commands sketched below). At this point, Hadoop 1.1.2 has been successfully integrated with Ceph 0.94. I tried many times to combine Hadoop 2.7.3 with Ceph but never succeeded; it always complains that the filesystem is not HDFS. I have been wondering whether to modify the Hadoop source so that it skips the filesystem check, but I do not yet know how. To be continued...
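To inspect the actual wordcount result and to see the data landing in the hadoop1 pool (this is the "check the Ceph cluster status" step mentioned above; exact figures will differ on your cluster), something like the following works:

hadoop dfs -cat output/part-r-00000    # word counts computed from core-site.xml
ceph -s                                # run on the Ceph node: overall cluster health
rados df | grep hadoop1                # objects and bytes stored in the Hadoop data pool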

Posted on 2016-12-09 11:49 by 歪歪121