Integrating Hadoop 2.7.3 with Ceph 0.94
Environment: a Hadoop 2.7.3 cluster
Ceph 0.94 (deployed on CentOS 6.5 as a single-node cluster)
The deployment below continues from the Hadoop cluster setup guide in the previous post.
1. Install required packages on the Hadoop nodes and set up passwordless SSH between nodes
The following packages need to be installed on the Hadoop nodes:
cephfs-java libcephfs1-devel python-cephfs libcephfs_jni1-devel
Set up passwordless SSH between the Ceph node and the Hadoop nodes, including each node to itself.
The detailed procedure is omitted here; a minimal sketch follows.
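A minimal sketch of the usual procedure, assuming the node names lqy1, lqy2, and hadoop_ceph used later in this post:

# run on each node, accepting the defaults (empty passphrase)
ssh-keygen -t rsa
# copy the public key to every node, including the local one
ssh-copy-id root@lqy1
ssh-copy-id root@lqy2
ssh-copy-id root@hadoop_ceph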
2. Add the Ceph yum repository to the Hadoop nodes and install the dependencies
On the Ceph mon node, copy the Ceph yum repo file to the Hadoop nodes:
[root@hadoop_ceph yum.repos.d]# scp ceph.repo root@lqy2:/etc/yum.repos.d/
ceph.repo 100% 603 0.6KB/s 00:00
[root@hadoop_ceph yum.repos.d]# scp ceph.repo root@lqy1:/etc/yum.repos.d/
ceph.repo 100% 603 0.6KB/s 00:00
[root@hadoop_ceph yum.repos.d]#
Install the dependency packages on the Hadoop nodes:
[root@lqy1 ~]# yum install cephfs-java libcephfs1-devel python-cephfs libcephfs_jni1-devel
Problems encountered:
- yum reported missing packages and could not complete the installation, so the EPEL repository was installed first (installation instructions are widely available online).
- "GPG key retrieval failed": the key check did not pass, so gpgcheck=1 was changed to gpgcheck=0 in the repo file (see the sketch below).
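For reference, a hypothetical ceph.repo after the change might look like the following; the baseurl shown here is an assumption and depends on the mirror actually used:

[ceph]
name=Ceph packages for x86_64
baseurl=http://download.ceph.com/rpm-hammer/el6/x86_64
enabled=1
gpgcheck=0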
3. Create a symlink on the Hadoop nodes
Add libcephfs_jni.so to the Hadoop native library directory:
[root@lqy1 native]# pwd
/usr/hadoop/hadoop-2.7.3/lib/native
[root@lqy1 native]# ln -s /usr/lib64/libcephfs_jni.so .
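Assuming every Hadoop node needs the JNI library on its native library path, the same link can be created on lqy2 as well, for example:

ssh root@lqy2 "ln -s /usr/lib64/libcephfs_jni.so /usr/hadoop/hadoop-2.7.3/lib/native/"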
4. Download the CephFS Hadoop plugin and update the configuration
1. Hadoop needs the CephFS plugin, which can be downloaded from http://ceph.com/download/hadoop-cephfs.jar. Place the jar in the /usr/share/java directory on every Hadoop node (a download sketch follows step 2 below).
2. Edit the Hadoop environment file /usr/hadoop/hadoop-2.7.3/etc/hadoop/hadoop-env.sh, adding:
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/etc/hadoop"}
export HADOOP_CLASSPATH=/usr/share/java/libcephfs.jar:/usr/share/java/hadoop-cephfs.jar:$HADOOP_CLASSPATH
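One possible way to fetch and distribute the plugin, assuming wget is available and using the node names from this post:

cd /usr/share/java
wget http://ceph.com/download/hadoop-cephfs.jar
scp hadoop-cephfs.jar root@lqy2:/usr/share/java/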
5. Create the pool
On the Ceph admin node, create the pool that Hadoop will use:
ceph osd pool create hadoop1 64
ceph osd pool set hadoop1 size 1    (size is set to 1 because this is a single-node cluster)
ceph mds add_data_pool hadoop1
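To confirm the pool exists and has been added as a CephFS data pool, standard Ceph commands can be used, for example:

ceph osd lspools      # hadoop1 should be listed
ceph df               # shows per-pool usage
ceph mds dump         # on Ceph 0.94, the data_pools field should include hadoop1's pool id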
6. Authorize the Hadoop nodes
Create a directory on every Hadoop node to hold the Ceph configuration and keyring; /etc/ceph is used as an example below.
[root@lqy1 java]# mkdir /etc/ceph
[root@hadoop_ceph ceph]# scp ceph.conf ceph.client.admin.keyring root@lqy1:/etc/ceph/
ceph.conf 100% 226 0.2KB/s 00:00
ceph.client.admin.keyring 100% 63 0.1KB/s 00:00
[root@hadoop_ceph ceph]# scp ceph.conf ceph.client.admin.keyring root@lqy2:/etc/ceph/
ceph.conf 100% 226 0.2KB/s 00:00
ceph.client.admin.keyring
7. Update the configuration
Stop the running Hadoop cluster. On the Hadoop master node:
[root@lqy1 hadoop-2.7.3]# sbin/stop-all.sh
Open hadoop-2.7.3/etc/hadoop/core-site.xml and make the following changes:
<configuration>
    <property>
        <name>fs.default.name</name>
        <value>ceph://192.168.0.30/</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/hadoop/tmp</value>
    </property>
    <property>
        <name>ceph.conf.file</name>
        <value>/etc/ceph/ceph.conf</value>
    </property>
    <property>
        <name>ceph.auth.id</name>
        <value>admin</value>
    </property>
    <property>
        <name>ceph.auth.keyring</name>
        <value>/etc/ceph/ceph.client.admin.keyring</value>
    </property>
    <property>
        <name>ceph.data.pools</name>
        <value>hadoop1</value>
    </property>
    <property>
        <name>fs.ceph.impl</name>
        <value>org.apache.hadoop.fs.ceph.CephFileSystem</value>
    </property>
</configuration>
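Note that fs.default.name is the deprecated name of this property in Hadoop 2.x; it still works, but the equivalent modern setting would be:

<property>
    <name>fs.defaultFS</name>
    <value>ceph://192.168.0.30/</value>
</property>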
Sync the file to the other Hadoop nodes:
[root@lqy1 hadoop]# scp hadoop-2.7.3/etc/hadoop/core-site.xml root@lqy2:/usr/hadoop/hadoop-2.7.3/etc/hadoop/core-site.xml
8. Start the cluster
[root@lqy1 hadoop-2.7.3]# sbin/start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Incorrect configuration: namenode address dfs.namenode.servicerpc-address or dfs.namenode.rpc-address is not configured.
Starting namenodes on []
lqy2: starting namenode, logging to /usr/hadoop/hadoop-2.7.3/logs/hadoop-root-namenode-lqy2.out
lqy2: starting datanode, logging to /usr/hadoop/hadoop-2.7.3/logs/hadoop-root-datanode-lqy2.out
Starting secondary namenodes [lqy1]
lqy1: starting secondarynamenode, logging to /usr/hadoop/hadoop-2.7.3/logs/hadoop-root-secondarynamenode-lqy1.out
starting yarn daemons
starting resourcemanager, logging to /usr/hadoop/hadoop-2.7.3/logs/yarn-root-resourcemanager-lqy1.out
lqy2: starting nodemanager, logging to /usr/hadoop/hadoop-2.7.3/logs/yarn-root-nodemanager-lqy2.out
9. Check the cluster status
Only the YARN daemons are expected here: with CephFS replacing HDFS, the NameNode and DataNode processes are not needed, which is also why start-all.sh warned that no namenode address is configured.
[root@lqy1 hadoop-2.7.3]# jps
3457 Jps
3323 ResourceManager
[root@lqy2 hadoop]# jps
2249 NodeManager
2366 Jps
10. Test CephFS as a replacement for HDFS
[root@lqy1 hadoop-2.7.3]# bin/hadoop dfs -ls /
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
The listing is empty.
Upload a file:
[root@lqy1 hadoop-2.7.3]# bin/hadoop dfs -put /etc/ceph/ceph.client.admin.keyring /
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
16/08/15 01:07:06 INFO ceph.CephFileSystem: selectDataPool path=ceph://192.168.0.30/ceph.client.admin.keyring._COPYING_ pool:repl=hadoop1:1 wanted=3
[root@lqy1 hadoop-2.7.3]# bin/hadoop dfs -ls /
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
The listing now shows the uploaded file:
Found 1 items
-rw-r--r-- 1 root 63 2016-08-15 01:07 /ceph.client.admin.keyring
In the steps above, CephFS is successfully serving as Hadoop's backend storage; files can be uploaded, downloaded, and otherwise managed through the dfs commands.
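For example, the uploaded file can be read back or copied to the local filesystem with the usual filesystem shell commands (a sketch):

bin/hadoop fs -cat /ceph.client.admin.keyring
bin/hadoop fs -get /ceph.client.admin.keyring /tmp/keyring.copy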
11. Run a sample MapReduce job to further verify CephFS
Create a directory:
[root@lqy1 hadoop-2.7.3]# bin/hadoop dfs -mkdir /input
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
[root@lqy1 hadoop-2.7.3]# bin/hadoop dfs -ls /
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
Found 2 items
-rw-r--r-- 1 root 63 2016-08-15 01:07 /ceph.client.admin.keyring
drwxrwxrwx - root 0 2016-08-15 01:11 /input
Copy a file from the local filesystem into /input on the distributed filesystem, then run the computation test:
[root@lqy1 hadoop-2.7.3]# bin/hadoop dfs -put etc/hadoop/core-site.xml /input
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
16/08/15 01:13:29 INFO ceph.CephFileSystem: selectDataPool path=ceph://192.168.0.30/input/core-site.xml._COPYING_ pool:repl=hadoop1:1 wanted=3
At this point the usage of the hadoop1 pool in the Ceph cluster changes.
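The change can be observed from the Ceph node, for example:

ceph df    # the USED/OBJECTS figures for the hadoop1 pool should have increased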
Run the wordcount example to count word occurrences:
[root@lqy1 hadoop-2.7.3]# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount /input /output
16/08/15 01:16:08 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
16/08/15 01:16:08 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
16/08/15 01:16:08 INFO input.FileInputFormat: Total input paths to process : 1
16/08/15 01:16:08 INFO mapreduce.JobSubmitter: number of splits:1
16/08/15 01:16:09 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local1016580221_0001
16/08/15 01:16:10 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
16/08/15 01:16:10 INFO mapreduce.Job: Running job: job_local1016580221_0001
16/08/15 01:16:10 INFO mapred.LocalJobRunner: OutputCommitter set in config null
16/08/15 01:16:10 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
16/08/15 01:16:10 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
16/08/15 01:16:10 INFO mapred.LocalJobRunner: Waiting for map tasks
16/08/15 01:16:10 INFO mapred.LocalJobRunner: Starting task: attempt_local1016580221_0001_m_000000_0
16/08/15 01:16:10 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
16/08/15 01:16:10 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
16/08/15 01:16:10 INFO mapred.MapTask: Processing split: ceph://192.168.0.30/input/core-site.xml:0+1458
16/08/15 01:16:11 INFO mapreduce.Job: Job job_local1016580221_0001 running in uber mode : false
16/08/15 01:16:11 INFO mapreduce.Job: map 0% reduce 0%
16/08/15 01:16:11 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
16/08/15 01:16:11 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
16/08/15 01:16:11 INFO mapred.MapTask: soft limit at 83886080
16/08/15 01:16:11 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
16/08/15 01:16:11 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
16/08/15 01:16:11 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
16/08/15 01:16:12 INFO mapred.LocalJobRunner:
16/08/15 01:16:12 INFO mapred.MapTask: Starting flush of map output
16/08/15 01:16:12 INFO mapred.MapTask: Spilling map output
16/08/15 01:16:12 INFO mapred.MapTask: bufstart = 0; bufend = 1877; bufvoid = 104857600
16/08/15 01:16:12 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26213884(104855536); length = 513/6553600
16/08/15 01:16:12 INFO mapred.MapTask: Finished spill 0
16/08/15 01:16:12 INFO mapred.Task: Task:attempt_local1016580221_0001_m_000000_0 is done. And is in the process of committing
16/08/15 01:16:12 INFO mapred.LocalJobRunner: map
16/08/15 01:16:12 INFO mapred.Task: Task 'attempt_local1016580221_0001_m_000000_0' done.
16/08/15 01:16:12 INFO mapred.LocalJobRunner: Finishing task: attempt_local1016580221_0001_m_000000_0
16/08/15 01:16:12 INFO mapred.LocalJobRunner: map task executor complete.
16/08/15 01:16:12 INFO mapred.LocalJobRunner: Waiting for reduce tasks
16/08/15 01:16:12 INFO mapred.LocalJobRunner: Starting task: attempt_local1016580221_0001_r_000000_0
16/08/15 01:16:12 INFO mapreduce.Job: map 100% reduce 0%
16/08/15 01:16:12 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
16/08/15 01:16:12 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
16/08/15 01:16:12 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@27aa8b56
16/08/15 01:16:12 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=363285696, maxSingleShuffleLimit=90821424, mergeThreshold=239768576, ioSortFactor=10, memToMemMergeOutputsThreshold=10
16/08/15 01:16:12 INFO reduce.EventFetcher: attempt_local1016580221_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
16/08/15 01:16:12 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local1016580221_0001_m_000000_0 decomp: 1691 len: 1695 to MEMORY
16/08/15 01:16:12 INFO reduce.InMemoryMapOutput: Read 1691 bytes from map-output for attempt_local1016580221_0001_m_000000_0
16/08/15 01:16:12 WARN io.ReadaheadPool: Failed readahead on ifile
EBADF: Bad file descriptor
at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posix_fadvise(Native Method)
at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posixFadviseIfPossible(NativeIO.java:267)
at org.apache.hadoop.io.nativeio.NativeIO$POSIX$CacheManipulator.posixFadviseIfPossible(NativeIO.java:146)
at org.apache.hadoop.io.ReadaheadPool$ReadaheadRequestImpl.run(ReadaheadPool.java:206)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
16/08/15 01:16:12 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 1691, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->1691
16/08/15 01:16:12 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning
16/08/15 01:16:12 INFO mapred.LocalJobRunner: 1 / 1 copied.
16/08/15 01:16:12 INFO reduce.MergeManagerImpl: finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs
16/08/15 01:16:12 INFO mapred.Merger: Merging 1 sorted segments
16/08/15 01:16:12 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 1685 bytes
16/08/15 01:16:12 INFO reduce.MergeManagerImpl: Merged 1 segments, 1691 bytes to disk to satisfy reduce memory limit
16/08/15 01:16:12 INFO reduce.MergeManagerImpl: Merging 1 files, 1695 bytes from disk
16/08/15 01:16:12 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
16/08/15 01:16:12 INFO mapred.Merger: Merging 1 sorted segments
16/08/15 01:16:12 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 1685 bytes
16/08/15 01:16:12 INFO mapred.LocalJobRunner: 1 / 1 copied.
16/08/15 01:16:12 INFO ceph.CephFileSystem: selectDataPool path=ceph://192.168.0.30/output/_temporary/0/_temporary/attempt_local1016580221_0001_r_000000_0/part-r-00000 pool:repl=hadoop1:1 wanted=3
16/08/15 01:16:12 INFO Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
16/08/15 01:16:12 INFO mapred.Task: Task:attempt_local1016580221_0001_r_000000_0 is done. And is in the process of committing
16/08/15 01:16:12 INFO mapred.LocalJobRunner: 1 / 1 copied.
16/08/15 01:16:12 INFO mapred.Task: Task attempt_local1016580221_0001_r_000000_0 is allowed to commit now
16/08/15 01:16:12 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1016580221_0001_r_000000_0' to ceph://192.168.0.30/output/_temporary/0/task_local1016580221_0001_r_000000
16/08/15 01:16:12 INFO mapred.LocalJobRunner: reduce > reduce
16/08/15 01:16:12 INFO mapred.Task: Task 'attempt_local1016580221_0001_r_000000_0' done.
16/08/15 01:16:12 INFO mapred.LocalJobRunner: Finishing task: attempt_local1016580221_0001_r_000000_0
16/08/15 01:16:12 INFO mapred.LocalJobRunner: reduce task executor complete.
16/08/15 01:16:12 INFO ceph.CephFileSystem: selectDataPool path=ceph://192.168.0.30/output/_SUCCESS pool:repl=hadoop1:1 wanted=3
16/08/15 01:16:13 INFO mapreduce.Job: map 100% reduce 100%
16/08/15 01:16:13 INFO mapreduce.Job: Job job_local1016580221_0001 completed successfully
16/08/15 01:16:13 INFO mapreduce.Job: Counters: 35
File System Counters
CEPH: Number of bytes read=0
(remaining counters omitted)
The log above shows the map and reduce phases of the job reading input from and writing output to CephFS. The generated result files are saved under /output on the distributed filesystem:
[root@lqy1 hadoop-2.7.3]# bin/hadoop dfs -ls /output
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
Found 2 items
-rw-r--r-- 1 root 0 2016-08-15 01:16 /output/_SUCCESS
-rw-r--r-- 1 root 1305 2016-08-15 01:16 /output/part-r-00000
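The word counts themselves can be read directly from CephFS, for example:

bin/hadoop fs -cat /output/part-r-00000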
With this, Hadoop 2.7.3 has been successfully integrated with Ceph 0.94.