Hadoop Cluster Deployment

A note up front: these notes are a bit rough. When you hit a problem, just work through it and move on.

1. Confirm the IPs of the servers you will deploy to

Here 0, 1, 2, and 3 stand for the four IPs.

You also need one more server to act as the control machine.

2. On the control machine, run ssh-keygen to generate a local key pair (skip this step if one already exists). For a user named test, the key files live under /home/test/.ssh/.

Ansible must be installed on the control machine.

Configure the Ansible package repository:

wget -O /etc/yum.repos.d/epel.repo http://mirrors.aliyun.com/repo/epel-6.repo

Install Ansible:

yum -y install ansible

Prepare the Ansible inventory file, hosts, with the following contents:

[hadoop_host]
0ip ansible_ssh_user=test
1ip ansible_ssh_user=test
2ip ansible_ssh_user=test 
3ip ansible_ssh_user=test

Make sure the network path from the control machine to the Hadoop cluster servers is OK; at a minimum, SSH must work.
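A quick way to confirm connectivity is Ansible's ping module, using the inventory above (-k prompts for the SSH password, since keys have not been distributed yet):

# verify SSH connectivity to every host in the inventory
ansible -i ./hosts hadoop_host -m ping -k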

Once the prerequisites are in place, move on to initialization. Initialization covers: passwordless sudo, allowing sudo in remote (non-tty) sessions, and ulimit system tuning.

1. First, enable passwordless sudo for the test user.
This assumes the test user is in the wheel group.
ansible -i ./hosts hadoop_host -m shell -a "sed 's/^# %wheel.*NOPASSWD: ALL/%wheel ALL=(ALL) NOPASSWD: ALL/' -i /etc/sudoers" -s --ask-sudo-pass -k
(-k prompts for the SSH login password; --ask-sudo-pass prompts for the sudo password.)
Once the passwords are entered, this command lets the test user run sudo without a password on the remote servers.
2. Allow the test user to run sudo in remote (non-tty) sessions:
ansible -i ./hosts hadoop_host -m shell -a "sed -i '/Defaults.*requiretty/a Defaults: test \!requiretty' /etc/sudoers" -s --ask-sudo-pass
3. Adjust the ulimit parameters:

ansible -i ./hosts hadoop_host -m shell -a "sed -i '$ a fs.file-max = 65535' /etc/sysctl.conf && sudo sed -i 's/1024/65535/' /etc/security/limits.d/90-nproc.conf && sudo sed -i '$ a * soft nofile 65535\\n* hard nofile 65535' /etc/security/limits.conf" -s --ask-sudo-pass
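To confirm the tuning landed (the sysctl entry only takes effect after sysctl -p or a reboot, hence the reload here), a quick spot check:

# reload sysctl and print the new limits on every node
ansible -i ./hosts hadoop_host -m shell -a "sysctl -p; sysctl -n fs.file-max; grep nofile /etc/security/limits.conf" -s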

Next, set up passwordless (key-based) SSH from the control machine to every server in the Hadoop cluster.

See this post for reference:

 http://www.cnblogs.com/jackchen001/p/6381270.html
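A minimal sketch of that key distribution, assuming the test user's key pair from step 2 and with ip0–ip3 as placeholders for the real addresses:

# push the control machine's public key to each cluster node
for ip in ip0 ip1 ip2 ip3; do
    ssh-copy-id -i /home/test/.ssh/id_rsa.pub test@$ip    # prompts for test's password once per host
done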

Install the JDK and configure the Java environment variables on the Hadoop cluster

Prerequisite: the passwordless SSH channel is already in place.
1. Generate the JDK environment variable file:
echo '
export JAVA_HOME=/usr/java/latest/
export PATH=$JAVA_HOME/bin:$PATH ' >> java.sh
2. Install the JDK:
ansible -i ./hosts hadoop_host -m yum -a "name=jdk state=present" -s
3. Push the JDK environment variable file out:
ansible -i ./hosts hadoop_host -m copy -a "src=java.sh dest=/etc/profile.d/" -s
4. Change the owner and group of the Java install directory (a verification check follows these steps):
ansible -i ./hosts hadoop_host -m shell -a "chown -R hadoop.hadoop /usr/java/" -s

A good article introducing Ansible modules:

http://breezey.blog.51cto.com/2400275/1555530/

Hadoop cluster hosts mapping file
Generate the hosts file (0–3 are the four IP placeholders; each slave needs a distinct hostname so the slaves/workers files below can resolve):
echo '0 master
1 slave1
2 slave2
3 slave3' >> /tmp/hosts
Push it to the Hadoop cluster servers:
ansible -i ./hosts hadoop_host -m copy -a "src=/tmp/hosts dest=/etc/hosts" -s
Change the hostname:
ansible -i ./hosts hadoop_host -m shell -a "sed -i 's/.localdomain//g' /etc/sysconfig/network && service network restart " -s


Download and configure Hadoop

Download the Hadoop tarball:
ansible -i ./hosts hadoop_host -m get_url -a "url=http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-3.0.0-alpha2/hadoop-3.0.0-alpha2.tar.gz dest=/opt/" -s
If you run this download command, every server in the cluster downloads the tarball, which wastes network bandwidth.
It is better to download and configure Hadoop on the control machine, then push it out to the cluster servers.
The push command:
ansible -i ./hosts hadoop_host -m copy -a "src=/opt/hadoop dest=/opt/ owner=hadoop group=hadoop mode=0755" -s

Hadoop environment variable configuration

Generate the Hadoop environment variable file:
echo '
export HADOOP_HOME=/opt/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=/opt/hadoop/lib/native/"
export HADOOP_COMMON_LIB_NATIVE_DIR="/opt/hadoop/lib/native/"
' >> hadoop.sh

Push the Hadoop environment file to the cluster:
ansible -i ./hosts hadoop_host -m copy -a "src=hadoop.sh dest=/etc/profile.d/" -s

The most important piece: the hadoop user must be able to SSH between the cluster servers without a password and run commands there.

On the control machine:
Create the hadoop user and set its password (see http://www.cnblogs.com/jackchen001/p/6381270.html).
Then generate a key pair for the hadoop user on every node; the extra flags keep ssh-keygen non-interactive so the command does not hang under Ansible:
ansible -i ./hosts hadoop_host -m shell -a "mkdir -p /home/hadoop/.ssh && ssh-keygen -q -t rsa -N '' -f /home/hadoop/.ssh/id_rsa && chown -R hadoop:hadoop /home/hadoop/.ssh" -s
Once the hadoop user exists with its password set and the passwordless SSH setup is done,
run the rsync_key playbook against every cluster server
so that the hadoop user can freely SSH between the cluster servers and run commands (a sketch of such a playbook follows below).
Let the hadoop user use sudo:
ansible -i ./hosts hadoop_host -m shell -a "sed -i '$ a %hadoop ALL=(ALL) NOPASSWD: ALL ' /etc/sudoers" -s
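The rsync_key playbook itself is not shown in this post; here is a minimal hypothetical sketch of what it could look like, assuming the hadoop user's public key has been collected to ./hadoop_id_rsa.pub on the control machine (the playbook name and that file path are both assumptions):

# rsync_key.yml -- hypothetical sketch: authorize one shared hadoop key on every node
- hosts: hadoop_host
  become: yes
  tasks:
    - name: install the hadoop public key into authorized_keys
      authorized_key:
        user: hadoop
        key: "{{ lookup('file', 'hadoop_id_rsa.pub') }}"    # assumed local path

Run it with: ansible-playbook -i ./hosts rsync_key.yml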

With that done, upload the Hadoop configuration files.

Hadoop official docs: http://hadoop.apache.org/docs/current/

The finished Hadoop configuration files are as follows:

core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/opt/hadoop/tmp</value>
</property>
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://master:9000</value>
</property>
<property> 
  <name>dfs.name.dir</name>           
  <value>/opt/hadoop/name</value> 
</property>

</configuration>
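core-site.xml points hadoop.tmp.dir at /opt/hadoop/tmp, which must exist on every node; one way to create it up front (the path simply mirrors the config above):

# create the tmp dir declared in core-site.xml on all nodes
ansible -i ./hosts hadoop_host -m file -a "path=/opt/hadoop/tmp state=directory owner=hadoop group=hadoop" -s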
hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
    <name>dfs.replication</name>  
    <value>3</value>
</property>
<property>
    <name>dfs.namenode.name.dir</name>  
    <value>file:/opt/hadoop/name1,file:/opt/hadoop/name2</value>
</property>

<property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/opt/hadoop/data1,file:/opt/hadoop/data2</value>
</property>

<property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>slave1:9001</value>
</property>


</configuration>
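Likewise, the name and data directories from hdfs-site.xml should exist before the first format; a one-shot way to create them all:

# create the namenode/datanode dirs from hdfs-site.xml
ansible -i ./hosts hadoop_host -m shell -a "mkdir -p /opt/hadoop/{name1,name2,data1,data2} && chown -R hadoop:hadoop /opt/hadoop" -s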
mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->
<configuration>

    <property>  
        <name>mapreduce.framework.name</name>  
        <value>yarn</value>  
    </property>

<property>
    <name>mapred.job.tracker</name>  
    <value>master:9001</value>
</property>
<property>
    <name>mapred.system.dir</name>  
    <value>/opt/hadoop/mapred_system</value>
</property>
<property>
    <name>mapred.local.dir</name>  
    <value>/opt/hadoop/mapred_local</value>
</property>

<property>
        <name>mapreduce.application.classpath</name>
        <value>
                /opt/hadoop/etc/hadoop,
                /opt/hadoop/lib/native/*,
                /opt/hadoop/share/hadoop/common/*,
                /opt/hadoop/share/hadoop/common/lib/*,
                /opt/hadoop/share/hadoop/hdfs/*,
                /opt/hadoop/share/hadoop/hdfs/lib/*,
                /opt/hadoop/share/hadoop/mapreduce/*,
                /opt/hadoop/share/hadoop/mapreduce/lib/*,
                /opt/hadoop/share/hadoop/yarn/*,
                /opt/hadoop/share/hadoop/yarn/lib/*
        </value>
</property>

</configuration>
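The long mapreduce.application.classpath list above is easy to get wrong; once Hadoop is unpacked (and JAVA_HOME is set via hadoop-env.sh), you can compare it against what the distribution itself reports:

# print the classpath Hadoop computes on each node
ansible -i ./hosts hadoop_host -m shell -a "/opt/hadoop/bin/hadoop classpath"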
yarn-site.xml
<?xml version="1.0"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<configuration>

<!-- Site specific YARN configuration properties -->

    <property>  
        <name>yarn.nodemanager.aux-services</name>  
        <value>mapreduce_shuffle</value>  
    </property>  
    <property>  
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>  
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>  
    </property>  
    <property>  
        <name>yarn.resourcemanager.resource-tracker.address</name>  
        <value>master:8025</value>  
    </property>  
    <property>  
        <name>yarn.resourcemanager.scheduler.address</name>  
        <value>master:8030</value>  
    </property>  
    <property>  
        <name>yarn.resourcemanager.address</name>  
        <value>master:8040</value>  
    </property>
    <property>  
        <name>yarn.resourcemanager.admin.address</name>  
        <value>master:8033</value>  
    </property>
    <property>  
        <name>yarn.resourcemanager.webapp.address</name>  
        <value>master:8034</value>  
    </property>

</configuration>
masters file contents:
master
slaves file contents:
slave1
slave2
slave3
workers file contents (Hadoop 3.x reads workers in place of slaves):
slave1
slave2
slave3
Add to hadoop-env.sh:
export JAVA_HOME=/usr/java/latest

Edit all of the configuration files above on the control machine, then push them to the cluster servers.
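A sketch of that final push plus the first-time startup, assuming the edited files are staged in ./conf/ on the control machine (that staging path is an assumption; the format and start commands are standard Hadoop):

# push the finished configs into place on every node
ansible -i ./hosts hadoop_host -m copy -a "src=./conf/ dest=/opt/hadoop/etc/hadoop/ owner=hadoop group=hadoop" -s
# then on master, as the hadoop user:
/opt/hadoop/bin/hdfs namenode -format    # first run only -- this wipes the namenode dirs
/opt/hadoop/sbin/start-dfs.sh
/opt/hadoop/sbin/start-yarn.sh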

I got halfway through this write-up and confused myself. How embarrassing!!!!!

