前置条件

  • 各软件版本:hadoop-2.7.7、hbase-2.1.5 、jdk1.8.0_211、zookeeper-3.4.10、apache-phoenix-5.0.0-HBase-2.0-bin.tar.gz
  • 至少 3 台 Centos 服务器,主机名分别为:hadoop0001、hadoop0002、hadoop0003
  • 这里所有的软件将安装在 hadoop 用户的 /home/hadoop/app 目录下
  • 在每台服务器设置 hosts
[hadoop@hadoop0001 ~]$ vim /etc/hosts

host 内容如下:

# 127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
# ::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
10.2.1.102  hadoop0001
10.2.1.103  hadoop0002
10.2.1.104  hadoop0003
  • ssh 免密登录(此步骤可以忽略,但 Hadoop 每次启动都需要输入密码)

在 hadoop0001 终端执行以下命令:

[hadoop@hadoop0001 ~]$ ssh-keygen -t rsa -P "" //一直回车即可
[hadoop@hadoop0001 ~]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[hadoop@hadoop0001 ~]$ cat ~/.ssh/id_rsa.pub >> hadoop@hadoop0002:~/.ssh/authorized_keys
[hadoop@hadoop0001 ~]$ cat ~/.ssh/id_rsa.pub >> hadoop@hadoop0003:~/.ssh/authorized_keys

在 hadoop0002 终端执行以下命令:

[hadoop@hadoop0001 ~]$ ssh-keygen -t rsa -P "" //一直回车即可
[hadoop@hadoop0001 ~]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[hadoop@hadoop0001 ~]$ cat ~/.ssh/id_rsa.pub >> hadoop@hadoop0001:~/.ssh/authorized_keys
[hadoop@hadoop0001 ~]$ cat ~/.ssh/id_rsa.pub >> hadoop@hadoop0003:~/.ssh/authorized_keys

在 hadoop0003 终端执行以下命令:

[hadoop@hadoop0001 ~]$ ssh-keygen -t rsa -P "" //一直回车即可
[hadoop@hadoop0001 ~]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[hadoop@hadoop0001 ~]$ cat ~/.ssh/id_rsa.pub >> hadoop@hadoop0001:~/.ssh/authorized_keys
[hadoop@hadoop0001 ~]$ cat ~/.ssh/id_rsa.pub >> hadoop@hadoop0002:~/.ssh/authorized_keys

验证免密登录

[hadoop@hadoop0001 ~]$ ssh localhost
Last login: Fri Jan  4 13:45:54 2019 //出现这个结果表示免密登录成功

JDK 环境变量配置:

# 用户家目录下
[hadoop@hadoop0001 ~]$ vim .bashrc

添加以下内容:

JAVA_HOME=/home/hadoop/app/jdk1.8.0_192
CLASSPATH=.:$JAVA_HOME/lib/tools.jar:$JAVA_HOME/lib/dt.jar 
PATH=$JAVA_HOME/bin:$HOME/bin:$HOME/.local/bin:$PATH

最后使环境变量生效:

# 用户家目录下
[hadoop@hadoop0001 ~]$ . .bashrc

JDK 验证:

java -version
java version "1.8.0_192"
Java(TM) SE Runtime Environment (build 1.8.0_192-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.192-b12, mixed mode) java -version

将 hadoop0001 的 JDK 复制到其他服务器上

[hadoop@hadoop0001 app]$ scp -r jdk1.8.0_192/ hadoop@hadoop0002:~/app/jdk1.8.0_192/
[hadoop@hadoop0001 app]$ scp -r jdk1.8.0_192/ hadoop@hadoop0003:~/app/jdk1.8.0_192/
[hadoop@hadoop0001 ~]$ scp /etc/profile hadoop@hadoop0002:/etc/profile
[hadoop@hadoop0001 ~]$ scp /etc/profile hadoop@hadoop0003:/etc/profile
  • NTP 服务搭建
    每台服务器上安装 ntp
[hadoop@hadoop0001 ~]$ yum install -y ntp

hadoop0001 配置 ntp

[hadoop@hadoop0001 ~]$ vim /etc/ntp.conf

添加以下配置:

restrict 10.2.1.0 mask 255.255.255.0 nomodify notrap
logfile /var/log/ntpd.log
server ntp1.aliyun.com
server ntp2.aliyun.com
server ntp3.aliyun.com
server 127.0.0.1
fudge 127.0.0.1 stratum 10

完整配置文件(ntp.conf):

# For more information about this file, see the man pages
# ntp.conf(5), ntp_acc(5), ntp_auth(5), ntp_clock(5), ntp_misc(5), ntp_mon(5).

driftfile /var/lib/ntp/drift

logfile /var/log/ntpd.log

# Permit time synchronization with our time source, but do not
# permit the source to query or modify the service on this system.
restrict default nomodify notrap nopeer noquery

# Permit all access over the loopback interface.  This could
# be tightened as well, but to do so would effect some of
# the administrative functions.
restrict 127.0.0.1
restrict ::1

# Hosts on local network are less restricted.
#restrict 192.168.1.0 mask 255.255.255.0 nomodify notrap
restrict 10.2.1.0 mask 255.255.255.0 nomodify notrap

# Use public servers from the pool.ntp.org project.
# Please consider joining the pool (http://www.pool.ntp.org/join.html).
#server 0.centos.pool.ntp.org iburst
#server 1.centos.pool.ntp.org iburst
#server 2.centos.pool.ntp.org iburst
#server 3.centos.pool.ntp.org iburst
server ntp1.aliyun.com
server ntp2.aliyun.com
server ntp3.aliyun.com

server 127.0.0.1
fudge 127.0.0.1 stratum 10

#broadcast 192.168.1.255 autokey        # broadcast server
#broadcastclient                        # broadcast client
#broadcast 224.0.1.1 autokey            # multicast server
#multicastclient 224.0.1.1              # multicast client
#manycastserver 239.255.254.254         # manycast server
#manycastclient 239.255.254.254 autokey # manycast client

# Enable public key cryptography.
#crypto

includefile /etc/ntp/crypto/pw

# Key file containing the keys and key identifiers used when operating
# with symmetric key cryptography. 
keys /etc/ntp/keys

# Specify the key identifiers which are trusted.
#trustedkey 4 8 42

# Specify the key identifier to use with the ntpdc utility.
#requestkey 8

# Specify the key identifier to use with the ntpq utility.
#controlkey 8

# Enable writing of statistics records.
#statistics clockstats cryptostats loopstats peerstats

# Disable the monitoring facility to prevent amplification attacks using ntpdc
# monlist command when default restrict does not include the noquery flag. See
# CVE-2013-5211 for more details.
# Note: Monitoring will not be disabled with the limited restriction flag.
disable monitor

时间服务器可参考:https://www.pool.ntp.org/zone/asia

时间同步:

[hadoop@hadoop0001 ~]$ sudo ntpdate -u ntp1.aliyun.com
16 Jul 16:46:39 ntpdate[12700]: adjust time server 120.25.115.20 offset -0.002546 sec

启动时间服务:

[hadoop@hadoop0001 ~]$ sudo systemctl start ntpd

时间服务开机自启:

[hadoop@hadoop0001 ~]$ sudo systemctl enable ntpd

在 hadoop0002 和 hadoop0003 配置 ntp 客户端
在 /etc/ntp.conf 配置如下代码

server hadoop0001

查看 ntp 是否同步
如下表示未同步

[root@hadoop0002 ~]# ntpstat 
unsynchronised
  time server re-starting
   polling server every 8 s

如下表示已同步

[root@hadoop0001 ~]# ntpstat
synchronised to NTP server (120.25.115.20) at stratum 3 
   time correct to within 976 ms
   polling server every 64 s

注意:同步需要 10 分钟左右

Hadoop 安装

下载 Hadoop

wget http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.7.7/hadoop-2.7.7.tar.gz

解压 Hadoop

tar -zxvf hadoop-2.7.7.tar.gz

配置 hadoop-env.sh

# 根据实际业务需要配置
export HADOOP_HEAPSIZE=1024

配置 mapred-env.sh

export JAVA_HOME=${JAVA_HOME}

配置 yarn-env.sh

# 根据实际业务需要配置
JAVA_HEAP_MAX=-Xmx512m
YARN_HEAPSIZE=1024

配置 core-site.xml

<!-- hdfs 端口 -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop0001:8020</value>
  </property>
  <!-- hadoop 临时数据目录 -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/application/hadoop-2.7.7/data</value>
  </property>
  <property>
    <name>fs.trash.interval</name>
    <value>14400</value>
  </property>

配置 yarn-site.xml

<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop0001</value>
    <discription>指定 YARN 的 ResourceManager 的地址</discription>
  </property>

  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
    <discription>日志聚集功能</discription>
  </property>

  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
    <discription>Reducer 获取数据方式</discription>
  </property>

  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>

  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>604800</value>
    <discription>日志保留时间设置 7 天</discription>
  </property>

  <property>
    <name>yarn.nodemanager.pmem-check-enabled</name>
    <value>false</value>
  </property>

  <property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
  </property>

  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>15000</value>
    <discription>每个节点可用内存,单位MB</discription>
  </property>

  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>100</value>
    <discription>单个任务可申请最少内存,默认1024MB</discription>
  </property>

  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>15000</value>
    <discription>单个任务可申请最大内存,默认8192MB</discription>
  </property>

  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>2</value>
    <discription>NodeManager总的可用虚拟CPU个数</discription>
  </property>

  <property>
    <name>yarn.scheduler.minimum-allocation-vcores</name>
    <value>1</value>
    <discription>单个可申请的最小。比如设置为1,则运行MapRedce作业时,每个Task最少可申请1个虚拟CPU</discription>
  </property>

  <property>
    <name>yarn.scheduler.maximum-allocation-vcores</name>
    <value>4</value>
    <discription>单个可申请的最大虚拟CPU个数。比如设置为4,则运行MapRedce作业时,最多可申请4个虚拟CPU</discription>
  </property>

  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
  </property>

  <property>
    <name>yarn.scheduler.fair.preemption</name>
    <value>true</value>
  </property>

  <property>
    <name>yarn.scheduler.fair.preemption.cluster-utilization-threshold</name>
    <value>0.8</value>
  </property>

配置 hdfs-site.xml

<!-- hdfs 数据副本数目  -->
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>

  <!-- hdfs 存储 fsimage 的地方
         <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/hadoop/application/hadoop-2.8.5/data/hdfs/name</value>
  </property>
  -->

  <!-- hdfs 数据存放 block 的地方
         <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/hadoop/application/hadoop-2.8.5/data/hdfs/data</value>
  </property>
  -->

  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>hadoop0001:50090</value>
  </property>

  <property>
    <name>dfs.namenode.http-address</name>
    <value>hadoop0001:50070</value>
  </property>

  <property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
  </property>

配置 mapred-site.xml

<!-- 历史服务器端地址 -->
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>hadoop0001:10020</value>
  </property>
  <!-- 历史服务器 web 端地址 -->
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>hadoop0001:19888</value>
  </property>
  <!-- 指定 MR 运行在 Yarn 上 -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>

配置 slaves (/home/hadoop/app/hadoop-2.7.7)

hadoop0001
hadoop0002
hadoop0003

配置 Hadoop 环境变量

在用户家目录下的 .bashrc

# added by Hadoop installer
export HADOOP_HOME=/home/hadoop/app/hadoop-2.7.7
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib:$HADOOP_COMMON_LIB_NATIVE_DIR"
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export CLASSPATH=$CLASSPATH:$HADOOP_HOME/lib

使环境生效:

. .bashrc

将配置好的 hadoop 发送到其他服务器

[hadoop@hadoop0001 app]$ scp -r /hadoop-2.7.7 hadoop@hadoop0002:~/app/hadoop-2.7.7
[hadoop@hadoop0001 app]$ scp -r /hadoop-2.7.7 hadoop@hadoop0003:~/app/hadoop-2.7.7
[hadoop@hadoop0001 ~]$ scp .bashrc hadoop@hadoop0002:~/
[hadoop@hadoop0001 ~]$ scp .bashrc hadoop@hadoop0003:~/

在主 master 初始化 namenode

hadoop namenode -format

启动 hadoop 集群

# mater 节点 出现 NameNode、SecondaryNameNode,其他机器上出现 DataNode 说明集群搭建成功
start-all.sh

停止集群

stop-all.sh

Zookeeper 分布式集群搭建

下载 Zookeeper

wget https://archive.apache.org/dist/zookeeper/zookeeper-3.4.10/zookeeper-3.4.10.tar.gz

解压 Zookeeper

tar -zxvf zookeeper-3.4.10.tar.gz

配置 zoo.cfg

cp zoo_sample.cfg zoo.cfg
vim zoo.cfg

配置内容如下:

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial 
# synchronization phase can take
initLimit=20
# The number of ticks that can pass between 
# sending a request and getting an acknowledgement
syncLimit=10
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just 
# example sakes.
dataDir=/root/app/zookeeper-3.4.10/data
dataLogDir=/root/app/zookeeper-3.4.10/logs
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the 
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
server.1=hadoop0001:2888:3888
server.2=hadoop0002:2888:3888
server.3=hadoop0003:2888:3888

在 zookeeper 根目录下创建 data 和 logs 文件夹

mkdir data
mkdir logs

在 data 目录下创建 myid

vim myid

内容为:

1

配置 zookeeper 环境变量

在用户家目录下的 .bashrc

# added by zookeeper installer
export ZOOKEEPER_HOME=/home/hadoop/app/zookeeper-3.4.10
export CLASSPATH=$CLASSPATH:$ZOOKEEPER_HOME/lib
export PATH=$PATH:$ZOOKEEPER_HOME/bin

将配置好的 zookeeper 发送到其他机器上

[hadoop@hadoop0001 app]$ scp -r /zookeeper-3.4.10 hadoop@hadoop0002:~/app/zookeeper-3.4.10
[hadoop@hadoop0001 app]$ scp -r /zookeeper-3.4.10 hadoop@hadoop0003:~/app/zookeeper-3.4.10
[hadoop@hadoop0001 ~]$ scp .bashrc hadoop@hadoop0002:~/
[hadoop@hadoop0001 ~]$ scp .bashrc hadoop@hadoop0003:~/

修改其他机器的 myid

将其他节点的 myid 修改为 2、3,保证每台机器的 myid 在集群内唯一

启动 zookeeper 服务

每台机器执行:

zkServer.sh start

查看 zookeeper 状态

zkServer.sh status

Hbase HA 分布式集群搭建

下载 hbase

wget http://mirror.bit.edu.cn/apache/hbase/2.1.5/hbase-2.1.5-bin.tar.gz

解压 hbase

tar -zxvf hbase-2.1.5-bin.tar.gz

配置 hbase-site.xml

  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://hadoop0001:8020/hbase</value>
  </property>

  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>

  <!-- 0.98 后的新变动,之前版本没有.port,默认端口为 60000 -->
  <property>
    <name>hbase.master.port</name>
    <value>16000</value>
  </property>

  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>hadoop0001,hadoop0002,hadoop0003</value>
  </property>

  <property>
    <name>hbase.regionserver.restart.on.zk.expire</name>
    <value>true</value>
  </property>

  <property>
    <name>hbase.coprocessor.abortonerror</name>
    <value>false</value>
  </property>

  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/root/app/zookeeper-3.4.10/data</value>
  </property>

  <property>
    <name>hbase.unsafe.stream.capability.enforce</name>
    <value>false</value>
    <description>
        Controls whether HBase will check for stream capabilities (hflush/hsyn    c).

        Disable this if you intend to run on LocalFileSystem, denoted by a roo    tdir
        with the 'file://' scheme, but be mindful of the NOTE below.

        WARNING: Setting this to false blinds you to potential data loss and
        inconsistent system state in the event of process and/or node failures    . If
        HBase is complaining of an inability to use hsync or hflush it's most
        likely not a false positive.
    </description>
  </property>

配置 regionservers

在 hbase 根目录下的 conf 目录下的 regionservers 文件加入如下配置:

# 主机名即 host
hadoop0001
hadoop0002
hadoop0003

配置 hbase 环境变量

在用户家目录下的 .bashrc

# added by hbase installer
export HBASE_HOME=/root/app/hbase-2.1.5/
export CLASSPATH=$CLASSPATH:$HBASE_HOME/lib
export PATH=$PATH:$HBASE_HOME/bin

将配置好的 hbase 发送到其他机器

[hadoop@hadoop0001 app]$ scp -r /hbase-2.1.5 hadoop@hadoop0002:~/app/hbase-2.1.5
[hadoop@hadoop0001 app]$ scp -r /hbase-2.1.5 hadoop@hadoop0003:~/app/hbase-2.1.5
[hadoop@hadoop0001 ~]$ scp .bashrc hadoop@hadoop0002:~/
[hadoop@hadoop0001 ~]$ scp .bashrc hadoop@hadoop0003:~/

配置 backup-masters(备用 master 节点)

在 hbase 根目录下的 conf 目录下的 backup-masters文件加入如下配置:

# master 节点配置,可配置多个
hadoop0002

启动 hbse 集群

start-hbase.sh

注意:在主节点出现 HMaster、HRegionServer(有可能没有,属于正常)及备用节点 出现 HMaster、HRegionServer;其他节点出现 HRegionServer;说明Hbase集群搭建成功;

停止 hbase 集群

stop-hbase.sh

Phoenix 集群安装

下载 Phoenix

wget http://mirror.bit.edu.cn/apache/phoenix/apache-phoenix-5.0.0-HBase-2.0/bin/apache-phoenix-5.0.0-HBase-2.0-bin.tar.gz

解压 Phoenix

tar -zxvf apache-phoenix-5.0.0-HBase-2.0-bin.tar.gz

复制以下 jar 包到所有节点的 Habse 根目录下的 lib 目录下

[hadoop@hadoop0001 apache-phoenix-5.0.0-HBase-2.0-bin]$ cp phoenix-5.0.0-HBase-2.0-queryserver.jar ~/app/hbase-2.1.5/lib/
[hadoop@hadoop0001 apache-phoenix-5.0.0-HBase-2.0-bin]$ scp phoenix-5.0.0-HBase-2.0-queryserver.jar hadoop@hadoop0002:~/app/hbase-2.1.5/lib/
[hadoop@hadoop0001 apache-phoenix-5.0.0-HBase-2.0-bin]$ scp phoenix-5.0.0-HBase-2.0-queryserver.jar hadoop@hadoop0003:~/app/hbase-2.1.5/lib/

[hadoop@hadoop0001 apache-phoenix-5.0.0-HBase-2.0-bin]$ cp phoenix-5.0.0-HBase-2.0-server.jar ~/app/hbase-2.1.5/lib/
[hadoop@hadoop0001 apache-phoenix-5.0.0-HBase-2.0-bin]$ scp phoenix-5.0.0-HBase-2.0-server.jar hadoop@hadoop0002:~/app/hbase-2.1.5/lib/
[hadoop@hadoop0001 apache-phoenix-5.0.0-HBase-2.0-bin]$ scp phoenix-5.0.0-HBase-2.0-server.jar hadoop@hadoop0003:~/app/hbase-2.1.5/lib/

[hadoop@hadoop0001 apache-phoenix-5.0.0-HBase-2.0-bin]$ cp phoenix-core-5.0.0-HBase-2.0.jar ~/app/hbase-2.1.5/lib/
[hadoop@hadoop0001 apache-phoenix-5.0.0-HBase-2.0-bin]$ scp phoenix-core-5.0.0-HBase-2.0.jar hadoop@hadoop0002:~/app/hbase-2.1.5/lib/
[hadoop@hadoop0001 apache-phoenix-5.0.0-HBase-2.0-bin]$ scp phoenix-core-5.0.0-HBase-2.0.jar hadoop@hadoop0003:~/app/hbase-2.1.5/lib/

配置 Phoenix 环境变量(无需复制到其他节点)

# added by phoenix installer
export PHOENIX_HOME=/root/app/apache-phoenix-5.0.0-HBase-2.0-bin
export CLASSPATH=$CLASSPATH:$PHOENIX_HOME
export PATH=$PATH:$PHOENIX_HOME/bin

启动 Phoenix queryserver 模式

queryserver.py start

停止 Phoenix queryserver 模式

queryserver.py stop

连接 Phoenix queryserver

sqlline-thin.py hadoop0001:8765

客户端 jdbc 连接(jdbcUrl)

jdbc:phoenix:thin:url=http://10.2.1.102:8765?doAs=alice


作者:etrols
链接:https://www.jianshu.com/p/093e748b42cb
来源:简书
著作权归作者所有。商业转载请联系作者获得授权,非商业转载请注明出处。
 posted on 2019-11-14 17:00  xibuhaohao  阅读(717)  评论(0编辑  收藏  举报