Notes on a Vertica Cluster Expansion Experiment

2016-08-09 22:33  AlfredZhao

Requirement:
Expand a 3-node Vertica cluster by adding 3 more nodes, giving a 6-node Vertica cluster.

Test environment:
RHEL 6.5 + Vertica 7.2.2-2

Steps:

1. Create the three-node Vertica cluster

IP address and hostname plan for the three nodes:

192.168.56.121 vnode01
192.168.56.122 vnode02
192.168.56.123 vnode03

Planned data storage directory and its owner/group:

mkdir -p /data/verticadb
chown -R dbadmin:verticadba /data/verticadb

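I won't repeat the installation of this 3-node Vertica cluster here; between the earlier articles of mine listed below, you should be able to get it done without trouble ^_^.
FYI:
Quickly configuring SSH mutual trust for a Linux cluster
Vertica 7.1 installation best practices (RHEL 6.4)
Vertica: installing, creating a database, creating a test user and granting privileges, creating tables, loading data

Tips: the 7.2 installer reports a dependency on the dialog package. If the system does not have it preinstalled, the package can be found on the installation media for the corresponding OS release and installed on each node directly with rpm, as follows: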

[root@vnode01 Packages]# cluster_copy_all_nodes /root/dialog-1.1-9.20080819.1.el6.x86_64.rpm /root
dialog-1.1-9.20080819.1.el6.x86_64.rpm                                                                   100%  197KB 197.1KB/s   00:00    
dialog-1.1-9.20080819.1.el6.x86_64.rpm                                                                   100%  197KB 197.1KB/s   00:00    
[root@vnode01 Packages]# cluster_run_all_nodes "hostname; rpm -ivh /root/dialog-1.1-9.20080819.1.el6.x86_64.rpm"
vnode01
warning: /root/dialog-1.1-9.20080819.1.el6.x86_64.rpm: Header V3 RSA/SHA256 Signature, key ID fd431d51: NOKEY
Preparing...                ##################################################
dialog                      ##################################################
vnode02
warning: /root/dialog-1.1-9.20080819.1.el6.x86_64.rpm: Header V3 RSA/SHA256 Signature, key ID fd431d51: NOKEY
Preparing...                ##################################################
dialog                      ##################################################
vnode03
warning: /root/dialog-1.1-9.20080819.1.el6.x86_64.rpm: Header V3 RSA/SHA256 Signature, key ID fd431d51: NOKEY
Preparing...                ##################################################
dialog                      ##################################################
[root@vnode01 Packages]# 
[root@vnode01 Packages]# cluster_run_all_nodes "hostname; rpm -q dialog"
vnode01
dialog-1.1-9.20080819.1.el6.x86_64
vnode02
dialog-1.1-9.20080819.1.el6.x86_64
vnode03
dialog-1.1-9.20080819.1.el6.x86_64
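
The cluster_copy_all_nodes and cluster_run_all_nodes helpers used above come from the SSH mutual trust article and are not defined in this post. As a rough sketch of what such helpers might look like, assuming a space-separated NODE_LIST variable and working passwordless SSH for the current user (the function bodies are an assumption for illustration, not the original scripts):

```shell
#!/bin/bash
# Hypothetical re-creation of the helper commands; the original scripts are not shown in this post.
# Assumes NODE_LIST holds space-separated hostnames and passwordless SSH already works.

# Run one command string on every node in NODE_LIST (including the local one).
cluster_run_all_nodes() {
    local cmd="$1" node
    for node in $NODE_LIST; do
        ssh "$node" "$cmd"
    done
}

# Copy a local file into a directory on every other node (the local host is skipped).
cluster_copy_all_nodes() {
    local src="$1" dest="$2" node
    for node in $NODE_LIST; do
        if [ "$node" = "$(hostname)" ]; then
            continue
        fi
        scp "$src" "$node:$dest"
    done
}
```

Skipping the local host during the copy matches the transcript above, where the rpm is transferred twice but the follow-up command runs on all three nodes.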

Once the installation is complete, the cluster state should look like this:

dbadmin=> select * from nodes;
     node_name      |      node_id      | node_state |  node_address  | node_address_family | export_address | export_address_family |                        catalog_path                        | node_type | is_ephemeral | standing_in_for | node_down_since 
--------------------+-------------------+------------+----------------+---------------------+----------------+-----------------------+------------------------------------------------------------+-----------+--------------+-----------------+-----------------
 v_testmpp_node0001 | 45035996273704982 | UP         | 192.168.56.121 | ipv4                | 192.168.56.121 | ipv4                  | /data/verticadb/TESTMPP/v_testmpp_node0001_catalog/Catalog | PERMANENT | f            |                 | 
 v_testmpp_node0002 | 45035996273721500 | UP         | 192.168.56.122 | ipv4                | 192.168.56.122 | ipv4                  | /data/verticadb/TESTMPP/v_testmpp_node0002_catalog/Catalog | PERMANENT | f            |                 | 
 v_testmpp_node0003 | 45035996273721504 | UP         | 192.168.56.123 | ipv4                | 192.168.56.123 | ipv4                  | /data/verticadb/TESTMPP/v_testmpp_node0003_catalog/Catalog | PERMANENT | f            |                 | 
(3 rows)

dbadmin=>

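2. Create a minimal business test case

To better simulate a database that already carries business workloads, create a minimal business test case.
FYI:

While following the article "Vertica: loading data with a business user assigned to a resource pool", I hit an error when granting READ on the directory, possibly because of a version difference. The symptom and the fix are as follows:

-- Symptom: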
dbadmin=> CREATE LOCATION '/tmp' NODE 'v_testmpp_node0001' USAGE 'USER';
CREATE LOCATION
dbadmin=> GRANT READ ON LOCATION '/tmp' TO test;
ROLLBACK 5365:  User available location ["/tmp"] does not exist on node ["v_testmpp_node0002"]
dbadmin=> 

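-- Fix: the location exists only on node 1, while the GRANT expects it on every node. Drop the location just created on node 1, then run CREATE LOCATION again, this time specifying ALL NODES: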
dbadmin=> SELECT DROP_LOCATION('/tmp' , 'v_testmpp_node0001');
 DROP_LOCATION 
---------------
 /tmp dropped.
(1 row)

dbadmin=> CREATE LOCATION '/tmp' ALL NODES USAGE 'USER';
CREATE LOCATION
dbadmin=> GRANT READ ON LOCATION '/tmp' TO test;
GRANT PRIVILEGE

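3. Preparation before expanding the cluster

Before expanding the cluster, each node being added must be configured.

3.1 Confirm the planned IP addresses, hostnames, and data storage directory

IP address and hostname plan: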

192.168.56.124 vnode04
192.168.56.125 vnode05
192.168.56.126 vnode06

Planned data storage directory and its owner/group:

mkdir -p /data/verticadb
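-- Change the directory owner and group. -R is not used here because on already-installed nodes this directory contains a large number of subdirectories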
chown dbadmin:verticadba /data/verticadb

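3.2 SSH mutual trust for the root user

-- Remove all existing SSH mutual trust configuration for root (run on node 1). [This is only acceptable because removing root's trust does not affect the Vertica cluster]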
cluster_run_all_nodes "hostname ; rm -rf ~/.ssh"
rm -rf ~/.ssh

-- hosts file on node 1 (vi /etc/hosts)
192.168.56.121 vnode01
192.168.56.122 vnode02
192.168.56.123 vnode03
192.168.56.124 vnode04
192.168.56.125 vnode05
192.168.56.126 vnode06

-- Environment variable on node 1 (vi ~/.bash_profile)
export NODE_LIST='vnode01 vnode02 vnode03 vnode04 vnode05 vnode06'
-- Log in again or source the file so the variable takes effect
source ~/.bash_profile

Then reconfigure root's SSH mutual trust by following "Quickly configuring SSH mutual trust for a Linux cluster".
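
Before rebuilding the trust, it can be worth confirming that every name in NODE_LIST actually has an entry in the hosts file. A minimal sketch, assuming the NODE_LIST variable set above (check_hosts_entries is a hypothetical helper name, not part of the original scripts):

```shell
#!/bin/bash
# Hypothetical helper: report NODE_LIST names that are missing from a hosts file.
# Usage: check_hosts_entries [hosts_file]   (defaults to /etc/hosts)
check_hosts_entries() {
    local hosts_file="${1:-/etc/hosts}" node missing=0
    for node in $NODE_LIST; do
        # Strip comments, then look for the name in any column after the IP address.
        if ! sed 's/#.*//' "$hosts_file" | awk -v h="$node" '
                { for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
                END { exit found ? 0 : 1 }'; then
            echo "missing from $hosts_file: $node"
            missing=1
        fi
    done
    return "$missing"
}
```

Running check_hosts_entries with no argument checks /etc/hosts; a non-zero return status means at least one node name has no entry.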

3.3 Make the data storage directory consistent on all nodes

cluster_run_all_nodes "hostname; mkdir -p /data/verticadb"

3.4 Confirm the firewall and SELinux are disabled on all nodes

cluster_run_all_nodes "hostname; service iptables status"
cluster_run_all_nodes "hostname; getenforce"
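
With six nodes, the alternating hostname/status output is easy to misread. A small sketch of a filter that lists only the hosts still in Enforcing mode, assuming the output consists of strictly alternating hostname and getenforce lines (selinux_enforcing_hosts is a hypothetical helper name):

```shell
#!/bin/bash
# Hypothetical helper: reads alternating "hostname" / "getenforce" lines on stdin
# and prints every host whose SELinux mode is still Enforcing.
selinux_enforcing_hosts() {
    local host mode
    while read -r host && read -r mode; do
        if [ "$mode" = "Enforcing" ]; then
            echo "$host"
        fi
    done
}
```

It could be used as cluster_run_all_nodes "hostname; getenforce" | selinux_enforcing_hosts, where empty output means no node is enforcing.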

3.5 Confirm the dialog dependency package is installed

cluster_run_all_nodes "hostname; rpm -q dialog"

4. Cluster expansion: add 3 nodes to the cluster

4.1 Add the 3 nodes to the cluster

/opt/vertica/sbin/update_vertica --add-hosts host(s) --rpm package

In my case I am adding 3 nodes, specified by hostname:

/opt/vertica/sbin/update_vertica --add-hosts vnode04,vnode05,vnode06 --rpm /root/vertica-7.2.2-2.x86_64.RHEL6.rpm --failure-threshold=HALT -u dbadmin -p vertica

The run looks like this:

[root@vnode01 ~]# /opt/vertica/sbin/update_vertica --add-hosts vnode04,vnode05,vnode06 --rpm /root/vertica-7.2.2-2.x86_64.RHEL6.rpm --failure-threshold=HALT -u dbadmin -p vertica
Vertica Analytic Database 7.2.2-2 Installation Tool


>> Validating options...


Mapping hostnames in --add-hosts (-A) to addresses...
        vnode04                        => 192.168.56.124
        vnode05                        => 192.168.56.125
        vnode06                        => 192.168.56.126

>> Starting installation tasks.
>> Getting system information for cluster (this may take a while)...

Default shell on nodes:
192.168.56.126 /bin/bash
192.168.56.125 /bin/bash
192.168.56.124 /bin/bash
192.168.56.123 /bin/bash
192.168.56.122 /bin/bash
192.168.56.121 /bin/bash

>> Validating software versions (rpm or deb)...


>> Beginning new cluster creation...

successfully backed up admintools.conf on 192.168.56.123 
successfully backed up admintools.conf on 192.168.56.122 
successfully backed up admintools.conf on 192.168.56.121 

>> Creating or validating DB Admin user/group...

Successful on hosts (6): 192.168.56.126 192.168.56.125 192.168.56.124 192.168.56.123 192.168.56.122 192.168.56.121
    Provided DB Admin account details: user = dbadmin, group = verticadba, home = /home/dbadmin
    Creating group... Group already exists
    Validating group... Okay
    Creating user... User already exists
    Validating user... Okay


>> Validating node and cluster prerequisites...

Prerequisites not fully met during local (OS) configuration for
verify-192.168.56.126.xml:
    HINT (S0151): https://my.vertica.com/docs/7.2.x/HTML/index.htm#cshid=S0151
        These disks do not have known IO schedulers: '/dev/mapper/vg_linuxbase-
        lv_root' ('') = ''
    HINT (S0305): https://my.vertica.com/docs/7.2.x/HTML/index.htm#cshid=S0305
        TZ is unset for dbadmin. Consider updating .profile or .bashrc
    WARN (S0170): https://my.vertica.com/docs/7.2.x/HTML/index.htm#cshid=S0170
        lsblk (LVM utility) indicates LVM on the data directory.
    FAIL (S0020): https://my.vertica.com/docs/7.2.x/HTML/index.htm#cshid=S0020
        Readahead size of  (/dev/mapper/vg_linuxbase-lv_root) is too low for
        typical systems: 256 < 2048
    FAIL (S0030): https://my.vertica.com/docs/7.2.x/HTML/index.htm#cshid=S0030
        ntp daemon process is not running: ['ntpd', 'ntp', 'chronyd']
    FAIL (S0310): https://my.vertica.com/docs/7.2.x/HTML/index.htm#cshid=S0310
        Transparent hugepages is set to 'always'. Must be 'never' or 'madvise'.

Prerequisites not fully met during local (OS) configuration for
verify-192.168.56.123.xml:
    HINT (S0151): https://my.vertica.com/docs/7.2.x/HTML/index.htm#cshid=S0151
        These disks do not have known IO schedulers: '/dev/mapper/vg_linuxbase-
        lv_root' ('') = ''
    HINT (S0305): https://my.vertica.com/docs/7.2.x/HTML/index.htm#cshid=S0305
        TZ is unset for dbadmin. Consider updating .profile or .bashrc
    WARN (S0170): https://my.vertica.com/docs/7.2.x/HTML/index.htm#cshid=S0170
        lsblk (LVM utility) indicates LVM on the data directory.
    FAIL (S0020): https://my.vertica.com/docs/7.2.x/HTML/index.htm#cshid=S0020
        Readahead size of  (/dev/mapper/vg_linuxbase-lv_root) is too low for
        typical systems: 256 < 2048
    FAIL (S0030): https://my.vertica.com/docs/7.2.x/HTML/index.htm#cshid=S0030
        ntp daemon process is not running: ['ntpd', 'ntp', 'chronyd']
    FAIL (S0310): https://my.vertica.com/docs/7.2.x/HTML/index.htm#cshid=S0310
        Transparent hugepages is set to 'always'. Must be 'never' or 'madvise'.

Prerequisites not fully met during local (OS) configuration for
verify-192.168.56.121.xml:
    HINT (S0151): https://my.vertica.com/docs/7.2.x/HTML/index.htm#cshid=S0151
        These disks do not have known IO schedulers: '/dev/mapper/vg_linuxbase-
        lv_root' ('') = ''
    HINT (S0305): https://my.vertica.com/docs/7.2.x/HTML/index.htm#cshid=S0305
        TZ is unset for dbadmin. Consider updating .profile or .bashrc
    WARN (S0170): https://my.vertica.com/docs/7.2.x/HTML/index.htm#cshid=S0170
        lsblk (LVM utility) indicates LVM on the data directory.
    FAIL (S0020): https://my.vertica.com/docs/7.2.x/HTML/index.htm#cshid=S0020
        Readahead size of  (/dev/mapper/vg_linuxbase-lv_root) is too low for
        typical systems: 256 < 2048
    FAIL (S0030): https://my.vertica.com/docs/7.2.x/HTML/index.htm#cshid=S0030
        ntp daemon process is not running: ['ntpd', 'ntp', 'chronyd']
    FAIL (S0310): https://my.vertica.com/docs/7.2.x/HTML/index.htm#cshid=S0310
        Transparent hugepages is set to 'always'. Must be 'never' or 'madvise'.

Prerequisites not fully met during local (OS) configuration for
verify-192.168.56.122.xml:
    HINT (S0151): https://my.vertica.com/docs/7.2.x/HTML/index.htm#cshid=S0151
        These disks do not have known IO schedulers: '/dev/mapper/vg_linuxbase-
        lv_root' ('') = ''
    HINT (S0305): https://my.vertica.com/docs/7.2.x/HTML/index.htm#cshid=S0305
        TZ is unset for dbadmin. Consider updating .profile or .bashrc
    WARN (S0170): https://my.vertica.com/docs/7.2.x/HTML/index.htm#cshid=S0170
        lsblk (LVM utility) indicates LVM on the data directory.
    FAIL (S0020): https://my.vertica.com/docs/7.2.x/HTML/index.htm#cshid=S0020
        Readahead size of  (/dev/mapper/vg_linuxbase-lv_root) is too low for
        typical systems: 256 < 2048
    FAIL (S0030): https://my.vertica.com/docs/7.2.x/HTML/index.htm#cshid=S0030
        ntp daemon process is not running: ['ntpd', 'ntp', 'chronyd']
    FAIL (S0310): https://my.vertica.com/docs/7.2.x/HTML/index.htm#cshid=S0310
        Transparent hugepages is set to 'always'. Must be 'never' or 'madvise'.

Prerequisites not fully met during local (OS) configuration for
verify-192.168.56.125.xml:
    HINT (S0151): https://my.vertica.com/docs/7.2.x/HTML/index.htm#cshid=S0151
        These disks do not have known IO schedulers: '/dev/mapper/vg_linuxbase-
        lv_root' ('') = ''
    HINT (S0305): https://my.vertica.com/docs/7.2.x/HTML/index.htm#cshid=S0305
        TZ is unset for dbadmin. Consider updating .profile or .bashrc
    WARN (S0170): https://my.vertica.com/docs/7.2.x/HTML/index.htm#cshid=S0170
        lsblk (LVM utility) indicates LVM on the data directory.
    FAIL (S0020): https://my.vertica.com/docs/7.2.x/HTML/index.htm#cshid=S0020
        Readahead size of  (/dev/mapper/vg_linuxbase-lv_root) is too low for
        typical systems: 256 < 2048
    FAIL (S0030): https://my.vertica.com/docs/7.2.x/HTML/index.htm#cshid=S0030
        ntp daemon process is not running: ['ntpd', 'ntp', 'chronyd']
    FAIL (S0310): https://my.vertica.com/docs/7.2.x/HTML/index.htm#cshid=S0310
        Transparent hugepages is set to 'always'. Must be 'never' or 'madvise'.

Prerequisites not fully met during local (OS) configuration for
verify-192.168.56.124.xml:
    HINT (S0151): https://my.vertica.com/docs/7.2.x/HTML/index.htm#cshid=S0151
        These disks do not have known IO schedulers: '/dev/mapper/vg_linuxbase-
        lv_root' ('') = ''
    HINT (S0305): https://my.vertica.com/docs/7.2.x/HTML/index.htm#cshid=S0305
        TZ is unset for dbadmin. Consider updating .profile or .bashrc
    WARN (S0170): https://my.vertica.com/docs/7.2.x/HTML/index.htm#cshid=S0170
        lsblk (LVM utility) indicates LVM on the data directory.
    FAIL (S0020): https://my.vertica.com/docs/7.2.x/HTML/index.htm#cshid=S0020
        Readahead size of  (/dev/mapper/vg_linuxbase-lv_root) is too low for
        typical systems: 256 < 2048
    FAIL (S0030): https://my.vertica.com/docs/7.2.x/HTML/index.htm#cshid=S0030
        ntp daemon process is not running: ['ntpd', 'ntp', 'chronyd']
    FAIL (S0310): https://my.vertica.com/docs/7.2.x/HTML/index.htm#cshid=S0310
        Transparent hugepages is set to 'always'. Must be 'never' or 'madvise'.

System prerequisites passed.  Threshold = HALT


>> Establishing DB Admin SSH connectivity...

Installing/Repairing SSH keys for dbadmin


>> Setting up each node and modifying cluster...

Creating Vertica Data Directory...

Updating agent...
Creating node node0004 definition for host 192.168.56.124
... Done
Creating node node0005 definition for host 192.168.56.125
... Done
Creating node node0006 definition for host 192.168.56.126
... Done

>> Sending new cluster configuration to all nodes...

Starting agent...

>> Completing installation...

Running upgrade logic
No spread upgrade required: /opt/vertica/config/vspread.conf not found on any node
Installation complete.

Please evaluate your hardware using Vertica's validation tools:
    https://my.vertica.com/docs/7.2.x/HTML/index.htm#cshid=VALSCRIPT

To create a database:
  1. Logout and login as dbadmin. (see note below)
  2. Run /opt/vertica/bin/adminTools as dbadmin
  3. Select Create Database from the Configuration Menu

  Note: Installation may have made configuration changes to dbadmin
  that do not take effect until the next session (logout and login).

To add or remove hosts, select Cluster Management from the Advanced Menu.
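
The log above shows the same HINT, WARN, and FAIL findings on all six hosts, and the run only continued because the threshold was set to HALT ("System prerequisites passed.  Threshold = HALT"). To see at a glance which checks were raised and on how many hosts, the output can be condensed. A minimal sketch, assuming the installer output has been saved to a file (summarize_prereq_checks and the log file name are hypothetical):

```shell
#!/bin/bash
# Hypothetical helper: summarise an update_vertica log read from stdin.
# Prints one line per distinct severity/code pair with the number of times it was reported.
summarize_prereq_checks() {
    grep -oE '(HINT|WARN|FAIL) \(S[0-9]+\)' \
        | sort \
        | uniq -c \
        | awk '{ print $2, $3, "x" $1 }'
}
```

For example, summarize_prereq_checks < update_vertica.log would print lines such as "FAIL (S0020) x6" for the log above. On a test VM these findings were simply tolerated; on a real system the FAIL items (readahead, NTP, transparent hugepages) are worth fixing before expanding.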

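4.2 Change the owner and group of the data storage directory

-- After installing the software, change the directory owner and group. -R is not used here because on already-installed nodes this directory contains a large number of subdirectories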
cluster_run_all_nodes "hostname; chown dbadmin:verticadba /data/verticadb"

4.3 Add the 3 newly added cluster nodes to the database

Log in as dbadmin and use admintools to add the nodes:

7  Advanced Menu -> 6  Cluster Management -> 1  Add Host(s) -> Select database (press Space to select the database) -> Select host(s) to add to database (press Space to select the nodes to add)
-> Are you sure you want to add ['192.168.56.124', '192.168.56.125', '192.168.56.126'] to the database?
-> 
Failed to add nodes to database
ROLLBACK 2382:  Cannot create another node. The current license permits 3 node(s) and the database catalog already contains 3 node(s)

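This happens because Vertica Community Edition allows at most 3 nodes.
If you have purchased a full or temporary Vertica license from HP, you can import the license and then add the new cluster nodes to the database.
With a valid license, the prompts continue as follows: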

-> Successfully added nodes to the database.
-> Enter directory for Database Designer output:
Enter /data/verticadb
-> Database Designer - Proposed K-safety value: 1
-> 
  +--------------------------------------------------------------------------------------------------------------------------+
       | The Database Designer is ready to modify your projections in order to re-balance data across all nodes in the database.  |
       |                                                                                                                          |
       | Review the options you selected:                                                                                         |
       |                                                                                                                          |
       | -The data will be automatically re-balanced with a k-safety value of 1.                                                  |
       |                                                                                                                          |
       | Rebalance will take place using elastic cluster methodology.                                                             |
       |                                                                                                                          |
       | Re-balancing data could take a long time; allow it to complete uninterrupted.                                            |
       | Use Ctrl+C if you must cancel the session.                                                                               |
       |                                                                                                                          |
       | To change any of the options press <Cancel> to return to the Cluster Management menu.                                    |
       |                                                                                                                          |
       |                                                                                                                          |
       +--------------------------------------------------------------------------------------------------------------------------+
       |                                         <Proceed>                     <Cancel >                
-> Select Proceed
-> 
Starting Data Rebalancing tasks.  Please wait....
This process could take a long time; allow it to complete uninterrupted.
Use Ctrl+C if you must cancel the session. 

Wait for the rebalance to finish:

        Data Rebalance completed successfully.
Press <Enter> to return to the Administration Tools menu.

At this point the Vertica cluster expansion is fully complete.
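
As a final sanity check, the nodes table queried in step 1 can be queried again to confirm that all six nodes are UP. A minimal sketch, assuming vsql is run as dbadmin with unaligned, tuples-only output so that each row looks like node_name|node_state (count_up_nodes is a hypothetical helper name):

```shell
#!/bin/bash
# Hypothetical helper: counts nodes reported as UP in "node_name|node_state" lines on stdin.
# Intended input (an assumption about how it is invoked; add connection options as needed):
#   vsql -A -t -c "select node_name, node_state from nodes"
count_up_nodes() {
    awk -F'|' '$2 == "UP" { n++ } END { print n + 0 }'
}
```

After a successful expansion of this cluster, piping the query output through count_up_nodes should print 6.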
