PG分布式之PGXL安装部署-postgresql-xl-9.5

PG分布式之PGXL安装部署-postgresql-xl-9.5

介绍Postgres-XL

Postgres-XL 全称为 Postgres eXtensible Lattice，是TransLattice公司及其收购数据库技术公司–StormDB的产品。Postgres-XL是一个横向扩展的开源数据库集群，具有足够的灵活性来处理不同的数据库任务。

Postgres-XL是一款开源的PG集群软件，XL代表eXtensible Lattice，即可扩展的PG“格子”之意，以下简称PGXL。

官方称其既适合写操作压力较大的OLTP应用，又适合读操作为主的大数据应用。它的前身是Postgres-XC（简称PGXC），PGXC是在PG的基础上加入了集群功能，主要适用于OLTP应用。PGXL是在PGXC的基础上的升级产品，加入了一些适用于OLAP应用的特性，如 Massively Parallel Processing (MPP) 特性。

通俗的说PGXL的代码是包含PG代码，使用PGXL安装PG集群并不需要单独安装PG。这样带来的一个问题是无法随意选择任意版本的PG，好在PGXL跟进PG较及时，目前最新版本Postgres-XL 10R1，基于PG 10。

Postgres-XL功能特性

开放源代码：（源协议使用宽松的“Mozilla Public License”许可，允许将开源代码与闭源代码混在一起使用。）
完全的ACID支持
可横向扩展的关系型数据库（RDBMS）
- 支持OLAP应用，采用MPP（Massively Parallel Processing：大规模并行处理系统）架构模式
- 支持OLTP应用，读写性能可扩展
- 集群级别的ACID特性
- 多租户安全
  也可被用作分布式Key-Value存储
事务处理与数据分析处理混合型数据库
支持丰富的SQL语句类型，比如：关联子查询
支持绝大部分PostgreSQL的SQL语句
分布式多版本并发控制（MVCC：Multi-version Concurrency Control）
支持JSON和XML格式

Postgres-XL缺少的功能

内建的高可用机制
- 使用外部机制实现高可能，如：Corosync/Pacemaker
- 有未来功能提升的空间
增加节点/重新分片数据（re-shard）的简便性
- 数据重分布（redistribution）期间会锁表
- 可采用预分片（pre-shard）方式解决，在同台物理服务器上建立多个数据节点，每个节点存储一个数据分片。数据重分布时，将一些数据节点迁出即可
某些外键、唯一性约束功能

Postgres-XL架构

基于开源项目Postgres-XC
- XL增加了MPP，允许数据节点间直接通讯，交换复杂跨节点关联查询相关数据信息，减少协调器负载。
多个协调器（Coordinator）
- 应用程序的数据库连入点
- 分析查询语句，生成执行计划
多个数据节点（DataNode）
- 实际的数据存储
- 数据自动打散分布到集群中各数据节点
- 本地执行查询
- 一个查询在所有相关节点上并行查询
全局事务管理器（GTM：Global Transaction Manager）
- 提供事务间一致性视图
- 部署GTM Proxy实例，以提高性能

Postgre-XL主要组件

GTM (Global Transaction Manager) - 全局事务管理器
GTM是Postgres-XL的一个关键组件，用于提供一致的事务管理和元组可见性控制。
GTM Standby
GTM的备节点，在pgxc,pgxl中，GTM控制所有的全局事务分配，如果出现问题，就会导致整个集群不可用，为了增加可用性，增加该备用节点。当GTM出现问题时，GTM Standby可以升级为GTM，保证集群正常工作。
GTM-Proxy
GTM需要与所有的Coordinators通信，为了降低压力，可以在每个Coordinator机器上部署一个GTM-Proxy。
Coordinator --协调器
协调器是应用程序到数据库的接口。它的作用类似于传统的PostgreSQL后台进程，但是协调器不存储任何实际数据。实际数据由数据节点存储。协调器接收SQL语句，根据需要获取全局事务Id和全局快照，确定涉及哪些数据节点，并要求它们执行(部分)语句。当向数据节点发出语句时，它与GXID和全局快照相关联，以便多版本并发控制(MVCC)属性扩展到集群范围。
Datanode --数据节点
用于实际存储数据。表可以分布在各个数据节点之间，也可以复制到所有数据节点。数据节点没有整个数据库的全局视图，它只负责本地存储的数据。接下来，协调器将检查传入语句，并制定子计划。然后，根据需要将这些数据连同GXID和全局快照一起传输到涉及的每个数据节点。数据节点可以在不同的会话中接收来自各个协调器的请求。但是，由于每个事务都是惟一标识的，并且与一致的(全局)快照相关联，所以每个数据节点都可以在其事务和快照上下文中正确执行。

Postgres-XL继承了PostgreSQL

Postgres-XL是PostgreSQL的扩展并继承了其很多特性：

复杂查询
外键
触发器
视图
事务
MVCC(多版本控制)
此外，类似于PostgreSQL，用户可以通过多种方式扩展Postgres-XL，例如添加新的
数据类型
函数
操作
聚合函数
索引类型
过程语言

应用场景

Postgres-XL是一个水平可扩展的开源SQL数据库集群，足够灵活，可以处理不同的数据库工作负载：

OLTP 写频繁的业务
需要MPP并行性商业智能/大数据分析
操作数据存储
Key-value 存储
GIS的地理空间
混合业务工作环境
多租户服务提供商托管环境
Web 2.0

安装配置

集群规划

主机名	IP	角色	端口	nodename	数据目录
gtm	192.168.81.128	GTM	6666	gtm	/nodes/gtm
		GTM Slave	20001	gtmSlave	/nodes/gtmSlave
datanode1	192.168.81.129	Coordinator	5432	coord1	/nodes/coord
		Datanode	5433	node1	/nodes/dn_master
		Datanode Slave	15433	node1_slave	/nodes/dn_slave
		GTM Proxy	6666	gtm_pxy1	/nodes/gtm_pxy
datanode2	192.168.81.130	Coordinator	5432	coord2	/nodes/coord
		Datanode	5433	node2	nodes/dn_master
		Datanode Slave	15433	node2_slave	/nodes/dn_slave
		GTM Proxy	6666	gtm_pxy2	/nodes/gtm_pxy

hosts配置（三台都配置）

[root@gtm ~]# cat /etc/hosts
192.168.81.128 gtm
192.168.81.129 datanode1
192.168.81.130 datanode2

安装依赖包

# 三台服务器都操作
yum install -y flex bison readline-devel zlib-devel openjade docbook-style-dsssl gcc

防火墙selinux

# 三台服务器都操作
# 关闭防火墙（可选）
systemctl stop firewalld.service && systemctl disable firewalld.service && systemctl status firewalld.service
# 关闭selinux
setenforce 0
sed -i 's/SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config

同步系统时间

# 三台服务器都操作
ntpdate ntp1.aliyun.com

创建用户

# 三台服务器都操作
groupadd postgres
useradd postgres -g postgres
echo "postgres" | passwd --stdin postgres

gtm节点免密登陆配置

# 三台服务器都操作
su - postgres
ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys

# gtm服务器都操作
将刚生成的认证文件拷贝到node2到node3中，使得gtm节点可以免密码登录node2~node3的任意一个节点：
scp ~/.ssh/authorized_keys postgres@datanode1:~/.ssh/
scp ~/.ssh/authorized_keys postgres@datanode2:~/.ssh/

系统资源限制配置

cat > /etc/sysctl.conf <<"EOF"
vm.swappiness=10
vm.zone_reclaim_mode=0
fs.aio-max-nr = 1048576
fs.file-max = 6815744
net.ipv4.ip_local_port_range = 9000 65500
net.core.rmem_default = 262144
net.core.rmem_max = 4194304
net.core.wmem_default = 262144
net.core.wmem_max = 1048586
kernel.shmmax = 1288490188
kernel.shmall = 314572
kernel.shmmni = 4096
kernel.sem = 50100 64128000 50100 20480
vm.nr_hugepages=16900
EOF
 
sysctl -p
 
-shmmax：表示单个共享内存段（Shared Memory Segment）可以创建的共享内存的最大值。通常设置成机器的总内存数就可以了。
-shmall的单位是页面，而shmmax的单位是字节。

软连接和文件数限制

cat >> /etc/security/limits.conf <<"EOF" 
* soft nofile 131072
* hard nofile 131072
* soft nproc 131072
* hard nproc 131072
* soft core unlimited
* hard core unlimited
* soft memlock 50000000
* hard memlock 50000000
EOF
 
echo "* - nproc unlimited" > /etc/security/limits.d/90-nproc.conf
echo "session required pam_limits.so" >> /etc/pam.d/login

关闭THP

root用户下
在vi /etc/rc.local最后添加如下代码

cat >> /etc/rc.local <<"EOF" 
if test -f /sys/kernel/mm/transparent_hugepage/enabled; then
echo never > /sys/kernel/mm/transparent_hugepage/enabled
fi
if test -f /sys/kernel/mm/transparent_hugepage/defrag; then
echo never > /sys/kernel/mm/transparent_hugepage/defrag
fi
EOF

chmod +x /etc/rc.d/rc.local
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
 
cat /sys/kernel/mm/transparent_hugepage/enabled
cat /sys/kernel/mm/transparent_hugepage/defrag

配置大页hugepage

计算大页数
计划给 PG 用 48 GB 共享内存（≈64 GB 的 3/4）
hugepage_kb=2048                       # 2 MB
shared_mb=((48∗1024))nr=
((shared_mb*1024/hugepage_kb + 200))  # 多留 200 个给波动
echo "vm.nr_hugepages = $nr" >> /etc/sysctl.conf
sysctl -p

计划给 PostgreSQL 共享内存：32 GB
2 MB = 2048 KB
32 GB = 32 × 1024 × 1024 KB = 33 554 432 KB
所需大页数 = 33 554 432 ÷ 2048 = 16 384

vm.nr_hugepages = 16900

立刻生效（无需重启 OS）
sysctl -w vm.nr_hugepages=$nr

查看是否成功
grep HugePages_Total /proc/meminfo

调整系统 limit（postgres 用户要能 lock 内存）
cat >>/etc/security/limits.conf <<EOF
postgres soft memlock unlimited
postgres hard memlock unlimited
EOF
 
如使用 systemd，同步到 service 层
mkdir -p /etc/systemd/system/postgresql.service.d/
cat >/etc/systemd/system/postgresql.service.d/hugepage.conf <<EOF
[Service]
LimitMEMLOCK=infinity
EOF
systemctl daemon-reload

Postgres-XL安装

下载安装包https://www.postgres-xl.org/downloads/postgres-xl-9.5r1.6.tar.bz2

# 三台服务器都操作
# 安装解压缩工具
yum install lbzip2 bzip2 -y
tar xvf postgres-xl-9.5r1.6.tar.bz2
cd postgres-xl-9.5r1.6
./configure --prefix=/usr/local/pgxl/
make && make install
cd contrib/
make && make install

cortrib中有很多postgres很牛的工具，一般要装上。如ltree,uuid,postgres_fdw等等。

配置环境变量

# 三台服务器都操作
[postgres@gtm ~]$ su - postgres
[postgres@gtm ~]$ 
cat >>.bashrc<<"EOF"
export PGHOME=/usr/local/pgxl
export LD_LIBRARY_PATH=$PGHOME/lib:$LD_LIBRARY_PATH
export PATH=$PGHOME/bin:$PATH
EOF
[postgres@gtm ~]$ source .bashrc

集群配置

生成pgxc_ctl配置文件

[postgres@gtm ~]# pgxc_ctl

PGXC prepare ---执行该命令将会生成一份配置文件模板
PGXC   ---按ctrl c退出。

配置pgxc_ctl.conf

在pgxc_ctl文件夹中存在一个pgxc_ctl.conf文件，编辑如下：

#!/usr/bin/env bash
pgxcInstallDir=$PGHOME
pgxlDATA=$PGHOME/data 
pgxcOwner=postgres          # owner of the Postgres-XC databaseo cluster.  Here, we use this
pgxcUser=$pgxcOwner        # OS user of Postgres-XC owner
tmpDir=/tmp                 # temporary dir used in XC servers
localTmpDir=$tmpDir            # temporary dir used here locally
configBackup=n                  # If you want config file backup, specify y to this value.
configBackupHost=pgxc-linker    # host to backup config file
configBackupDir=$HOME/pgxc     # Backup directory
configBackupFile=pgxc_ctl.bak   # Backup file name --> Need to synchronize when original changed.

#---- GTM Master -----------------------------------------------
gtmName=gtm
gtmMasterServer=gtm
gtmMasterPort=6666
gtmMasterDir=$pgxlDATA/nodes/gtm

#---- Configuration ---
gtmExtraConfig=none         # Will be added gtm.conf for both Master and Slave (done at initilization only)
gtmMasterSpecificExtraConfig=none   # Will be added to Master's gtm.conf (done at initialization only)

#---- GTM Slave -----------------------------------------------
gtmSlave=y                  # Specify y if you configure GTM Slave.   Otherwise, GTM slave will not be configured and
gtmSlaveName=gtmSlave
gtmSlaveServer=gtm      # value none means GTM slave is not available.  Give none if you don't configure GTM Slave.
gtmSlavePort=20001          # Not used if you don't configure GTM slave.
gtmSlaveDir=$pgxlDATA/nodes/gtmSlave   # Not used if you don't configure GTM slave.

#---- Configuration ----
gtmSlaveSpecificExtraConfig=none # Will be added to Slave's gtm.conf (done at initialization only)

#---- GTM Proxy -------------------------------------------------------------------------------------------------------
gtmProxyDir=$pgxlDATA/nodes/gtm_pxy
#---- Overall -------
gtmProxy=y              # Specify y if you conifugre at least one GTM proxy.   You may not configure gtm proxies
gtmProxyNames=(gtm_pxy1 gtm_pxy2)   # No used if it is not configured
gtmProxyServers=(datanode1 datanode2)           # Specify none if you dont' configure it.
gtmProxyPorts=(6666 6666)               # Not used if it is not configured.
gtmProxyDirs=($gtmProxyDir $gtmProxyDir) # Not used if it is not configured.

#---- Configuration ----
gtmPxyExtraConfig=none      # Extra configuration parameter for gtm_proxy.  Coordinator section has an example.
gtmPxySpecificExtraConfig=(none none none none)

#---- Coordinators ----------------------------------------------------------------------------------------------------
coordMasterDir=$pgxlDATA/nodes/coord
coordSlaveDir=$HOME/pgxc/nodes/coord_slave
coordArchLogDir=$HOME/pgxc/nodes/coord_archlog

#---- Overall ------------
coordNames=(coord1 coord2)      # Master and slave use the same name
coordPorts=(5432 5432)          # Master ports
poolerPorts=(6667 6667)         # Master pooler ports
coordPgHbaEntries=(0.0.0.0/0)               # Assumes that all the coordinator (master/slave) accepts

#---- Master -------------
coordMasterServers=(datanode1 datanode2)        # none means this master is not available
coordMasterDirs=($coordMasterDir $coordMasterDir)
coordMaxWALsernder=0    # max_wal_senders: needed to configure slave. If zero value is specified,
coordMaxWALSenders=($coordMaxWALsernder $coordMaxWALsernder)

#---- Slave -------------
coordSlave=n            # Specify y if you configure at least one coordiantor slave.  Otherwise, the following
coordSlaveSync=y        # Specify to connect with synchronized mode.
coordSlaveServers=(node07 node08 node09 node06)         # none means this slave is not available
coordSlavePorts=(20004 20005 20004 20005)           # Master ports
coordSlavePoolerPorts=(20010 20011 20010 20011)         # Master pooler ports
coordSlaveDirs=($coordSlaveDir $coordSlaveDir $coordSlaveDir $coordSlaveDir)
coordArchLogDirs=($coordArchLogDir $coordArchLogDir $coordArchLogDir $coordArchLogDir)

#---- Configuration files---
coordExtraConfig=coordExtraConfig   # Extra configuration file for coordinators.  
cat > $coordExtraConfig <<EOF
#================================================
# Added to all the coordinator postgresql.conf
# Original: $coordExtraConfig
log_destination = 'stderr'
logging_collector = on
log_directory = 'pg_log'
listen_addresses = '*'
max_connections = 100
EOF

coordSpecificExtraConfig=(none none none none)
coordExtraPgHba=none    # Extra entry for pg_hba.conf.  This file will be added to all the coordinators' pg_hba.conf
coordSpecificExtraPgHba=(none none none none)
coordAdditionalSlaves=n     # Additional slave can be specified as follows: where you
coordAdditionalSlaveSet=(cad1)      # Each specifies set of slaves.   This case, two set of slaves are
                                            # configured
cad1_Sync=n             # All the slaves at "cad1" are connected with asynchronous mode.
                            # If not, specify "y"
                            # The following lines specifies detailed configuration for each
                            # slave tag, cad1.  You can define cad2 similarly.
cad1_Servers=(node08 node09 node06 node07)  # Hosts
cad1_dir=$HOME/pgxc/nodes/coord_slave_cad1
cad1_Dirs=($cad1_dir $cad1_dir $cad1_dir $cad1_dir)
cad1_ArchLogDir=$HOME/pgxc/nodes/coord_archlog_cad1
cad1_ArchLogDirs=($cad1_ArchLogDir $cad1_ArchLogDir $cad1_ArchLogDir $cad1_ArchLogDir)
#---- Datanodes -------------------------------------------------------------------------------------------------------
#---- Shortcuts --------------
datanodeMasterDir=$pgxlDATA/nodes/dn_master
datanodeSlaveDir=$HOME/pgxc/nodes/dn_slave
datanodeArchLogDir=$HOME/pgxc/nodes/datanode_archlog

primaryDatanode=node1               # Primary Node.
datanodeNames=(node1 node2)
datanodePorts=(5433 5433)   # Master ports
datanodePoolerPorts=(6668 6668) # Master pooler ports
datanodePgHbaEntries=(0.0.0.0/0)    # Assumes that all the coordinator (master/slave) accepts
#---- Master ----------------
datanodeMasterServers=(datanode1 datanode2) # none means this master is not available.
datanodeMasterDirs=($datanodeMasterDir $datanodeMasterDir)
datanodeMaxWalSender=4                              # max_wal_senders: needed to configure slave. If zero value is 
datanodeMaxWALSenders=($datanodeMaxWalSender $datanodeMaxWalSender)

#---- Slave -----------------
datanodeSlave=n         # Specify y if you configure at least one coordiantor slave.  Otherwise, the following
datanodeSlaveServers=(datanode2 datanode1)  # value none means this slave is not available
datanodeSlavePorts=(15433 15433)    # value none means this slave is not available
datanodeSlavePoolerPorts=(20012 20012)  # value none means this slave is not available
datanodeSlaveSync=y     # If datanode slave is connected in synchronized mode
datanodeSlaveDirs=($datanodeSlaveDir $datanodeSlaveDir)
datanodeArchLogDirs=($datanodeArchLogDir $datanodeArchLogDir )

# ---- Configuration files ---
datanodeExtraConfig=none    # Extra configuration file for datanodes.  This file will be added to all the 
datanodeSpecificExtraConfig=(none none none none)
datanodeExtraPgHba=none     # Extra entry for pg_hba.conf.  This file will be added to all the datanodes' postgresql.conf
datanodeSpecificExtraPgHba=(none none none none)
#----- Additional Slaves -----
datanodeAdditionalSlaves=n  # Additional slave can be specified as follows: where you
#---- WAL archives -------------------------------------------------------------------------------------------------
walArchive=n    # If you'd like to configure WAL archive, edit this section.
walArchiveSet=(war1 war2)
war1_source=(master)    # you can specify master, slave or ano other additional slaves as a source of WAL archive.
                    # Default is the master
wal1_source=(slave)
wal1_source=(additiona_coordinator_slave_set additional_datanode_slave_set)
war1_host=node10    # All the nodes are backed up at the same host for a given archive set
war1_backupdir=$HOME/pgxc/backup_war1
wal2_source=(master)
war2_host=node11
war2_backupdir=$HOME/pgxc/backup_war2

集群初始化

第一次启动集群，需要初始化，初始化如下：

# 三台服务器赋权
# gtm主机操作
[postgres@gtm pgxc_ctl]$ pgxc_ctl -c /postgresxl/bin/pgxc_ctl.conf init all
初始化后会直接启动集群

集群验证和管理

集群验证

在datanode1节点上连接上5432端口，即node1上的coordinator，查看集群节点状态

[postgres@datanode1 ~]$ psql -p 5432
psql (PGXL 9.5r1.6, based on PG 9.5.8 (Postgres-XL 9.5r1.6))
Type "help" for help.

postgres=# select * from pgxc_node;
 node_name | node_type | node_port | node_host | nodeis_primary | nodeis_preferred |   node_id   
-----------+-----------+-----------+-----------+----------------+------------------+-------------
 coord1    | C         |      5432 | datanode1 | f              | f                |  1885696643
 coord2    | C         |      5432 | datanode2 | f              | f                | -1197102633
 node1     | D         |      5433 | datanode1 | t              | t                |  1148549230
 node2     | D         |      5433 | datanode2 | f              | f                |  -927910690
(4 rows)

如上面所示，可看到2个coordinator，2个node。集群状态正常尝试创建测试表test1，并插入数据

postgres=# create table test1(id int,name text);
CREATE TABLE
postgres=# insert into test1(id,name) select generate_series(1,8),'test';
INSERT 0 8
postgres=# select * from test1;
 id | name 
----+------
  1 | test
  2 | test
  5 | test
  6 | test
  8 | test
  3 | test
  4 | test
  7 | test
(8 rows)

连接到datanode1服务器的5433端口，即node1，查看测试表的数据

[postgres@datanode1 ~]$ psql -p 5433
psql (PGXL 9.5r1.6, based on PG 9.5.8 (Postgres-XL 9.5r1.6))
Type "help" for help.

postgres=# select * from test1;
 id | name 
----+------
  1 | test
  2 | test
  5 | test
  6 | test
  8 | test
(5 rows)

可看到表中数据是不全的再去datanode2服务器查看先连接到5432端口，即coordinator

[postgres@datanode2 ~]$ psql -p 5432
psql (PGXL 9.5r1.6, based on PG 9.5.8 (Postgres-XL 9.5r1.6))
Type "help" for help.

postgres=# select * from test1;
 id | name 
----+------
  1 | test
  2 | test
  5 | test
  6 | test
  8 | test
  3 | test
  4 | test
  7 | test
(8 rows)

再切换到5433端口，即node2

[postgres@datanode2 ~]$ psql -p 5433
psql (PGXL 9.5r1.6, based on PG 9.5.8 (Postgres-XL 9.5r1.6))
Type "help" for help.

postgres=# select * from test1;
 id | name 
----+------
  3 | test
  4 | test
  7 | test
(3 rows)

正好是node1中缺少的那部分数据，证实分布式数据库集群搭建完成。

集群操作

清理集群

pgxc_ctl -c /postgresxl/bin/pgxc_ctl.conf clean all

启动

pgxc_ctl -c /postgresxl/bin/pgxc_ctl.conf start all

关闭

pgxc_ctl -c /postgresxl/bin/pgxc_ctl.conf stop all

常用命令

目的	命令
一次性启动整个集群	`start all`
一次性停止整个集群	`stop all`
只启/停某类节点	`start coordinator` / `stop datanode`
查看所有节点存活状态	`monitor all`
查看指定节点	`monitor gtm master gtm1`
重新加载配置（不中断业务）	`reload all`

在线扩容（新增 Coordinator / Datanode）

步骤	命令示例
1. 新增 Coordinator	`add coordinator master coord3 10.4.7.103 5432 5433 /nodes/coord3 none none`
2. 新增 Datanode 主节点	`add datanode master dn3 10.4.7.103 5434 5435 /nodes/dn3 none none none`
3. 为新 Datanode 建备节点	`add datanode slave dn3_slave 10.4.7.104 5434 5435 /nodes/dn3_slave dn3 none`
4. 数据重分布（可选）	在任意 Coordinator 执行 `ALTER TABLE xxx ADD NODE (dn3);` 触发重分布

在线缩容（移除节点）

步骤	命令示例
1. 迁空数据	`ALTER TABLE xxx DELETE NODE (dn3);` 等待 `pgxc_clean` 完成
2. 下线 Datanode	`remove datanode master dn3`
3. 下线 Coordinator	`remove coordinator master coord3`

元数据与拓扑查询

需求	SQL 语句
查看当前集群拓扑	`SELECT * FROM pgxc_node ORDER BY node_type, node_name;`
查看表分布方式/节点	`SELECT * FROM pgxc_class;`
查看本地表大小（所有节点）	`SELECT node_name, pg_size_pretty(pg_relation_size('public.tbl')) FROM pgxc_node;`

备份 & 恢复

需求	命令
全量一致性备份（所有 Datanode）	`pg_dump -p 5432 -Fc -f /backup/full.dmp dbname`
单节点物理备份	`pg_basebackup -h dn1 -p 5433 -D /backup/dn1 -Fp -Xs -P`
节点级增量恢复	停止目标节点 → 清空数据目录 → `pg_basebackup` 重新拉取 → `start datanode`

GTM 主从切换（手动）

步骤	命令
1. 确保 GTM-Slave 同步	`monitor gtm slave gtmSlave`
2. 触发在线切换	`failover gtm gtmSlave`
3. 原主降级为新备	`add gtm slave oldGtm`

常用运维脚本

若集群使用 Patroni 做自动故障切换，可并行使用：

目的	命令
查看集群角色/延迟	`patronictl -c /etc/patroni/pg.yml list`
手动主备切换	`patronictl -c /etc/patroni/pg.yml switchover`
滚动重启所有只读节点	`patronictl -c /etc/patroni/pg.yml restart cluster -r replica --force`
动态重载参数	`patronictl -c /etc/patroni/pg.yml reload cluster`

建表说明

REPLICATION表：各个datanode节点中，表的数据完全相同，也就是说，插入数据时，会分别在每个datanode节点插入相同数据。读数据时，只需要读任意一个datanode节点上的数据。
建表语法：

postgres=#  CREATE TABLE repltab (col1 int, col2 int) DISTRIBUTE BY REPLICATION;

DISTRIBUTE ：会将插入的数据，按照拆分规则，分配到不同的datanode节点中存储，也就是sharding技术。每个datanode节点只保存了部分数据，通过coordinate节点可以查询完整的数据视图。

postgres=#  CREATE TABLE disttab(col1 int, col2 int, col3 text) DISTRIBUTE BY HASH(col1);

模拟部分数据，插入测试数据：

#任意登录一个coordinate节点进行建表操作
[postgres@gtm ~]$  psql -h  datanode1 -p 5432 -U postgres
postgres=# INSERT INTO disttab SELECT generate_series(1,100), generate_series(101, 200), 'foo';
INSERT 0 100
postgres=# INSERT INTO repltab SELECT generate_series(1,100), generate_series(101, 200);
INSERT 0 100

查看数据分布结果：

#DISTRIBUTE表分布结果
postgres=# SELECT xc_node_id, count(*) FROM disttab GROUP BY xc_node_id;
 xc_node_id | count 
------------+-------
 1148549230 |    42
 -927910690 |    58
(2 rows)
#REPLICATION表分布结果
postgres=# SELECT count(*) FROM repltab;
 count 
-------
   100
(1 row)

查看另一个datanode2中repltab表结果

[postgres@datanode2 pgxl9.5]$ psql -p 5433
psql (PGXL 9.5r1.3, based on PG 9.5.4 (Postgres-XL 9.5r1.3))
Type "help" for help.

postgres=# SELECT count(*) FROM repltab;
 count 
-------
   100
(1 row)

结论：REPLICATION表中，datanode1,datanode2中表是全部数据，一模一样。而DISTRIBUTE表，数据散落近乎平均分配到了datanode1,datanode2节点中。

新增datanode节点与数据重分布

新增datanode节点

在gtm集群管理节点上执行pgxc_ctl命令

[postgres@gtm ~]$ pgxc_ctl
/bin/bash
Installing pgxc_ctl_bash script as /home/postgres/pgxc_ctl/pgxc_ctl_bash.
Installing pgxc_ctl_bash script as /home/postgres/pgxc_ctl/pgxc_ctl_bash.
Reading configuration using /home/postgres/pgxc_ctl/pgxc_ctl_bash --home /home/postgres/pgxc_ctl --configuration /home/postgres/pgxc_ctl/pgxc_ctl.conf
Finished reading configuration.
   ******** PGXC_CTL START ***************

Current directory: /home/postgres/pgxc_ctl
PGXC

在PGXC后面执行新增数据节点命令:

Current directory: /home/postgres/pgxc_ctl
# 在服务器datanode1上，新增一个master角色的datanode节点，名称是dn3
# 端口号暂定5430，pool master暂定6669 ，指定好数据目录位置，从两个节点升级到3个节点，之后要写3个none
# none应该是datanodeSpecificExtraConfig或者datanodeSpecificExtraPgHba配置
PGXC add datanode master dn3 datanode1 5430 6669 /home/postgres/pgxl9.5/data/nodes/dn_master3 none none none

等待新增完成后，查询集群节点状态：

[postgres@gtm ~]$ psql -h datanode1 -p 5432 -U postgres
psql (PGXL 9.5r1.3, based on PG 9.5.4 (Postgres-XL 9.5r1.3))
Type "help" for help.

postgres=# select * from pgxc_node;
 node_name | node_type | node_port | node_host | nodeis_primary | nodeis_preferred |   node_id   
-----------+-----------+-----------+-----------+----------------+------------------+-------------
 coord1    | C         |      5432 | datanode1 | f              | f                |  1885696643
 coord2    | C         |      5432 | datanode2 | f              | f                | -1197102633
 node1     | D         |      5433 | datanode1 | f              | t                |  1148549230
 node2     | D         |      5433 | datanode2 | f              | f                |  -927910690
 dn3       | D         |      5430 | datanode1 | f              | f                |  -700122826
(5 rows)

可以发现节点新增完毕。

数据重分布

之前我们的DISTRIBUTE表分布在了node1,node2节点上，如下：

postgres=# SELECT xc_node_id, count(*) FROM disttab GROUP BY xc_node_id;
 xc_node_id | count 
------------+-------
 1148549230 |    42
 -927910690 |    58
(2 rows)

新增一个节点后，将sharding表数据重新分配到三个节点上，将repl表复制到新节点：

# 重分布sharding表
postgres=# ALTER TABLE disttab ADD NODE (dn3);
ALTER TABLE
# 复制数据到新节点
postgres=#  ALTER TABLE repltab ADD NODE (dn3);
ALTER TABLE

查看新的数据分布：

postgres=# SELECT xc_node_id, count(*) FROM disttab GROUP BY xc_node_id;
 xc_node_id | count 
------------+-------
 -700122826 |    36
 -927910690 |    32
 1148549230 |    32
(3 rows)

登录dn3(新增的时候，放在了datanode1服务器上，端口5430)节点查看数据：

[postgres@gtm ~]$ psql -h datanode1 -p 5430 -U postgres
psql (PGXL 9.5r1.3, based on PG 9.5.4 (Postgres-XL 9.5r1.3))
Type "help" for help.
postgres=# select count(*) from repltab;
 count 
-------
   100
(1 row)

很明显,通过 ALTER TABLE tt ADD NODE (dn)命令，可以将DISTRIBUTE表数据重新分布到新节点，重分布过程中会中断所有事务。可以将REPLICATION表数据复制到新节点。

从datanode节点中回收数据

postgres=# ALTER TABLE disttab DELETE NODE (dn3);
ALTER TABLE
postgres=# ALTER TABLE repltab DELETE NODE (dn3);
ALTER TABLE

删除数据节点

Postgresql-XL并没有检查将被删除的datanode节点是否有replicated/distributed表的数据，为了数据安全，在删除之前需要检查下被删除节点上的数据，有数据的话，要回收掉分配到其他节点，然后才能安全删除。删除数据节点分为四步骤：

查询要删除节点dn3的oid

postgres=#  SELECT oid, * FROM pgxc_node;
  oid  | node_name | node_type | node_port | node_host | nodeis_primary | nodeis_preferred |   node_id   
-------+-----------+-----------+-----------+-----------+----------------+------------------+-------------
 11819 | coord1    | C         |      5432 | datanode1 | f              | f                |  1885696643
 16384 | coord2    | C         |      5432 | datanode2 | f              | f                | -1197102633
 16385 | node1     | D         |      5433 | datanode1 | f              | t                |  1148549230
 16386 | node2     | D         |      5433 | datanode2 | f              | f                |  -927910690
 16397 | dn3       | D         |      5430 | datanode1 | f              | f                |  -700122826
(5 rows)

查询dn3对应的oid中是否有数据

testdb=# SELECT * FROM pgxc_class WHERE nodeoids::integer[] @> ARRAY[16397];
 pcrelid | pclocatortype | pcattnum | pchashalgorithm | pchashbuckets |     nodeoids      
---------+---------------+----------+-----------------+---------------+-------------------
   16388 | H             |        1 |               1 |          4096 | 16397 16385 16386
   16394 | R             |        0 |               0 |             0 | 16397 16385 16386
(2 rows)

有数据的先回收数据

postgres=# ALTER TABLE disttab DELETE NODE (dn3);
ALTER TABLE
postgres=# ALTER TABLE repltab DELETE NODE (dn3);
ALTER TABLE
postgres=# SELECT * FROM pgxc_class WHERE nodeoids::integer[] @> ARRAY[16397];
 pcrelid | pclocatortype | pcattnum | pchashalgorithm | pchashbuckets | nodeoids 
---------+---------------+----------+-----------------+---------------+----------
(0 rows)

安全删除dn3

PGXC$  remove datanode master dn3 clean

coordinate节点管理

同datanode节点相似，列出语句不做测试了：

# 新增coordinate
PGXC$  add coordinator master coord3 localhost 30003 30013 $dataDirRoot/coord_master.3 none none none
# 删除coordinate,clean选项可以将相应的数据目录也删除
PGXC$  remove coordinator master coord3 clean

故障切换

查看当前数据集群

postgres=# SELECT oid, * FROM pgxc_node;
  oid  | node_name | node_type | node_port | node_host | nodeis_primary | nodeis_preferred |   node_id   
-------+-----------+-----------+-----------+-----------+----------------+------------------+-------------
 11819 | coord1    | C         |      5432 | datanode1 | f              | f                |  1885696643
 16384 | coord2    | C         |      5432 | datanode2 | f              | f                | -1197102633
 16385 | node1     | D         |      5433 | datanode1 | f              | t                |  1148549230
 16386 | node2     | D         |      5433 | datanode2 | f              | f                |  -927910690
(4 rows)

模拟node1节点故障

PGXC$  stop -m immediate datanode master node1
Stopping datanode master node1.
Done.

测试集群查询

postgres=# SELECT xc_node_id, count(*) FROM disttab GROUP BY xc_node_id;
ERROR:  Failed to get pooled connections
postgres=# SELECT xc_node_id, * FROM disttab WHERE col1 = 3;
 xc_node_id | col1 | col2 | col3 
------------+------+------+------
 -927910690 |    3 |  103 | foo
(1 row)

测试发现，查询范围如果涉及到故障的node1节点，会报错，而查询的数据范围不在node1上的话，仍然可以查询。

手动切换node1的slave

PGXC$  failover datanode node1
# 切换完成后，查询集群
postgres=# SELECT oid, * FROM pgxc_node;
  oid  | node_name | node_type | node_port | node_host | nodeis_primary | nodeis_preferred |   node_id   
-------+-----------+-----------+-----------+-----------+----------------+------------------+-------------
 11819 | coord1    | C         |      5432 | datanode1 | f              | f                |  1885696643
 16384 | coord2    | C         |      5432 | datanode2 | f              | f                | -1197102633
 16386 | node2     | D         |      5433 | datanode2 | f              | f                |  -927910690
 16385 | node1     | D         |     15433 | datanode2 | f              | t                |  1148549230
(4 rows)

发现node1节点的ip和端口都已经替换为配置的slave了。

部署遇到的问题

在配置的时候一定要细心，避免端口号之类的配置冲突等错误。

错误一：

postgres=# create table test1(id integer,name varchar(20));
LOG:  failed to connect to node, connection string (host=192.168.0.125 port=1925 dbname=postgres user=postgres application_name=pgxc sslmode=disable options='-c remotetype=coordinator -c parentnode=coord1  -c DateStyle=iso,mdy -c timezone=prc -c geqo=on -c intervalstyle=postgres -c lc_monetary=C'), connection error (fe_sendauth: no password supplied
        )
WARNING:  can not connect to node 16384
WARNING:  Health map updated to reflect DOWN node (16384)
LOG:  Pooler could not open a connection to node 16384
LOG:  failed to acquire connections
STATEMENT:  create table test1(id integer,name varchar(20));
ERROR:  Failed to get pooled connections
HINT:  This may happen because one or more nodes are currently unreachable, either because of node or network failure.
         Its also possible that the target node may have hit the connection limit or the pooler is configured with low connections.
         Please check if all nodes are running fine and also review max_connections and max_pool_size configuration parameters
STATEMENT:  create table test1(id integer,name varchar(20));

原因：这个是由于某些环境或配置出了问题，我的就是pg_hba.conf配置出了问题，Ipv4要改成 0:0:0:0/0 trust才行。
但这仅仅是一个问题，开发者搭建环境遇到这个错误，一定要检查如下:

** 各个机器的防火墙是否关闭？**
** 各个机器的SELINUX状态是否是disabled？**
** 各个机器的ssh免密登录是否成功？**
** 各个节点的pg_hba.conf,postgresql.conf是否配置为信任登录？是否有IP限制？**
** 超过某些节点的最大连接数？（对于我们测试环境来说，肯定不会是这个问题）

posted @ 2026-05-18 15:21 数据库小白（专注）阅读(10) 评论(0) 收藏举报

刷新页面返回顶部

数据库小白（专注）

PG分布式之PGXL安装部署-postgresql-xl-9.5

PG分布式之PGXL安装部署-postgresql-xl-9.5

介绍Postgres-XL

Postgres-XL功能特性

Postgres-XL缺少的功能

Postgres-XL架构

Postgre-XL主要组件

Postgres-XL继承了PostgreSQL

应用场景

安装配置

集群规划

hosts配置（三台都配置）

安装依赖包

防火墙selinux

同步系统时间

创建用户

gtm节点免密登陆配置

系统资源限制配置

软连接和文件数限制

关闭THP

配置大页hugepage

Postgres-XL安装

配置环境变量

集群配置

生成pgxc_ctl配置文件

配置pgxc_ctl.conf

集群初始化

集群验证和管理

集群验证

集群操作

清理集群

启动

关闭

常用命令

在线扩容（新增 Coordinator / Datanode）

在线缩容（移除节点）

元数据与拓扑查询

备份 & 恢复

GTM 主从切换（手动）

常用运维脚本

建表说明

模拟部分数据，插入测试数据：

查看数据分布结果：

查看另一个datanode2中repltab表结果

新增datanode节点与数据重分布

在gtm集群管理节点上执行pgxc_ctl命令

在PGXC后面执行新增数据节点命令:

等待新增完成后，查询集群节点状态：

数据重分布

之前我们的DISTRIBUTE表分布在了node1,node2节点上，如下：

新增一个节点后，将sharding表数据重新分配到三个节点上，将repl表复制到新节点：

查看新的数据分布：

登录dn3(新增的时候，放在了datanode1服务器上，端口5430)节点查看数据：

从datanode节点中回收数据

删除数据节点

查询要删除节点dn3的oid

查询dn3对应的oid中是否有数据

有数据的先回收数据

安全删除dn3

coordinate节点管理

同datanode节点相似，列出语句不做测试了：

故障切换

查看当前数据集群

模拟node1节点故障

测试集群查询

手动切换node1的slave

部署遇到的问题

错误一：

公告