obce_7

Change the OCP system parameter ocp.site.url to the address assigned to OCP

http://139.224.196.253:8080/login

System parameters: search for ocp.site.url and change it to http://172.16.1.40:8080 (this must be the private IP)

Set up NTP synchronization between the OCP server and the two observer hosts

On the OCP server, edit /etc/ntp.conf as follows

restrict 172.16.0.0 mask 255.255.0.0 nomodify
server 127.127.1.0
fudge 127.127.1.0 stratum 10

On each OBServer host, edit /etc/ntp.conf as follows

server <OCP private IP>
fudge <OCP private IP> stratum 10

Restart the ntp service

service ntpd restart

Check the synchronization status with the following commands

ntpstat
ntpq -np
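As a rough sketch of how the `ntpq -np` output could be checked automatically: ntpd marks the peer it has selected as its sync source with a leading `*`. The helper name `selected_peer` below is hypothetical and not part of the original procedure.

```shell
#!/bin/sh
# Print the address of the peer that ntpd has selected as its sync source.
# In `ntpq -np` output, that peer's line starts with the tally code '*'.
# Exits non-zero when no peer has been selected yet.
selected_peer() {
    awk '/^\*/ { print substr($1, 2); found = 1 } END { exit !found }'
}

# Possible use on an OBServer host (compare the result with the OCP private IP):
#   ntpq -np | selected_peer
```

If the function prints nothing and returns non-zero, the host has not yet synchronized to any source.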

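Add hosts in OCP
Apart from the IP and credentials, nothing else needs to be filled in

Create the cluster
Specify the version
Add hosts
Modify the parameters
If the parameters were not modified, terminate the task and then abandon it

Cluster name: obce
root@sys password: aaAA11__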

__min_full_resource_pool_memory = 2147483648
system_memory = 5G
memory_limit = 15G
cpu_count = 32
net_thread_count = 8
cache_wash_threshold = 2GB
workers_per_cpu_quota = 4
stack_size = 512k
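The value 2147483648 given for __min_full_resource_pool_memory is a byte count. The small sketch below (the function name `bytes_to_gib` is only illustrative) shows the arithmetic: it works out to 2 GiB, which lines up with the 2G resource units created later in these notes.

```shell
#!/bin/sh
# Convert a byte count to whole GiB (integer division by 1024^3).
bytes_to_gib() {
    echo $(( $1 / 1024 / 1024 / 1024 ))
}

bytes_to_gib 2147483648   # prints 2
```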

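Create the standby cluster
Specify the version
Add hosts
Modify the parameters
If the parameters were not modified, terminate the task and then abandon it

Create OBProxy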
proxy_mem_limited=1G

Note: to create small-specification tenants through OCP, apply the following settings:

[root@obocp tools]# pwd
/root/t-oceanbase-antman/tools

[root@obocp tools]# ls
deploy_docker deploy_physical getpass.sh ob-checker obctl
oceanbase_dba_helper setpass.sh update_meta_VIP.sh

[root@obocp tools]# ./getpass.sh
sys: YXdw@0+6rE
meta: kD5InIB1%o
monitor: 0f9pxw^W0P
sysmonitor: c0bP%oWO^z
omsmeta: 0}Rl82u8lR
odcmeta: uUbCcSN9j+
proxysys: 5WkbsYkv+9
proxyro: 6i%diJsP6}

Connect using the OCP IP address

obclient -h172.16.1.216 -P2883 -uroot@ocp_meta#obcluster -p'nNzbt5I1@W'
use ocp
select * from config_properties where `key` like '%small%'\G
update config_properties set value='true' where `key` like '%small%';

Connect to the sys tenant of the obce cluster

obclient -uroot@sys#obce -h172.16.1.214 -P2883 -p'aaAA11__'

Create a resource unit

create resource unit u1_ora max_cpu=1,min_cpu=1,max_memory='2G',min_memory='2G',max_iops=128,max_disk_size='10G',max_session_num=100;

Create a resource pool

create resource pool pool_ora unit='u1_ora',unit_num=1,zone_list=('zone1');

Create the Oracle tenant

CREATE TENANT IF NOT EXISTS ob_ora charset='utf8mb4', replica_num=1,zone_list=('zone1'), resource_pool_list=('pool_ora') SET ob_tcp_invited_nodes='%', ob_compatibility_mode='oracle';

Create a resource unit

create resource unit u1_mysql max_cpu=1,min_cpu=1,max_memory='2G',min_memory='2G',max_iops=128,max_disk_size='10G',max_session_num=100;

Create a resource pool

create resource pool pool_mysql unit='u1_mysql',unit_num=1 ;

Create the MySQL tenant

CREATE TENANT IF NOT EXISTS ob_rpt charset='utf8mb4', replica_num=1, resource_pool_list=('pool_mysql') SET ob_tcp_invited_nodes='%', ob_compatibility_mode='mysql';

obclient -usys@ob_ora -h172.16.1.214 -P2883

create user tpcc identified by obce_test;
grant dba to tpcc;

obclient -utpcc@ob_ora -h172.16.1.214 -P2883 -p

New standby cluster

obclient -uroot@sys#obce:1654142545 -h172.16.1.214 -P2883 -p'aaAA11__'

select cluster_role, protection_mode, protection_level, current_scn from v$ob_cluster;

New primary cluster

obclient -uroot@sys#obce:1654142546 -h172.16.1.214 -P2883 -p'aaAA11__'

select cluster_role, protection_mode, protection_level, current_scn from v$ob_cluster;

select switchover_status, switchover_info from v$ob_cluster;
+-------------------+-------------------------------------+
| switchover_status | switchover_info |
+-------------------+-------------------------------------+
| TO STANDBY | SYNCED STANDBY CLUSTERS: 1654142545 |
+-------------------+-------------------------------------+
1 row in set (0.057 sec)

4.1.2
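In maximum protection mode and maximum availability mode, the primary cluster must be configured in SYNC mode before a switchover, so that a standby cluster in SYNC mode still exists once the switchover completes.
Before switching cluster roles, check the switchover_status column of the v$ob_cluster view to see whether the switch is possible. If it shows NOT ALLOWED or some other value, the conditions for switching are not currently met.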
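The pre-switch check can be sketched as a small shell helper. This is only an illustration: the function name `can_switch` is hypothetical, and the two accepted values are simply the ones that appear in the query output in these notes (TO STANDBY and TO PRIMARY).

```shell
#!/bin/sh
# Decide whether a role switch may proceed, given the switchover_status
# value read from v$ob_cluster. NOT ALLOWED, or any value other than the
# two ready states seen in these notes, is treated as "not ready".
can_switch() {
    case "$1" in
        "TO STANDBY"|"TO PRIMARY") return 0 ;;
        *) return 1 ;;
    esac
}

# Possible use (assumes obclient accepts -N and -e like the mysql client):
#   status=$(obclient ... -N -e 'select switchover_status from oceanbase.v$ob_cluster')
#   can_switch "$status" || echo "switchover not possible yet: $status"
```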

alter system modify cluster 'obce' cluster_id 1654142546 set redo_transport_options = 'sync net_timeout=30000000';

Run on the primary cluster:

obclient -uroot@sys#obce:1654142546 -h172.16.1.214 -P2883 -p'aaAA11__'

select switchover_status, switchover_info from oceanbase.v$ob_cluster;
+-------------------+-----------------+
| switchover_status | switchover_info |
+-------------------+-----------------+
| TO PRIMARY | |
+-------------------+-----------------+
1 row in set (0.020 sec)

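Step 1: Switch the primary cluster to a standby cluster
alter system commit to switchover to physical standby;

Step 2: Switch the standby cluster to the primary cluster
obclient -uroot@sys#obce:1654142545 -h172.16.1.214 -P2883 -p'aaAA11__'
alter system commit to switchover to primary;

Step 3: After the switchover, check the new roles of the primary and standby clusters and the synchronization status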
select cluster_role, protection_mode, protection_level, current_scn from oceanbase.v$ob_cluster;
+--------------+---------------------+---------------------+------------------+
| cluster_role | protection_mode | protection_level | current_scn |
+--------------+---------------------+---------------------+------------------+
| PRIMARY | MAXIMUM PERFORMANCE | MAXIMUM PERFORMANCE | 1681369969085013 |
+--------------+---------------------+---------------------+------------------+
1 row in set (0.004 sec)

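4.2 Lossless failover: lab steps
Step 1: Switch the primary cluster to maximum protection mode
On the primary cluster, run the following commands to check the cluster's current protection mode.
obclient -uroot@sys#obce:1654142545 -h172.16.1.214 -P2883 -p'aaAA11__'
select protection_mode,protection_level ,cluster_id from oceanbase.v$ob_cluster;
Configure suitable log transport parameters for each standby cluster, according to what the protection mode requires.
alter system modify cluster 'obce' cluster_id 1654142546 set redo_transport_options = 'sync net_timeout=30000000';

Check the cluster state before switching to make sure the switch can succeed.
select cluster_id, synchronization_status from oceanbase.v$ob_standby_status;

On the primary cluster, run the following command to switch the protection mode.
alter system set standby cluster to maximize protection;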

select protection_mode,protection_level ,cluster_id from oceanbase.v$ob_cluster;

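Step 2: Perform the lossless failover
Manually kill the observer process on the primary cluster so that the primary cluster is unavailable.
ps -ef | grep observer

On each standby cluster, query the protection mode and protection level to confirm that a lossless failover can be performed.
select protection_mode, protection_level from oceanbase.v$ob_cluster;
Run the following command to switch the standby cluster to primary.
alter system failover to 'obce' cluster_id=1654142546 ;
After the failover, the cluster enters maximum performance mode by default, and the original primary cluster and the other standby clusters are in the DISABLED state.
select cluster_id, cluster_role, cluster_status from v$ob_standby_status;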
+------------+------------------+----------------+
| cluster_id | cluster_role | cluster_status |
+------------+------------------+----------------+
| 1654142545 | PHYSICAL STANDBY | DISABLED |
+------------+------------------+----------------+
1 row in set (0.002 sec)

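Step 3: Manually trigger a merge (major compaction)
After a successful lossless failover, if the cluster's merge version is 1, it is recommended to start a round of merging and wait for it to succeed before adding a new standby cluster.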
alter system major freeze;

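Step 4: Recover the original primary cluster
From the start of the failover until the original primary cluster is recovered, every server of the original primary cluster must stay down; otherwise its data may become inconsistent with the new primary cluster's data.
While recovering the original primary cluster, all of its servers must be started with a special parameter. Once started, the cluster enters the DISABLED state. A DISABLED cluster accepts no new writes and generates no new logs, which keeps the new primary cluster and the original primary cluster consistent. If the servers are not started with the special parameter, the original primary cluster generates new logs, which causes the rejoin procedure to fail or produces data checksum errors.
On every server of the original primary cluster, start OBServer with the -m disabled_cluster parameter.
su - admin
cd /home/admin/oceanbase/
bin/observer -m disabled_cluster
After the original primary cluster is back, query its state: it has the PRIMARY role but is in the DISABLED state.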
obclient -uroot@sys#obce:1654142545 -h172.16.1.214 -P2883 -p'aaAA11__'
select cluster_id, cluster_role, cluster_status from oceanbase.v$ob_cluster;

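Step 5: Switch the original primary cluster to the standby role
On the original primary cluster, run the following command to switch it to the standby role.
alter system convert to physical standby;
Afterwards, check the cluster role to confirm that the switch succeeded.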
select cluster_id, cluster_role, cluster_status from oceanbase.v$ob_cluster;
+------------+------------------+----------------+
| cluster_id | cluster_role | cluster_status |
+------------+------------------+----------------+
| 1654142545 | PHYSICAL STANDBY | DISABLED |
+------------+------------------+----------------+
1 row in set (0.001 sec)

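Step 6: On the new primary cluster, enable synchronization for the original primary cluster
Log in to the new primary cluster and run the following command to enable standby synchronization.
obclient -uroot@sys#obce:1654142546 -h172.16.1.214 -P2883 -p'aaAA11__'
alter system enable cluster synchronization 'obce' cluster_id=1654142545;
Check the synchronization status to confirm that it was enabled.
select cluster_id, cluster_role, cluster_status from v$ob_standby_status;

Step 7: Check the standby cluster's synchronization status and lag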

4.3 Lossy failover
Step 1: Insert data into the primary cluster with a script
Log in to the MySQL tenant (ob_rpt) and create a partitioned table with the usual commands:
obclient -uroot@ob_rpt -h172.16.1.214 -P2883
alter user root identified by 'obce_test';

obclient -uroot@ob_rpt -h172.16.1.215 -P2881 -p

use test;
create table tpcc (c1 varchar(20), c2 int,c3 varchar(20) ) partition by hash(c2+1)
partitions 3;
Add data to the partitioned table

delimiter //
create procedure bulk_insert_tpcc()
begin
declare k int;
declare i int;
set i=1;
while i<3601 do
insert INTO tpcc (c1,c2 ,c3) values ('a',i,i);
select sleep(1) into k ;
set i=i+1;
end while;
end
//
delimiter ;
call bulk_insert_tpcc();

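Step 2: Simulate a primary cluster failure that leaves the primary and standby out of sync
While the primary cluster is inserting data, first cut the standby cluster's network.
Note: the following operations only simulate the data loss caused by a lossy switchover and may not match a real production scenario.
On the standby cluster:
nohup sh -c 'ifdown eth0 && sleep 120 && ifup eth0' &

The NIC outage here is set to 120 seconds; trainees can adjust the duration as needed. Wait 1 minute and then, before the 2 minutes are up, forcibly shut down the primary cluster service.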

ps -ef | grep observer
kill -9 13282
ps -ef | grep observer
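The timing rule in this step (kill the primary after the first minute but before the standby's NIC comes back) can be written down as a small check. This is only an illustrative sketch: the function name `in_kill_window` and its arguments are hypothetical, with the outage length defaulting to the 120 seconds used above.

```shell
#!/bin/sh
# in_kill_window ELAPSED [OUTAGE]
# Succeeds when ELAPSED seconds since the standby NIC went down fall inside
# the window described above: at least 60 s, and less than OUTAGE (default 120 s).
in_kill_window() {
    elapsed=$1
    outage=${2:-120}
    [ "$elapsed" -ge 60 ] && [ "$elapsed" -lt "$outage" ]
}

# Example: 90 s after ifdown with the default 120 s outage is inside the window.
in_kill_window 90 && echo "ok to kill the primary observer now"
```

If the outage duration is changed, pass the new value as the second argument so the upper bound moves with it.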

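Step 3: Wait for the standby cluster's NIC to recover, then log in to the standby cluster through OCP and perform the "Switch to primary cluster: disaster recovery switchover" operation
The switch is reported as successful. The OCP cluster management page now shows two primary clusters; clicking the original primary cluster, whose processes were killed, shows that it is already in the "Abandoned" state. Select "Start as read-only cluster" in the upper right.
Step 4: Log in to the primary and standby clusters from the command line and read the tpcc table
Log in to the original standby cluster (now the primary cluster)
obclient -h172.16.1.214 -uroot@ob_rpt#obce:1654142545 -P2883 -c -A
select count(1) from test.tpcc;

Log in to the original primary cluster (now the read-only cluster)
obclient -h172.16.1.214 -uroot@ob_rpt#obce:1654142546 -P2883 -c -A
select count(1) from test.tpcc;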

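Question: why do the primary and standby hold different amounts of data?
Step 5: Delete the original primary cluster (now the read-only cluster) through OCP cluster management

Remove the hosts from the OBProxy cluster

Remove the servers from the host list
Step 6: Wipe the relevant files on the original primary cluster's servers and add them back as a standby cluster (because lab resources are limited, the original primary cluster's machines are reused to build the new standby cluster)
cd /home/admin && rm -rf *

It is recommended to clear the /data/1 and /data/log1 directories in the same way

Add a standby cluster through OCP, rejoining the wiped servers as the standby cluster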

posted @ 2023-09-05 21:08 chinesern
