KingbaseESV8R6中一个报错:tuple concurrently updated

前言

前几天同事在生产环境遇到的一个报错:tuple concurrently updated

从字面意思看是行的并发争用。这个问题原因是数据库更新系统表时产生了争用,所以不同session不能并发更新系统表。

测试

窗口1,不提交事务
begin;
truncate t1;

窗口2,因为无法更新系统表信息,授权语句无法执行成功,卡主
grant select on all tables in schema public to u2 ;

这时候ctrl+c终止后收到一条信息,这个关键点可以帮助我们分析问题
TEST=# grant select on all tables in schema public to u2 ;
^CCancel request sent
错误:  由于用户请求而正在取消查询
CONTEXT:  当更新关系"sys_class"的元组(28, 37)时

查看系统表的行偏移(28,37),发现表t1上有争用
TEST=# select * from sys_class where ctid = '(28,37)';
  oid   | relname | relnamespace | reltype | reloftype | relowner | relam | relfilenode | reltablespace | relpages | reltuples | relallvisible | reltoastrelid | relhasindex | relisshared | relpersistence |
relkind | relnatts | relchecks | relhasrules | relhastriggers | relhassubclass | relrowsecurity | relforcerowsecurity | relispopulated | relreplident | relispartition | relrewrite | relfrozenxid | relminmxi
d |               relacl                | reloptions | relpartbound
--------+---------+--------------+---------+-----------+----------+-------+-------------+---------------+----------+-----------+---------------+---------------+-------------+-------------+----------------+-
--------+----------+-----------+-------------+----------------+----------------+----------------+---------------------+----------------+--------------+----------------+------------+--------------+----------
--+-------------------------------------+------------+--------------
 110336 | t1      |         2200 |  110338 |         0 |       10 |     2 |      110492 |             0 |        0 |         0 |             0 |             0 | f           | f           | p              |
r       |        1 |         0 | f           | f              | f              | f              | f                   | t              | d            | f              |          0 |      1914706 |        56
7 | {system=arwdDxt/system,u2=r/system} |            |
(1 row)

查看表sys_stat_activity,发现有会话正在 tuncate t1的操作,说明truncate语句和grant语句产生更新系统表的冲突
TEST=# select * from sys_stat_activity where state<>'idle' and pg_backend_pid() != pid and (backend_xid is not null or backend_xmin is not null )  ;
 datid | datname |  pid  | usesysid | usename | application_name | client_addr | client_hostname | client_port |         backend_start         |          xact_start           |          query_start
 |         state_change          | wait_event_type | wait_event |        state        | backend_xid | backend_xmin |             query             |  backend_type
-------+---------+-------+----------+---------+------------------+-------------+-----------------+-------------+-------------------------------+-------------------------------+------------------------------
-+-------------------------------+-----------------+------------+---------------------+-------------+--------------+-------------------------------+----------------
 91362 | test    | 23875 |       10 | system  |                  |             |                 |             | 2023-02-09 16:59:30.233225+08 | 2023-02-17 15:51:13.788882+08 | 2023-02-17 15:51:13.788882+08
 | 2023-02-17 15:51:13.788896+08 | Lock            | relation   | active              |     1914711 |      1914708 | select perf.snapshot_timer(); | kwr collector
 91362 | test    |   535 |   110096 | u2      | kingbase_*&+_    |             |                 |          -1 | 2023-02-17 15:33:14.456389+08 | 2023-02-17 15:49:58.210625+08 | 2023-02-17 15:50:00.666372+08
 | 2023-02-17 15:50:00.666792+08 | Client          | ClientRead | idle in transaction |     1914708 |              | truncate t1;                  | client backend
(2 rows)

重复上述操作,直到授权语句卡主不取消
然后窗口1,commit提交事务后,

窗口2 会发生错误tuple concurrently updated,
test=# grant select on all tables in schema public to u2 ;
ERROR: tuple concurrently updated

窗口1提交后,正常情况窗口2的授权语句可以顺利执行成功,可是从数据库日志中看到,commit和grant时间几乎是同时完成的,所以仍然会有此报错。
此案例提供了分析思路和方法,建议类似grant和truncate需要更新sys_class信息的语句按事务顺序执行,不要并发完成。

posted @ 2024-04-03 17:00  KINGBASE研究院  阅读(20)  评论(0编辑  收藏  举报