mariadb的flashback到底怎么样???防误删可以,但算不上真正的闪回--再看mariadb 10.3的System-Versioned Tables

mariadb 在10.2.4引入闪回特性,支持DML(INSERT, DELETE, UPDATE)操作的闪回,不支持DDL语句,使用闪回,必须设置binlog_row_image=FULL。

其原理和oracle有undo不一样,将INSERT重写为DELETE, DELETE重写为INSERT, UPDATE根据前后值进行交换,这也是必须设置binlog_row_image=FULL的原因。

mysqlbinlog默认情况下会生成重做SQL,通过使用新增的"--flashback"选项,可以生成自某个SCN或者时间点以来的反向SQL。看如下对比:

完了之后,就可以通过执行mysql < flashback.sql将所有变更操作还原了。

实际上受制于mysql的体系架构,这个被称为闪回的实现是很low的,准确的说都算不上,随便一个高级开发换点时间慢慢搞都可以做出来,它还不支持查询。

除此之外,它还需要访问到mysql的binlog,这也是个比较困难的事,因为运维体系可能不允许用户直接访问mysql服务器,如果是阿里云的RDS,就更是如此了。

对于时间点恢复这个事情,还有一种典型的做法是依赖于从库,通过延迟复制的方式实现,这种方式用于实现OLTP或者误操作是可以的,但是把它作为一个撤销操作的机制就比较强人所难了,需要人工干预的侵入性太强了。除非不得已,我们不会选择这种实现方式,太脆弱了。

============================================================================================

再来看看mariadb 10.3的表版本化怎么样?

首先看各种数据库闪回的实现机制:

在mariadb中,表版本化是10.3.4开始引入的,参考了SQL:2011的标准,截止本文编写,mariadb 10.3系列的最新版本为MariaDB 10.3.10 Stable,10.3.7发布第一个GA版本。如下:

所以新是有点新,具体看怎么办了。。。。

The CREATE TABLE syntax has been extended to permit creating a system-versioned table. To be system-versioned, according to SQL:2011, a table must have two generated columns, a period, and a special table option clause:

CREATE TABLE t(
   x INT,
   start_timestamp TIMESTAMP(6) GENERATED ALWAYS AS ROW START,
   end_timestamp TIMESTAMP(6) GENERATED ALWAYS AS ROW END,
   PERIOD FOR SYSTEM_TIME(start_timestamp, end_timestamp)
) WITH SYSTEM VERSIONING;

In MariaDB one can also use a simplified syntax:

CREATE TABLE t (
   x INT
) WITH SYSTEM VERSIONING;

In the latter case no extra columns will be created and they won't clutter the output of, say, SELECT * FROM t. The versioning information will still be stored, and it can be accessed via the pseudo-columns ROW_START and ROW_END:

SELECT x, ROW_START, ROW_END FROM t;

 采用简化的语法可以使得现有的SQL都不用调整,这很重要。

CREATE TABLE t (
   x INT
) WITH SYSTEM VERSIONING;
insert into t values(1),(2),(3);
insert into t values(4),(5),(6);

select now();
2018-10-23 11:58:54
select * from t;
delete from t;  
select * from t;   --此时默认查不到记录了

SELECT * FROM t FOR SYSTEM_TIME AS OF TIMESTAMP '2018-10-23 11:58:54'

历史版本查到了。

和oracle一样,还支持版本历史查询,如下:

  • BETWEEN start AND end will show all rows that were visible at any point between two specified points in time. It works inclusively, a row visible exactly at start or exactly at end will be shown too.
SELECT * FROM t FOR SYSTEM_TIME BETWEEN (NOW() - INTERVAL 1 YEAR) AND NOW()
  • FROM start TO end will also show all rows that were visible at any point between two specified points in time, including start, but excluding end.
SELECT * FROM t FOR SYSTEM_TIME FROM '2016-01-01 00:00:00' TO '2017-01-01 00:00:00'

Additionally MariaDB implements a non-standard extension:

  • ALL will show all rows, historical and current.
SELECT * FROM t FOR SYSTEM_TIME ALL

If the FOR SYSTEM_TIME clause is not used, the table will show the current data, as if one had specified FOR SYSTEM_TIME AS OF CURRENT_TIMESTAMP.

初步看来,特性还是不错的,比那个鸡肋的闪回要好多了。

-------------再来看下性能-----------

本地PC笔记本测试
create table big_table like information_schema.columns;
insert into big_table select * from information_schema.columns;
ALTER TABLE big_table ADD SYSTEM VERSIONING;
insert into big_table select * from big_table; -- 重复执行造24w记录
select * from big_table;
update big_table set table_catalog = CONCAT(table_catalog,'-','cata');
受影响的行: 239232
时间: 0.882s
select now();
2018-10-23 12:51:28
delete from big_table where ordinal_position < 3;
[SQL]delete from big_table where ordinal_position < 3;
受影响的行: 40576
时间: 0.176s
select now();
2018-10-23 12:54:30
update big_table set table_catalog = SUBSTR(table_catalog,4);
[SQL]update big_table set table_catalog = SUBSTR(table_catalog,4);
受影响的行: 198656
时间: 0.794s
select now();
2018-10-23 12:55:10
alter table big_table add column my_col int;
[Err] 4119 - Not allowed for system-versioned `test`.`big_table`. Change @@system_versioning_alter_history to proceed with ALTER.
set @@system_versioning_alter_history=KEEP;
alter table big_table add column my_col int;
[SQL]alter table big_table add column my_col int;
受影响的行: 677120
时间: 1.616s
select now();
2018-10-23 12:56:44
select * from big_table; -- 新增的字段会带出来

根据上述测试,和非系统版本表相差并不大,只不过根据官方所述,如果按照默认都存在一张表,历史数据多了之后会导致性能下降,所以有两种解决方法:独立分区维护历史版本;定期删除。

删除方法1:

ALTER TABLE t ADD SYSTEM VERSIONING;
ALTER TABLE t DROP SYSTEM VERSIONING;

删除方法2:

DELETE HISTORY FROM t;

删除方法3:

DELETE HISTORY FROM t BEFORE SYSTEM_TIME '2016-10-09 08:07:06';

 下面来看使用独立的分区维护历史的方式(纯操作类,各位读者自己来):

Storing the History Separately

When the history is stored together with the current data, it increases the size of the table, so current data queries — table scans and index searches — will take more time, because they will need to skip over historical data. If most queries on that table use only current data, it might make sense to store the history separately, to reduce the overhead from versioning.

This is done by partitioning the table by SYSTEM_TIME. Because of partition pruning optimization, all current data queries will only access one partition, the one that stores current data.

This example shows how to create such a partitioned table:

CREATE TABLE t (x INT) WITH SYSTEM VERSIONING
  PARTITION BY SYSTEM_TIME (
    PARTITION p_hist HISTORY,
    PARTITION p_cur CURRENT
  );

In this example all history will be stored in the partition p_hist while all current data will be in the partition p_cur. The table must have exactly one current partition and at least one historical partition.

Partitioning by SYSTEM_TIME also supports automatic partition rotation. One can rotate historical partitions by time or by size. This example shows how to rotate partitions by size:

CREATE TABLE t (x INT) WITH SYSTEM VERSIONING
  PARTITION BY SYSTEM_TIME LIMIT 100000 (
    PARTITION p0 HISTORY,
    PARTITION p1 HISTORY,
    PARTITION pcur CURRENT
  );

MariaDB will start writing history rows into partition p0, and when it reaches a size of 100000 rows, MariaDB will switch to partition p1. There are only two historical partitions, so when p1 overflows, MariaDB will issue a warning, but will continue writing into it.

Similarly, one can rotate partitions by time:

CREATE TABLE t (x INT) WITH SYSTEM VERSIONING
  PARTITION BY SYSTEM_TIME INTERVAL 1 WEEK (
    PARTITION p0 HISTORY,
    PARTITION p1 HISTORY,
    PARTITION p2 HISTORY,
    PARTITION pcur CURRENT
  );

This means that the history for the first week after the table was created will be stored in p0. The history for the second week — in p1, and all later history will go into p2. One can see the exact rotation time for each partition in the INFORMATION_SCHEMA.PARTITIONS table.

It is possible to combine partitioning by SYSTEM_TIME and subpartitions:

CREATE TABLE t (x INT) WITH SYSTEM VERSIONING
  PARTITION BY SYSTEM_TIME
    SUBPARTITION BY KEY (x)
    SUBPARTITIONS 4 (
    PARTITION ph HISTORY,
    PARTITION pc CURRENT
  );

 

默认的系统版本有一个问题,就是他不满足读提交隔离级别,也就是非事务隔离,文档也讲了,例子参见https://jira.mariadb.org/browse/MDEV-16236。要支持事务隔离,得按照下面这样来:

CREATE TABLE demo_system_versioning (
  id INTEGER NOT NULL,
  data VARCHAR(255),
  start_ts BIGINT INVISIBLE UNSIGNED GENERATED ALWAYS AS ROW START NOT NULL,
  end_ts   BIGINT INVISIBLE UNSIGNED GENERATED ALWAYS AS ROW END NOT NULL,
  PERIOD FOR SYSTEM_TIME (start_ts, end_ts),
  PRIMARY KEY (id)
) WITH SYSTEM VERSIONING;
 
START TRANSACTION;
INSERT INTO demo_system_versioning (id, data) VALUES (1, 'X');
SELECT SLEEP(0.1);
INSERT INTO demo_system_versioning (id, data) VALUES (2, 'Y');
COMMIT;
 
START TRANSACTION;
INSERT INTO demo_system_versioning (id, data) VALUES (3, 'X');
SELECT SLEEP(0.1);
INSERT INTO demo_system_versioning (id, data) VALUES (4, 'Y');
COMMIT;


-- should return a single row as both rows where inserted in the same transaction
SELECT COUNT(*), start_ts
  FROM demo_system_versioning
 GROUP BY start_ts;

START TRANSACTION;
delete from demo_system_versioning where id = 1;
SELECT SLEEP(0.1);
delete from demo_system_versioning where id = 3;
COMMIT;


START TRANSACTION;
INSERT INTO demo_system_versioning (id, data) VALUES (5, 'X');
SELECT SLEEP(0.1);
INSERT INTO demo_system_versioning (id, data) VALUES (6, 'Y');
COMMIT;

SELECT *
  FROM demo_system_versioning
FOR SYSTEM_TIME AS OF TIMESTAMP '2018-10-23 13:58:53' where id >2;

 

注意:10.3.8以及之前的版本都会提示trt_begin_ts函数不存在。

其还会用到mysql.transaction_registry表,定义如下:

FieldTypeNullKeyDefaultDescription
transaction_id bigint(20) unsigned NO Primary NULL  
commit_id bigint(20) unsigned NO Unique NULL  
begin_timestamp timestamp(6) NO Multiple 0000-00-00 00:00:00.000000 Timestamp when the transaction began (BEGIN statement), however see MDEV-16024.
commit timestamp(6) NO Multiple 0000-00-00 00:00:00.000000 Timestamp when the transaction was committed.
isolation_level enum('READ-UNCOMMITTED','READ-COMMITTED','REPEATABLE-READ','SERIALIZABLE') NO   NULL Transaction isolation level.
 
最后系统变量system_versioning_alter_history控制是否允许DDL,默认不允许,此时会报错。
 
根据上述针对闪回和表系统版本特性的测试,基本上可以认为闪回只能用于小白场景,也就是不做任何设计的数据库。

参考:

https://modern-sql.com/blog/2018-08/whats-new-in-mariadb-10.3

https://jira.mariadb.org/browse/MDEV-16236

https://jira.mariadb.org/browse/MDEV-16024  

https://mariadb.com/kb/en/library/system-versioned-tables/

https://www.slideshare.net/MariaDB/m18-querying-data-at-a-previous-point-in-time

posted @ 2018-10-23 16:16  zhjh256  阅读(...)  评论(...编辑  收藏