锁阻塞问题排查
如果我们执行一个SQL被阻塞了,并且初步判断是被锁阻塞的,那么原因可能为:
1.被InnoDB锁阻塞;
2.被server层的表级锁,或者全局锁阻塞。
一、查看线程状态
不管是被谁阻塞了,都先执行”show processlist;”查看线程状态,阻塞原因不同,看到的状态也不一样,下面分别举例说明。
第1种情况:被元数据锁阻塞了”Waiting for table metadata lock” 。
前面我们说的server层的表级锁,比如执行”lock table t1 read;”其实获取的就是元数据锁,也就是常说的metadata lock,简称为MDL。但是这样我们只能看到被阻塞的线程,要找到阻塞的源头,需要看下一步方法。
mysql> show processlist;
+----+------+-----------+------+---------+------+---------------------------------+------------------------
| Id | User | Host | db | Command | Time | State | Info
+----+------+-----------+------+---------+------+---------------------------------+------------------------
| 8 | root | localhost | hucq | Sleep | 40 | | NULL
| 9 | root | localhost | hucq | Sleep | 270 | | NULL
| 10 | root | localhost | sys | Query | 0 | starting | show processlist
| 11 | root | localhost | hucq | Query | 15 | Waiting for table metadata lock | select * from t3 where c2=40 for update |
+----+------+-----------+------+---------+------+---------------------------------+------------------------
4 rows in set (0.00 sec)
第2种情况:被全局锁阻塞了”Waiting for global read lock”。
最常见的Xtrabackup备份时最后拷贝表结构和非事务引擎表时会执行”flush tables with read”加全局锁(注意:不是表级锁),全局锁定期间,阻塞写操作,允许读操作:
mysql> show processlist;
+----+------+-----------+------+---------+------+------------------------------+---------------------------
| Id | User | Host | db | Command | Time | State | Info
+----+------+-----------+------+---------+------+------------------------------+--------------------------
| 8 | root | localhost | hucq | Query | 40 | Waiting for table flush | flush tables with read lock |
| 9 | root | localhost | hucq | Query | 10 | Waiting for global read lock | update t3 set c2=1 where c2=20 |
| 10 | root | localhost | sys | Query | 0 | starting | show processlist
| 12 | root | localhost | hucq | Query | 43 | User sleep | select *,sleep(30) from t3
+----+------+-----------+------+---------+------+------------------------------+---------------------------
上面一种情况是本身 flush tables with read lock 加全局锁的操作被阻塞,后续更新操作也跟着被阻塞,所以可以看到锁源头。
但是如果加全局锁成功后,更新操作被阻塞时,show processlist 是看不到锁源头的,需要看下一步方法:
mysql> show processlist;
+----+------+-----------+------+---------+------+------------------------------+---------------------------
| Id | User | Host | db | Command | Time | State | Info
+----+------+-----------+------+---------+------+------------------------------+---------------------------
| 2 | root | localhost | hucq | Query | 7 | Waiting for global read lock | insert into t3 values(70,70) |
| 3 | root | localhost | hucq | Sleep | 13 | | NULL
| 4 | root | localhost | NULL | Query | 0 | starting | show processlist
| 9 | root | localhost | hucq | Sleep | 35 | | NULL
+----+------+-----------+------+---------+------+------------------------------+---------------------------
正常状态
第3种情况,SQL虽然被阻塞了,但是状态并不显示有lock,这个通常就是被InnoDB锁阻塞了,不太容易被发现:
mysql> show processlist;
+----+------+-----------+------+---------+------+----------+---------------------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+------+-----------+------+---------+------+----------+---------------------------------+
| 8 | root | localhost | hucq | Query | 5 | updating | update t3 set c2=90 where c2=40 |
| 10 | root | localhost | sys | Query | 0 | starting | show processlist |
| 11 | root | localhost | hucq | Sleep | 743 | | NULL |
+----+------+-----------+------+---------+------+----------+---------------------------------+
4 rows in set (0.00 sec)
二、查看锁阻塞视图
上面我们看到的被阻塞SQL的状态,接下来需要找到阻塞的源头,这样我们才能解决阻塞的问题。MySQL提供2个视图分别展示当前持有的MDL和InnoDB行锁的阻塞关系:
sys.schema_table_lock_waits
sys.innodb_lock_waits
注意:如果MySQL中存在多个锁,但是之间互不阻塞,是不会显示在这2个视图中的。
当要查看锁源头,查询上述2个视图即可。
chema_table_lock_waits
MDL锁监控默认是关闭的,动态开启方法:
mysql> UPDATE performance_schema.setup_instruments SET ENABLED = 'YES' WHERE NAME = 'wait/lock/metadata/sql/mdl';
建议将开启参数写入到配置文件:
performance-schema-instrument = 'wait/lock/metadata/sql/mdl=ON'
如果是被MDL阻塞,则查看sys.schema_table_lock_waits,注意下列字段含义:
waiting_pid:被阻塞的线程ID
waiting_query: 被阻塞的SQL
blocking_pid:MDL锁的线程ID,即锁源头
sql_kill_blocking_connection:杀掉锁源头,释放锁的方法
mysql> select * from sys.schema_table_lock_waits\G
*************************** 1. row ***************************
object_schema: hucq
object_name: t3
waiting_thread_id: 45
waiting_pid: 11
waiting_account: root@localhost
waiting_lock_type: SHARED_WRITE
waiting_lock_duration: TRANSACTION
waiting_query: select * from t3 where c2=40 for update
waiting_query_secs: 167
waiting_query_rows_affected: 0
waiting_query_rows_examined: 0
blocking_thread_id: 43
blocking_pid: 9
blocking_account: root@localhost
blocking_lock_type: SHARED_READ_ONLY
blocking_lock_duration: TRANSACTION
sql_kill_blocking_query: KILL QUERY 9
sql_kill_blocking_connection: KILL 9
1 row in set (0.01 sec)
metadata_locks
sys.schema_table_lock_waits视图数据其实来自于performance_schema.metadata_locks表,只不过做了很多筛选和联接查询,结果便于阅读,但问题它只能展示表级锁(即元数据锁),而全局锁不属于元数据锁,所以我们要从原始数据中去查持有全局锁的信息:
mysql> SELECT thd.processlist_id,
mdl.OBJECT_TYPE,
mdl.OBJECT_SCHEMA,
mdl.OBJECT_NAME,
mdl.LOCK_TYPE,
mdl.LOCK_DURATION,
mdl.LOCK_STATUS,
gs.sql_text
FROM performance_schema.metadata_locks mdl
INNER JOIN performance_schema.threads thd ON mdl.owner_thread_id = thd.thread_id
AND processlist_id <> connection_id()
AND LOCK_DURATION='EXPLICIT'
JOIN performance_schema.events_statements_current gs ON mdl.owner_thread_id = gs.THREAD_ID;
+----------------+-------------+----------+---------------+-------------+------------------------------+
| processlist_id | OBJECT_TYPE | LOCK_TYPE| LOCK_DURATION | LOCK_STATUS | sql_text |
+----------------+-------------+----------+---------------+-------------+------------------------------+
| 3 | GLOBAL | SHARED | EXPLICIT | GRANTED | flush tables with read lock |
| 3 | COMMIT | SHARED | EXPLICIT | GRANTED | flush tables with read lock |
+----------------+-------------+----------+---------------+-------------+------------------------------+
2 rows in set (0.00 sec)
字段含义:
processlist_id,锁源头线程ID;
LOCK_DURATION,该值是STATEMENT,TRANSACTION或EXPLICIT中的一种。STATEMENT和TRANSACTION值分别表示在语句或事务端隐式释放的锁。 EXPLICIT值表示可以在语句或事务结束后保留并由显式操作释放的锁,例如使用带有读取锁的FLUSH TABLES WITH READ LOCK。
sql_text,锁源头线程最后执行的SQL,上述输出中因为执行 flush tables with read lock 后没有执行其他SQL,所以从查询结果中很清晰的看出是线程ID为3的这个线程执行了全局锁操作。
innodb_lock_waits
如果是被InnoDB行锁阻塞,则查看innodb_lock_waits视图:
mysql> select * from sys.innodb_lock_waits\G
*************************** 1. row ***************************
wait_started: 2020-05-12 14:15:53
wait_age: 00:00:02
wait_age_secs: 2
locked_table: `hucq`.`t3`
locked_index: c2
locked_type: RECORD
waiting_trx_id: 9626681
waiting_trx_started: 2020-05-12 14:15:03
waiting_trx_age: 00:00:52
waiting_trx_rows_locked: 3
waiting_trx_rows_modified: 0
waiting_pid: 8
waiting_query: update t3 set c2=90 where c2=40
waiting_lock_id: 9626681:301:4:3
waiting_lock_mode: X
blocking_trx_id: 9626680
blocking_pid: 11
blocking_query: NULL
blocking_lock_id: 9626680:301:4:3
blocking_lock_mode: X
blocking_trx_started: 2020-05-12 14:14:54
blocking_trx_age: 00:01:01
blocking_trx_rows_locked: 2
blocking_trx_rows_modified: 0
sql_kill_blocking_query: KILL QUERY 11
sql_kill_blocking_connection: KILL 11
1 row in set, 3 warnings (0.01 sec)
三、解决方法
如果被MDL阻塞,阻塞时间一般会很长,如果不是在做DDL,一般都可以选择杀掉那个MDL的连接,解除阻塞;
被InnoDB行锁阻塞的情况一般比较少,因为默认锁等待时间很短,而且行锁影响范围小,一般观察不到。出现的话可能也是因为人为执行了一个大事务,行锁较多,而且忘记提交事务,导致锁一直未释放,此时也可以选择kill那个连接,解除阻塞;
被全局锁阻塞,一般出现在备份的时候,可以选择退出备份,如果连接残留,一直持有全局锁,则可以kill那个下发全局锁的连接。