锁阻塞问题排查

锁阻塞问题排查
如果我们执行一个SQL被阻塞了,并且初步判断是被锁阻塞的,那么原因可能为:
1.被InnoDB锁阻塞;
2.被server层的表级锁,或者全局锁阻塞。


一、查看线程状态
不管是被谁阻塞了,都先执行”show processlist;”查看线程状态,阻塞原因不同,看到的状态也不一样,下面分别举例说明。

第1种情况:被元数据锁阻塞了”Waiting for table metadata lock” 。
前面我们说的server层的表级锁,比如执行”lock table t1 read;”其实获取的就是元数据锁,也就是常说的metadata lock,简称为MDL。但是这样我们只能看到被阻塞的线程,要找到阻塞的源头,需要看下一步方法。

mysql> show processlist;
+----+------+-----------+------+---------+------+---------------------------------+------------------------
| Id | User | Host      | db   | Command | Time | State                           | Info                                    
+----+------+-----------+------+---------+------+---------------------------------+------------------------
|  8 | root | localhost | hucq | Sleep   |   40 |                                 | NULL                                    
|  9 | root | localhost | hucq | Sleep   |  270 |                                 | NULL                                    
| 10 | root | localhost | sys  | Query   |    0 | starting                        | show processlist                        
| 11 | root | localhost | hucq | Query   |   15 | Waiting for table metadata lock | select * from t3 where c2=40 for update |
+----+------+-----------+------+---------+------+---------------------------------+------------------------
4 rows in set (0.00 sec)


第2种情况:被全局锁阻塞了”Waiting for global read lock”。
最常见的Xtrabackup备份时最后拷贝表结构和非事务引擎表时会执行”flush tables with read”加全局锁(注意:不是表级锁),全局锁定期间,阻塞写操作,允许读操作:
mysql> show processlist;
+----+------+-----------+------+---------+------+------------------------------+---------------------------
| Id | User | Host      | db   | Command | Time | State                        | Info                           
+----+------+-----------+------+---------+------+------------------------------+--------------------------
|  8 | root | localhost | hucq | Query   |   40 | Waiting for table flush      | flush tables with read lock    |
|  9 | root | localhost | hucq | Query   |   10 | Waiting for global read lock | update t3 set c2=1 where c2=20 |
| 10 | root | localhost | sys  | Query   |    0 | starting                     | show processlist               
| 12 | root | localhost | hucq | Query   |   43 | User sleep                   | select *,sleep(30) from t3     
+----+------+-----------+------+---------+------+------------------------------+---------------------------


上面一种情况是本身 flush tables with read lock 加全局锁的操作被阻塞,后续更新操作也跟着被阻塞,所以可以看到锁源头。
但是如果加全局锁成功后,更新操作被阻塞时,show processlist 是看不到锁源头的,需要看下一步方法:
mysql> show processlist;
+----+------+-----------+------+---------+------+------------------------------+---------------------------
| Id | User | Host      | db   | Command | Time | State                        | Info                         
+----+------+-----------+------+---------+------+------------------------------+---------------------------
|  2 | root | localhost | hucq | Query   |    7 | Waiting for global read lock | insert into t3 values(70,70) |
|  3 | root | localhost | hucq | Sleep   |   13 |                              | NULL                         
|  4 | root | localhost | NULL | Query   |    0 | starting                     | show processlist             
|  9 | root | localhost | hucq | Sleep   |   35 |                              | NULL                         
+----+------+-----------+------+---------+------+------------------------------+---------------------------



正常状态
第3种情况,SQL虽然被阻塞了,但是状态并不显示有lock,这个通常就是被InnoDB锁阻塞了,不太容易被发现:
mysql> show processlist;
+----+------+-----------+------+---------+------+----------+---------------------------------+
| Id | User | Host      | db   | Command | Time | State    | Info                            |
+----+------+-----------+------+---------+------+----------+---------------------------------+
|  8 | root | localhost | hucq | Query   |    5 | updating | update t3 set c2=90 where c2=40 |
| 10 | root | localhost | sys  | Query   |    0 | starting | show processlist                |
| 11 | root | localhost | hucq | Sleep   |  743 |          | NULL                            |
+----+------+-----------+------+---------+------+----------+---------------------------------+
4 rows in set (0.00 sec)




二、查看锁阻塞视图
上面我们看到的被阻塞SQL的状态,接下来需要找到阻塞的源头,这样我们才能解决阻塞的问题。MySQL提供2个视图分别展示当前持有的MDL和InnoDB行锁的阻塞关系:
sys.schema_table_lock_waits
sys.innodb_lock_waits

注意:如果MySQL中存在多个锁,但是之间互不阻塞,是不会显示在这2个视图中的。
当要查看锁源头,查询上述2个视图即可。


chema_table_lock_waits
MDL锁监控默认是关闭的,动态开启方法:
mysql> UPDATE performance_schema.setup_instruments SET ENABLED = 'YES' WHERE NAME = 'wait/lock/metadata/sql/mdl';

建议将开启参数写入到配置文件:
performance-schema-instrument = 'wait/lock/metadata/sql/mdl=ON'

如果是被MDL阻塞,则查看sys.schema_table_lock_waits,注意下列字段含义:
waiting_pid:被阻塞的线程ID
waiting_query: 被阻塞的SQL
blocking_pid:MDL锁的线程ID,即锁源头
sql_kill_blocking_connection:杀掉锁源头,释放锁的方法

mysql> select * from sys.schema_table_lock_waits\G
*************************** 1. row ***************************
               object_schema: hucq
                 object_name: t3
           waiting_thread_id: 45
                 waiting_pid: 11
             waiting_account: root@localhost
           waiting_lock_type: SHARED_WRITE
       waiting_lock_duration: TRANSACTION
               waiting_query: select * from t3 where c2=40 for update
          waiting_query_secs: 167
 waiting_query_rows_affected: 0
 waiting_query_rows_examined: 0
          blocking_thread_id: 43
                blocking_pid: 9
            blocking_account: root@localhost
          blocking_lock_type: SHARED_READ_ONLY
      blocking_lock_duration: TRANSACTION
     sql_kill_blocking_query: KILL QUERY 9
sql_kill_blocking_connection: KILL 9
1    row in set (0.01 sec)


metadata_locks
sys.schema_table_lock_waits视图数据其实来自于performance_schema.metadata_locks表,只不过做了很多筛选和联接查询,结果便于阅读,但问题它只能展示表级锁(即元数据锁),而全局锁不属于元数据锁,所以我们要从原始数据中去查持有全局锁的信息:
mysql> SELECT thd.processlist_id,
       mdl.OBJECT_TYPE,
       mdl.OBJECT_SCHEMA,
       mdl.OBJECT_NAME,
       mdl.LOCK_TYPE,
       mdl.LOCK_DURATION,
       mdl.LOCK_STATUS,
       gs.sql_text
FROM performance_schema.metadata_locks mdl
INNER JOIN performance_schema.threads thd ON mdl.owner_thread_id = thd.thread_id
AND processlist_id <> connection_id()
AND LOCK_DURATION='EXPLICIT'
JOIN performance_schema.events_statements_current gs ON mdl.owner_thread_id = gs.THREAD_ID;
+----------------+-------------+----------+---------------+-------------+------------------------------+
| processlist_id | OBJECT_TYPE | LOCK_TYPE| LOCK_DURATION | LOCK_STATUS | sql_text                     |
+----------------+-------------+----------+---------------+-------------+------------------------------+
|              3 | GLOBAL      | SHARED   | EXPLICIT      | GRANTED     | flush tables with read lock  |
|              3 | COMMIT      | SHARED   | EXPLICIT      | GRANTED     | flush tables with read lock  |
+----------------+-------------+----------+---------------+-------------+------------------------------+
2 rows in set (0.00 sec)

字段含义:
processlist_id,锁源头线程ID;
LOCK_DURATION,该值是STATEMENT,TRANSACTION或EXPLICIT中的一种。STATEMENT和TRANSACTION值分别表示在语句或事务端隐式释放的锁。 EXPLICIT值表示可以在语句或事务结束后保留并由显式操作释放的锁,例如使用带有读取锁的FLUSH TABLES WITH READ LOCK。
sql_text,锁源头线程最后执行的SQL,上述输出中因为执行 flush tables with read lock 后没有执行其他SQL,所以从查询结果中很清晰的看出是线程ID为3的这个线程执行了全局锁操作。


innodb_lock_waits
如果是被InnoDB行锁阻塞,则查看innodb_lock_waits视图:
mysql> select * from sys.innodb_lock_waits\G
*************************** 1. row ***************************
                wait_started: 2020-05-12 14:15:53
                    wait_age: 00:00:02
               wait_age_secs: 2
                locked_table: `hucq`.`t3`
                locked_index: c2
                 locked_type: RECORD
              waiting_trx_id: 9626681
         waiting_trx_started: 2020-05-12 14:15:03
             waiting_trx_age: 00:00:52
     waiting_trx_rows_locked: 3
   waiting_trx_rows_modified: 0
                 waiting_pid: 8
               waiting_query: update t3 set c2=90 where c2=40
             waiting_lock_id: 9626681:301:4:3
           waiting_lock_mode: X
             blocking_trx_id: 9626680
                blocking_pid: 11
              blocking_query: NULL
            blocking_lock_id: 9626680:301:4:3
          blocking_lock_mode: X
        blocking_trx_started: 2020-05-12 14:14:54
            blocking_trx_age: 00:01:01
    blocking_trx_rows_locked: 2
  blocking_trx_rows_modified: 0
     sql_kill_blocking_query: KILL QUERY 11
sql_kill_blocking_connection: KILL 11
1    row in set, 3 warnings (0.01 sec)



三、解决方法
如果被MDL阻塞,阻塞时间一般会很长,如果不是在做DDL,一般都可以选择杀掉那个MDL的连接,解除阻塞;
被InnoDB行锁阻塞的情况一般比较少,因为默认锁等待时间很短,而且行锁影响范围小,一般观察不到。出现的话可能也是因为人为执行了一个大事务,行锁较多,而且忘记提交事务,导致锁一直未释放,此时也可以选择kill那个连接,解除阻塞;
被全局锁阻塞,一般出现在备份的时候,可以选择退出备份,如果连接残留,一直持有全局锁,则可以kill那个下发全局锁的连接。

 

posted @ 2020-08-14 14:32  屠魔的少年  阅读(170)  评论(0)    收藏  举报