day12-03-Mysql优化-锁

锁的监控及处理

1 锁等待模拟

概念:
Record Lock 记录锁、行级锁、持有哪行的锁
Next Lock 下键锁 (索引键的锁)
GAP Lock  间隙锁 (索引键的锁)

X 排他锁,(增删改操作)

S 共享锁(读锁),不会阻塞其他锁  select过程中才会有
mysql 的select 使用MVCC 的快照读,很少存在加锁的情况
select * from city where countrycode='USA' lock in shard mode;(单独加锁)

意向锁,是表级锁
分类:
意向共享锁(IS):事务想要获取一张表中某几行的共享锁
意向排它锁(IX):事务想要获取一张表中的某几行的排它锁

一旦有大量的锁等待(默认60秒),mysql就会通过回滚事务释放资源。回滚意味着会产生大量IO

tx1:

USE oldboy
UPDATE t_100w SET k1='av' WHERE id=10;
## tx2:
USE oldboy 
UPDATE  t_100w SET k1='az' WHERE id=10;

锁等待会出现CPU繁忙,CPU占比可能会比较高,

比如%Cpu(s): 3.1 us, 72.3 sy, 0.0 ni, 0.0 id, 23.6 wa

这就要怀疑锁等待情况了

监控锁状态

  1. 查看锁等待******

mysql> show status like 'innodb_row_lock%';
+-------------------------------+-----------+
| Variable_name                 | Value     |
+-------------------------------+-----------+
| Innodb_row_lock_current_waits | 30        |
| Innodb_row_lock_time          | 301845756 |
| Innodb_row_lock_time_avg      | 141       |
| Innodb_row_lock_time_max      | 121822    |
| Innodb_row_lock_waits         | 2127701   |
+-------------------------------+-----------+
5 rows in set (0.01 sec)

# 主要关注 
Innodb_row_lock_current_waits  #当前有多少锁等待
Innodb_row_lock_waits	#一共发生过多少个
  1. 查看哪个事务在等待(被阻塞了)

use information_schema;
select trx_id,trx_state,trx_mysql_thread_id,trx_query  from information_schema.INNODB_TRX where trx_state='LOCK WAIT';

mysql> select trx_id,trx_state,trx_mysql_thread_id,trx_query  from information_schema.INNODB_TRX where trx_state='LOCK WAIT';
+--------+-----------+---------------------+-------------------------------------+
| trx_id | trx_state | trx_mysql_thread_id | trx_query                           |
+--------+-----------+---------------------+-------------------------------------+
| 2908   | LOCK WAIT |               10785 | update t3 set name='aaa' where id=1 |
+--------+-----------+---------------------+-------------------------------------+
1 row in set (0.00 sec)

mysql>

trx_id: 被阻塞的事务ID号(引擎层)
trx_state: 当前事务状态
trx_mysql_thread_id: 连接层的,连接线程ID(show processlist --> id 或者 trx_id) ***重点关注
trx_query: 当前被阻塞的操作(一般是要丢给开发的)
  1. 查看锁源(谁锁的我)******

select * from sys.innodb_lock_waits;  ## ====>被锁的和锁定它的之间关系

locked_table:哪张表出现的锁等待
locked_index:在哪个索引的锁
locked_type:锁的类型 (RECORD,GAP,NEXT LOCK)
waiting_trx_id: 等待的事务ID(与上个视图trx_id对应)
waiting_pid:等待事务的链接的线程ID (连接层的,连接线程ID(show processlist ---> id 或者 trx_id)  ****重点关注
waiting_query:等待事务语句
waiting_lock_mode:等待锁的类型(X,S)
blocking_trx_id:锁源的事务ID
blocking_pid:锁源的事务链接线程ID
sql_kill_blocking_connection:处理建议


mysql> select locked_table,locked_index,locked_type,waiting_trx_id,waiting_pid,waiting_query,waiting_lock_mode,blocking_trx_id,blocking_pid,sql_kill_blocking_connection from sys.innodb_lock_waits;
+---------------+--------------+-------------+----------------+-------------+-------------------------------------+-------------------+-----------------+--------------+------------------------------+
| locked_table  | locked_index | locked_type | waiting_trx_id | waiting_pid | waiting_query                       | waiting_lock_mode | blocking_trx_id | blocking_pid | sql_kill_blocking_connection |
+---------------+--------------+-------------+----------------+-------------+-------------------------------------+-------------------+-----------------+--------------+------------------------------+
| `taobao`.`t3` | PRIMARY      | RECORD      | 2909           |       10785 | update t3 set name='aaa' where id=1 | X                 | 2907            |        10756 | KILL 10756                   |
+---------------+--------------+-------------+----------------+-------------+-------------------------------------+-------------------+-----------------+--------------+------------------------------+
1 row in set, 3 warnings (0.00 sec)

mysql>   
  1. 根据锁源的pid,找到锁源的线程ID(thread_id)******

select * from performance_schema.threads 
where processlist_id = 13825;
===> 41

performance_schema.threads 下需要关注的字段:
processlist_id # 锁源的事务链接PID (show processlist)
thread_id # PID分配负责执行SQL语句的线程ID,与PID是父子关系
NAME # 线程名称

processlist_id = sys.innodb_lock_waits.blocking_pid


mysql> select thread_id,NAME,processlist_id from performance_schema.threads where processlist_id=13825;
+-----------+---------------------------+----------------+
| thread_id | NAME                      | processlist_id |
+-----------+---------------------------+----------------+
|     13850 | thread/sql/one_connection |          13825 |
+-----------+---------------------------+----------------+
1 row in set (0.00 sec)

mysql> 

# 查到 负责执行SQL的线程ID(thread_id)是 13850
  1. 找到锁源的SQL语句******

    performance_schema.events_statements_current	下需要关注的字段:
    
    thread_id	#执行SQL的线程ID
    event_name  #事件名称
    lock_time	#锁定时间
    sql_text	#线程执行的SQL语句内容
    
    1. 当前在执行的语句
      select 
      thread_id,event_name,
      lock_time,sql_text
      from performance_schema.events_statements_current
      where thread_id=13850;
      
      mysql> select 
          -> thread_id,event_name,
          -> lock_time,sql_text
          -> from performance_schema.events_statements_current
          -> where thread_id=13850;
      +-----------+----------------------+-----------+------------------------------------+
      | thread_id | event_name           | lock_time | sql_text                           |
      +-----------+----------------------+-----------+------------------------------------+
      |     13850 | statement/sql/update | 137000000 | update t3 set name='aa' where id=1 |
      +-----------+----------------------+-----------+------------------------------------+
      1 row in set (0.00 sec)
      
      mysql>
      
    2. 执行语句的历史
      select 
      thread_id,event_name,
      lock_time,sql_text
      from performance_schema.events_statements_history
      where thread_id=13850;
      
      
      mysql> select 
          -> thread_id,event_name,
          -> lock_time,sql_text
          -> from performance_schema.events_statements_history
          -> where thread_id=13850;
      +-----------+------------------------------+-----------+------------------------------------+
      | thread_id | event_name                   | lock_time | sql_text                           |
      +-----------+------------------------------+-----------+------------------------------------+
      |     13850 | statement/sql/update         | 137000000 | update t3 set name='aa' where id=1 |
      |     13850 | statement/sql/select         |         0 | SELECT DATABASE()                  |
      |     13850 | statement/com/Init DB        |         0 | NULL                               |
      |     13850 | statement/sql/show_databases | 100000000 | show databases                     |
      |     13850 | statement/sql/show_tables    |  55000000 | show tables                        |
      |     13850 | statement/com/Field List     |         0 | NULL                               |
      |     13850 | statement/com/Field List     |         0 | NULL                               |
      |     13850 | statement/com/Field List     |         0 | NULL                               |
      |     13850 | statement/com/Field List     |         0 | NULL                               |
      |     13850 | statement/sql/begin          |         0 | begin                              |
      +-----------+------------------------------+-----------+------------------------------------+
      10 rows in set (0.00 sec)
      
      mysql> 
      mysql>
      
      得出结果,丢给开发
      表信息 
      被阻塞的
      锁源SQL
      
      练习:
      一键获得以上信息,请写出具体的SQL语句
      
  2. 优化项目:锁的监控及处理

    1. 背景: 
    硬件环境: DELL R720,E系列16核,48G MEM,SAS*900G*6,RAID10
    在例行巡检时,发现9-11点时间段的CPU压力非常高(80-90%)
    
    2. 项目的职责
        2.1 通过top详细排查,发现mysqld进程占比达到了700-800%
        2.2 其中有量的CPU是被用作的SYS和WAIT,us处于正常
        2.3 怀疑是MySQL 锁 或者SQL语句出了问题
        2.4 经过排查slowlog及锁等待情况,发现有大量锁等待及少量慢语句    
        (1) pt-query-diagest 查看慢日志  
        (2) 锁等待有没有?
        db03 [(none)]>show status like 'innodb_row_lock%';
        +-------------------------------+-------+
        | Variable_name                 | Value |
        +-------------------------------+-------+
        | Innodb_row_lock_current_waits | 0     |
        | Innodb_row_lock_time          | 0     |
        | Innodb_row_lock_time_avg      | 0     |
        | Innodb_row_lock_time_max      | 0     |
        | Innodb_row_lock_waits         | 0     |
        +-------------------------------+-------+
        情况一:
                有100多个current_waits,说明当前很多锁等待情况
        情况二:
                1000多个lock_waits,说明历史上发生过的锁等待很多
        2.5 查看那个事务在等待(被阻塞了)
        2.6 查看锁源事务信息(谁锁的我)
        2.7 找到锁源的thread_id 
        2.8 找到锁源的SQL语句
    3. 找到语句之后,和应用开发人员进行协商   
        (1)
        开发人员描述,此语句是事务挂起导致
        我们提出建议是临时kill 会话,最终解决问题
        (2) 
        开发人员查看后,发现是业务逻辑问题导致的死锁,产生了大量锁等待
        临时解决方案,将阻塞事务的会话kill掉.
        最终解决方案,修改代码中的业务逻辑
    项目结果:
        经过排查处理,锁等待的个数减少80%.解决了CPU持续峰值的问题.
        
    锁监控设计到的命令:
    show status like 'innodb_rows_lock%'
    select * from information_schema.innodb_trx;
    select * from sys.innodb_lock_waits;
    select * from performance_schema.threads;
    select * from performance_schema.events_statements_current;
    select * from performance_schema.events_statements_history;
    
  3. 死锁监控

    # 查看死锁监控
    show engine innodb status\G
    
    # 查看死锁配置参数
    mysql> show variables like '%deadlock%';
    +----------------------------+-------+
    | Variable_name              | Value |
    +----------------------------+-------+
    | innodb_deadlock_detect     | ON    |
    | innodb_print_all_deadlocks | OFF   |
    +----------------------------+-------+
    2 rows in set (0.00 sec)
    
    mysql>
    
    # 还可以改配置文件,把死锁信息输入到错误日志里。
    vim /etc/my.cnf 
    innodb_print_all_deadlocks = 1  
    
  4. 主从优化

    ## 5.7 从库多线程复制 Multi-threaded slave  MTS
    基本要求:
    5.7以上的版本(忘记小版本)
    必须开启GTID 
    binlog必须是row模式  
    
    # 开启GTID
    gtid_mode=ON
    enforce_gtid_consistency=ON
    log_slave_updates=ON
    
    # Slave Muliti SQL_Thread
    # LOGICAL_CLOCK 本地时钟,基于事务级别的SQL并发回放,必须开启GTID
    mysql> show variables like 'slave_parallel_type';
    +---------------------+----------+
    | Variable_name       | Value    |
    +---------------------+----------+
    | slave_parallel_type | DATABASE |
    +---------------------+----------+
    1 row in set (0.00 sec)
    
    mysql> select @@slave_parallel_type;
    +-----------------------+
    | @@slave_parallel_type |
    +-----------------------+
    | DATABASE              |
    +-----------------------+
    1 row in set (0.00 sec)
    
    mysql>
    
    # slave_parallel_type 默认是 DATABASE,必须改成LOGICAL_CLOCK
    slave-parallel-type=LOGICAL_CLOCK
    
    slave-parallel-workers=16	#并发回放的线程数,一般是CPU核心个数的50%
    
    # master_info 和 relay_log_info 改成以表的方式存储,提高性能。默认是文件形式存储
    master_info_repository=TABLE
    relay_log_info_repository=TABLE
    
    # 开启relay_log 恢复功能(binlog出现问题,可以通过relog做一些问题恢复)
    relay_log_recovery=ON
    
    5.7 :
    slave-parallel-type=LOGICAL_CLOCK
    slave-parallel-workers=8
    cpu核心数作为标准
    
    CHANGE MASTER TO
      MASTER_HOST='10.0.0.128',
      MASTER_USER='repl',
      MASTER_PASSWORD='123',
      MASTER_PORT=3307,
      MASTER_AUTO_POSITION=1;
    start slave;
    
posted @ 2022-11-24 20:33  oldSimon  阅读(30)  评论(0)    收藏  举报