
[MySQL Database] MySQL: a discussion that starts from a blocked USE DB

Posted 7 days ago

Original article. My knowledge is limited, so please point out any mistakes. Thanks!


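I. Fault description

Today a friend ran into a serious database failure. The environment was as follows:

The symptoms were as follows:

In a hurry he killed a large number of threads, but the database still did not recover. It only returned to normal after he killed a transaction that had not been committed in time. The only thing left afterwards was the screenshot below: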


[Screenshot: image.png (not preserved)]

II. Extracting the fault information

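Going back to the screenshot above, the statement types can be summarized as follows:

III. Analysis

This case is not easy to analyze, because it combines MDL locks at the MySQL layer with InnoDB row locks under REPEATABLE READ (RR), and it requires close attention to the STATE column of information_schema.processlist.
I suggest reading my earlier article on MDL locks first:
http://blog.itpub.net/7728585/viewspace-2143093/

The MDL lock checks in this section use the following two methods: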

  UPDATE performance_schema.setup_consumers SET ENABLED = 'YES' WHERE NAME ='global_instrumentation';
  UPDATE performance_schema.setup_instruments SET ENABLED = 'YES' WHERE NAME ='wait/lock/metadata/sql/mdl';
  select * from performance_schema.metadata_locks\G

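To tie a row in metadata_locks back to a connection in the processlist, OWNER_THREAD_ID can be joined to performance_schema.threads. The query below is a minimal sketch and not part of the original troubleshooting. It assumes MySQL 5.7 or later with the MDL instrument above enabled.

```sql
-- Sketch: list table-level MDL requests together with the connection that owns them.
-- Assumes the wait/lock/metadata/sql/mdl instrument is enabled (see above).
SELECT ml.OBJECT_SCHEMA,
       ml.OBJECT_NAME,
       ml.LOCK_TYPE,
       ml.LOCK_STATUS,
       t.PROCESSLIST_ID,      -- the id shown in the processlist
       t.PROCESSLIST_STATE,
       t.PROCESSLIST_INFO     -- the statement the connection is running, if any
FROM performance_schema.metadata_locks AS ml
JOIN performance_schema.threads AS t
  ON t.THREAD_ID = ml.OWNER_THREAD_ID
WHERE ml.OBJECT_TYPE = 'TABLE'
ORDER BY ml.OBJECT_SCHEMA, ml.OBJECT_NAME, ml.LOCK_STATUS;
```

This matters because OWNER_THREAD_ID is a performance_schema thread id, which is not the same number as the id column of the processlist.
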
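1. CREATE TABLE A AS SELECT B: the Sending data state on table B

The Sending data state can mean many things. As far as I currently understand it, it is the general name the MySQL upper layer uses while a SELECT-type statement exchanges data between the InnoDB layer and the MySQL layer, so the possible causes include:

Also note that under RR the SELECT B part takes locks in the same way as INSERT...SELECT. See the following article, which I will not repeat here:
http://blog.itpub.net/7728585/viewspace-2146183/
From what he reported, he finally killed a long-running uncommitted transaction, so this should be case 2. Some rows in table B were locked and could not be read, which left the whole CREATE TABLE A AS SELECT B statement in the Sending data state.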
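
A long-running uncommitted transaction like the one he killed can usually be spotted in information_schema.innodb_trx. The query below is a minimal sketch, assuming InnoDB on MySQL 5.7; the age_seconds alias is only illustrative.

```sql
-- Sketch: list open InnoDB transactions, oldest first.
SELECT trx_id,
       trx_state,
       trx_started,
       TIMESTAMPDIFF(SECOND, trx_started, NOW()) AS age_seconds,
       trx_mysql_thread_id,   -- matches the id column of the processlist
       trx_rows_locked,
       trx_query              -- NULL when the session is idle inside the transaction
FROM information_schema.innodb_trx
ORDER BY trx_started;
```

An old transaction whose trx_query is NULL and whose connection shows Sleep in the processlist is a typical candidate for the kind of forgotten transaction described here.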

2. SHOW TABLE STATUS [LIKE 'A']: Waiting for table metadata lock

This is the most important link in the case. SHOW TABLE STATUS [LIKE 'A'] was blocked, with STATE Waiting for table metadata lock. Note that the state says table, because MDL locks come in many kinds. In my MDL article I mentioned that DESC on a table takes MDL_SHARED_HIGH_PRIO(SH). SHOW TABLE STATUS takes MDL_SHARED_HIGH_PRIO(SH) on the table as well.

  mysql> SHOW TABLE STATUS like 'a' \G
  2017-11-10T03:01:48.142334Z 6 [Note] (acquire_lock)THIS MDL LOCK acquire WAIT(MDL_LOCK WAIT QUE)!
  2017-11-10T03:01:48.142381Z 6 [Note] (>MDL PRINT) Thread id is 6:
  2017-11-10T03:01:48.142396Z 6 [Note] (->MDL PRINT) DB_name is:test
  2017-11-10T03:01:48.142409Z 6 [Note] (-->MDL PRINT) OBJ_name is:a
  2017-11-10T03:01:48.142421Z 6 [Note] (--->MDL PRINT) Namespace is:TABLE
  2017-11-10T03:01:48.142434Z 6 [Note] (----->MDL PRINT) Mdl type is:MDL_SHARED_HIGH_PRIO(SH)
  2017-11-10T03:01:48.142447Z 6 [Note] (------>MDL PRINT) Mdl duration is:MDL_TRANSACTION

  *************************** 7. row ***************************
  OBJECT_TYPE: TABLE
  OBJECT_SCHEMA: test
  OBJECT_NAME: a
  OBJECT_INSTANCE_BEGIN: 140733864665152
  LOCK_TYPE: SHARED_HIGH_PRIO
  LOCK_DURATION: TRANSACTION
  LOCK_STATUS: PENDING
  SOURCE: sql_base.cc:2821
  OWNER_THREAD_ID: 38
  OWNER_EVENT_ID: 1695

Both methods show the MDL_SHARED_HIGH_PRIO(SH) request, and what I simulated here is the blocked case.
However, MDL_SHARED_HIGH_PRIO(SH) is an MDL lock type with very high priority, as shown below:

  Request   |  Granted requests for lock                  |
  type      | S  SH  SR  SW  SWLP  SU  SRO  SNW  SNRW  X  |
  ----------+---------------------------------------------+
  SH        | +   +   +   +    +    +   +    +    +    -  |

  Request   |  Pending requests for lock      |
  type      | S  SH  SR  SW  SU  SNW  SNRW  X |
  ----------+---------------------------------+
  SH        | +   +   +   +   +   +    +    + |

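Nothing other than MDL_EXCLUSIVE(X) can block it. That makes it a very important breakthrough point.

3. CREATE TABLE A AS SELECT B: the MDL lock taken on table A

This was something I did not know before, and it is where I spent the most time in this case. As analyzed above, a statement such as SHOW TABLE STATUS [LIKE 'A'], which only takes an MDL_SHARED_HIGH_PRIO(SH) MDL lock, can be blocked on an MDL lock in only one way: table A holds MDL_EXCLUSIVE(X). So I began to suspect that this DDL statement holds MDL_EXCLUSIVE(X) on table A until the statement finishes. A real test confirmed exactly that, as shown below: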

  2017-11-10T05:38:16.824713Z 4 [Note] (acquire_lock)THIS MDL LOCK acquire ok!
  2017-11-10T05:38:16.824727Z 4 [Note] (>MDL PRINT) Thread id is 4:
  2017-11-10T05:38:16.824739Z 4 [Note] (->MDL PRINT) DB_name is:test
  2017-11-10T05:38:16.824752Z 4 [Note] (-->MDL PRINT) OBJ_name is:a
  2017-11-10T05:38:16.824764Z 4 [Note] (--->MDL PRINT) Namespace is:TABLE
  2017-11-10T05:38:16.824776Z 4 [Note] (---->MDL PRINT) Fast path is:(Y)
  2017-11-10T05:38:16.824788Z 4 [Note] (----->MDL PRINT) Mdl type is:MDL_SHARED(S)
  2017-11-10T05:38:16.824799Z 4 [Note] (------>MDL PRINT) Mdl duration is:MDL_TRANSACTION
  2017-11-10T05:38:16.825286Z 4 [Note] (upgrade_shared_lock)THIS MDL LOCK upgrade TO
  2017-11-10T05:38:16.825312Z 4 [Note] (>MDL PRINT) Thread id is 4:
  2017-11-10T05:38:16.825332Z 4 [Note] (->MDL PRINT) DB_name is:test
  2017-11-10T05:38:16.825345Z 4 [Note] (-->MDL PRINT) OBJ_name is:a
  2017-11-10T05:38:16.825357Z 4 [Note] (--->MDL PRINT) Namespace is:TABLE
  2017-11-10T05:38:16.825369Z 4 [Note] (----->MDL PRINT) Mdl type is:MDL_EXCLUSIVE(X)
  2017-11-10T05:38:16.825381Z 4 [Note] (------>MDL PRINT) Mdl duration is:MDL_TRANSACTION

  *************************** 1. row ***************************
  OBJECT_TYPE: TABLE
  OBJECT_SCHEMA: test
  OBJECT_NAME: a
  OBJECT_INSTANCE_BEGIN: 140733998842016
  LOCK_TYPE: SHARED
  LOCK_DURATION: TRANSACTION
  LOCK_STATUS: GRANTED
  SOURCE: sql_parse.cc:6314
  OWNER_THREAD_ID: 36
  OWNER_EVENT_ID: 1553

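Unfortunately, performance_schema.metadata_locks does not show MDL_EXCLUSIVE(X) here; it shows MDL_SHARED(S). In the log that I print, however, we can see an upgrade from MDL_SHARED(S) to MDL_EXCLUSIVE(X). And according to the compatibility list above, only MDL_EXCLUSIVE(X) blocks MDL_SHARED_HIGH_PRIO(SH). So we can be confident that the upgrade really happened; otherwise SHOW TABLE STATUS [LIKE 'A'] would not be blocked.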
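
To see which thread holds a lock on the table that another thread is waiting for, the pending and granted rows of metadata_locks can be paired up. The self-join below is a minimal sketch under the same assumptions as before (MySQL 5.7, MDL instrument enabled); the aliases are only illustrative. Given the observation in this section, the lock type reported for the holder may read SHARED rather than EXCLUSIVE, so the holder's thread id is the more reliable clue.

```sql
-- Sketch: pair every pending table MDL request with granted locks on the same table.
SELECT p.OBJECT_SCHEMA,
       p.OBJECT_NAME,
       p.OWNER_THREAD_ID AS waiting_thread_id,
       p.LOCK_TYPE       AS waiting_lock_type,
       g.OWNER_THREAD_ID AS holding_thread_id,
       g.LOCK_TYPE       AS holding_lock_type
FROM performance_schema.metadata_locks AS p
JOIN performance_schema.metadata_locks AS g
  ON  g.OBJECT_TYPE     = p.OBJECT_TYPE
  AND g.OBJECT_SCHEMA   = p.OBJECT_SCHEMA
  AND g.OBJECT_NAME     = p.OBJECT_NAME
  AND g.LOCK_STATUS     = 'GRANTED'
  AND g.OWNER_THREAD_ID <> p.OWNER_THREAD_ID
WHERE p.LOCK_STATUS = 'PENDING'
  AND p.OBJECT_TYPE = 'TABLE';
```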

4. SELECT * FROM A: Waiting for table metadata lock

You might think that SELECT takes no locks, but that is only true at the InnoDB layer. At the MySQL layer it takes MDL_SHARED_READ(SR), as shown below:

  select * from a;
  2017-11-10T03:31:31.209772Z 6 [Note] (acquire_lock)THIS MDL LOCK acquire WAIT(MDL_LOCK WAIT QUE)!
  2017-11-10T03:31:31.209824Z 6 [Note] (>MDL PRINT) Thread id is 6:
  2017-11-10T03:31:31.209851Z 6 [Note] (->MDL PRINT) DB_name is:test
  2017-11-10T03:31:31.209870Z 6 [Note] (-->MDL PRINT) OBJ_name is:a
  2017-11-10T03:31:31.209885Z 6 [Note] (--->MDL PRINT) Namespace is:TABLE
  2017-11-10T03:31:31.209965Z 6 [Note] (----->MDL PRINT) Mdl type is:MDL_SHARED_READ(SR)
  2017-11-10T03:31:31.209985Z 6 [Note] (------>MDL PRINT) Mdl duration is:MDL_TRANSACTION

  OBJECT_TYPE: TABLE
  OBJECT_SCHEMA: test
  OBJECT_NAME: a
  OBJECT_INSTANCE_BEGIN: 140733864625136
  LOCK_TYPE: SHARED_READ
  LOCK_DURATION: TRANSACTION
  LOCK_STATUS: PENDING
  SOURCE: sql_parse.cc:6314
  OWNER_THREAD_ID: 38
  OWNER_EVENT_ID: 1764

As you can see, the MDL_SHARED_READ(SR) request is indeed there, and it is currently blocked.

Its compatibility is as follows:

  Request   |  Granted requests for lock                  |
  type      | S  SH  SR  SW  SWLP  SU  SRO  SNW  SNRW  X  |
  ----------+---------------------------------------------+
  SR        | +   +   +   +    +    +   +    +    -    -  |

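Clearly MDL_SHARED_READ(SR) is compatible with MDL_SHARED_HIGH_PRIO(SH) but incompatible with MDL_EXCLUSIVE(X), the lock that CREATE TABLE A AS SELECT B holds on table A, so the SELECT has to wait.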

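5. DROP TABLE A: Waiting for table metadata lock

This one is easy to analyze. Table A holds an X lock, DROP TABLE A must take MDL_EXCLUSIVE(X), and that is of course incompatible with the MDL_EXCLUSIVE(X) already held. As shown below: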

  drop table a;
  2017-11-09T10:58:28.673015Z 3 [Note] (acquire_lock)THIS MDL LOCK acquire ok!
  2017-11-09T10:58:28.673030Z 3 [Note] (>MDL PRINT) Thread id is 3:
  2017-11-09T10:58:28.673042Z 3 [Note] (->MDL PRINT) DB_name is:test
  2017-11-09T10:58:28.673054Z 3 [Note] (-->MDL PRINT) OBJ_name is:t10
  2017-11-09T10:58:28.673067Z 3 [Note] (--->MDL PRINT) Namespace is:TABLE
  2017-11-09T10:58:28.673094Z 3 [Note] (----->MDL PRINT) Mdl type is:MDL_EXCLUSIVE(X)
  2017-11-09T10:58:28.673109Z 3 [Note] (------>MDL PRINT) Mdl duration is:MDL_TRANSACTION

  OBJECT_TYPE: TABLE
  OBJECT_SCHEMA: test
  OBJECT_NAME: a
  OBJECT_INSTANCE_BEGIN: 140733864625472
  LOCK_TYPE: EXCLUSIVE
  LOCK_DURATION: TRANSACTION
  LOCK_STATUS: PENDING
  SOURCE: sql_parse.cc:6314
  OWNER_THREAD_ID: 38
  OWNER_EVENT_ID: 1832

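EXCLUSIVE here is the MDL_EXCLUSIVE(X) we have been talking about. The request is indeed there, and it is currently blocked.

6. Why does USE DB block as well?

If the mysql client is started without the -A option (or no-auto-rehash), USE DB has to do at least the following:

1. Take an MDL (SH) lock on every table in the db, as shown below (by calling MDL_context::acquire_lock; the output here is from the blocked case):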
  use test
  2017-11-10T03:46:50.223628Z 5 [Note] (acquire_lock)THIS MDL LOCK acquire WAIT(MDL_LOCK WAIT QUE)!
  2017-11-10T03:46:50.223666Z 5 [Note] (>MDL PRINT) Thread id is 5:
  2017-11-10T03:46:50.223696Z 5 [Note] (->MDL PRINT) DB_name is:test
  2017-11-10T03:46:50.223714Z 5 [Note] (-->MDL PRINT) OBJ_name is:a
  2017-11-10T03:46:50.223725Z 5 [Note] (--->MDL PRINT) Namespace is:TABLE
  2017-11-10T03:46:50.223735Z 5 [Note] (----->MDL PRINT) Mdl type is:MDL_SHARED_HIGH_PRIO(SH)
  2017-11-10T03:46:50.223755Z 5 [Note] (------>MDL PRINT) Mdl duration is:MDL_TRANSACTION

  *************************** 7. row ***************************
  OBJECT_TYPE: TABLE
  OBJECT_SCHEMA: test
  OBJECT_NAME: a
  OBJECT_INSTANCE_BEGIN: 140733797429008
  LOCK_TYPE: SHARED_HIGH_PRIO
  LOCK_DURATION: TRANSACTION
  LOCK_STATUS: PENDING
  SOURCE: sql_base.cc:2821
  OWNER_THREAD_ID: 37
  OWNER_EVENT_ID: 187

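As you can see, USE DB is indeed blocked on MDL_SHARED_HIGH_PRIO(SH) as well.

2. Add every table to the table cache and open it (by calling open_table_from_share()).

So this situation is exactly the same as SHOW TABLE STATUS [LIKE 'A'] being blocked: it is also caused by incompatible MDL locks.

IV. Putting the analysis together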

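With the analysis above, the cause of the failure can be pieced together as follows: a long-running uncommitted transaction held InnoDB row locks on table B; CREATE TABLE A AS SELECT B could not read those rows and stayed in Sending data while holding MDL_EXCLUSIVE(X) on table A; SHOW TABLE STATUS [LIKE 'A'], SELECT * FROM A, DROP TABLE A and USE DB (without -A) then all waited behind that X lock in Waiting for table metadata lock.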

V. Simulation test

Test environment:

  create table b (id int);
  insert into b values(1);
  set global innodb_lock_wait_timeout=1000;
  UPDATE performance_schema.setup_consumers SET ENABLED = 'YES' WHERE NAME ='global_instrumentation';
  UPDATE performance_schema.setup_instruments SET ENABLED = 'YES' WHERE NAME ='wait/lock/metadata/sql/mdl';
  select * from performance_schema.metadata_locks\G
  -- Reconnect so that the settings take effect.

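The steps are as follows, in time order (each line names the session that runs the statement):

  1. session3: use test;
  2. session1: use test; begin; delete from b;
  3. session2: use test; create table a as select * from b;   (blocked by the InnoDB row lock on table b)
  4. session3: show table status like 'a';                    (blocked by the MDL lock on table a)
  5. session4: use test                                       (blocked by the MDL lock on table a)

In the end we see the following wait states: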

  mysql> select id,COMMAND,STATE, INFO,TIME from information_schema.processlist;
  +----+------------+---------------------------------+------------------------------------------------------------------------+------+
  | id | COMMAND    | STATE                           | INFO                                                                   | TIME |
  +----+------------+---------------------------------+------------------------------------------------------------------------+------+
  |  9 | Query      | executing                       | select id,COMMAND,STATE, INFO,TIME from information_schema.processlist |    0 |
  |  7 | Query      | Sending data                    | create table a as select * from b                                      |   20 |
  | 10 | Field List | Waiting for table metadata lock |                                                                        |   12 |
  |  5 | Sleep      |                                 | NULL                                                                   |  171 |
  |  6 | Query      | Waiting for table metadata lock | show table status like 'a'                                             |   18 |
  +----+------------+---------------------------------+------------------------------------------------------------------------+------+

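This reproduces the production state exactly. If we kill the transaction in session1, everything is naturally unblocked. Meanwhile, let's look at the output of performance_schema.metadata_locks: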

  mysql> SELECT * FROM performance_schema.metadata_locks where object_name='a' \G
  *************************** 1. row ***************************
  OBJECT_TYPE: TABLE
  OBJECT_SCHEMA: test
  OBJECT_NAME: a
  OBJECT_INSTANCE_BEGIN: 140733999179328
  LOCK_TYPE: SHARED
  LOCK_DURATION: TRANSACTION
  LOCK_STATUS: GRANTED
  SOURCE: sql_parse.cc:6314
  OWNER_THREAD_ID: 40
  OWNER_EVENT_ID: 1615
  *************************** 2. row ***************************
  OBJECT_TYPE: TABLE
  OBJECT_SCHEMA: test
  OBJECT_NAME: a
  OBJECT_INSTANCE_BEGIN: 140733663338832
  LOCK_TYPE: SHARED_HIGH_PRIO
  LOCK_DURATION: TRANSACTION
  LOCK_STATUS: PENDING
  SOURCE: sql_base.cc:2821
  OWNER_THREAD_ID: 41
  OWNER_EVENT_ID: 1613
  *************************** 3. row ***************************
  OBJECT_TYPE: TABLE
  OBJECT_SCHEMA: test
  OBJECT_NAME: a
  OBJECT_INSTANCE_BEGIN: 140733797433200
  LOCK_TYPE: SHARED_HIGH_PRIO
  LOCK_DURATION: TRANSACTION
  LOCK_STATUS: PENDING
  SOURCE: sql_base.cc:2821
  OWNER_THREAD_ID: 42
  OWNER_EVENT_ID: 184

This is the output we get, but note that LOCK_TYPE: SHARED cannot block LOCK_TYPE: SHARED_HIGH_PRIO (see the appendix or my earlier article on MDL locks). As analyzed above, the lock was actually upgraded to MDL_EXCLUSIVE(X).
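
Because the root of the chain is the InnoDB row lock held by session1, the session to end is the one blocking the CREATE TABLE ... SELECT, not the sessions waiting on MDL. The statements below are a minimal sketch, assuming MySQL 5.7 with the sys schema available. The id 5 is my reading of the simulated processlist above, where the Sleep connection appears to be session1; it is an assumption, not something the output states.

```sql
-- Sketch: which connection holds the row locks that the waiting statement needs?
SELECT waiting_pid,
       waiting_query,
       blocking_pid,
       blocking_query,                 -- NULL if the blocker is idle in its transaction
       sql_kill_blocking_connection    -- ready-made KILL statement for the blocker
FROM sys.innodb_lock_waits;

-- After reviewing the result, end the blocking connection by hand.
-- Left commented out on purpose; replace 5 with the blocking_pid you actually see.
-- KILL 5;
```

Killing the connection rolls back its open transaction, which releases the row locks on table b and lets the waiting statements proceed.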

VI. Conclusion

VII. Appendix

  MDL_INTENTION_EXCLUSIVE(IX)
  MDL_SHARED(S)
  MDL_SHARED_HIGH_PRIO(SH)
  MDL_SHARED_READ(SR)
  MDL_SHARED_WRITE(SW)
  MDL_SHARED_WRITE_LOW_PRIO(SWLP)
  MDL_SHARED_UPGRADABLE(SU)
  MDL_SHARED_READ_ONLY(SRO)
  MDL_SHARED_NO_WRITE(SNW)
  MDL_SHARED_NO_READ_WRITE(SNRW)
  MDL_EXCLUSIVE(X)

  Request   |  Granted requests for lock                  |
  type      | S  SH  SR  SW  SWLP  SU  SRO  SNW  SNRW  X  |
  ----------+---------------------------------------------+
  S         | +   +   +   +    +    +   +    +    +    -  |
  SH        | +   +   +   +    +    +   +    +    +    -  |
  SR        | +   +   +   +    +    +   +    +    -    -  |
  SW        | +   +   +   +    +    +   -    -    -    -  |
  SWLP      | +   +   +   +    +    +   -    -    -    -  |
  SU        | +   +   +   +    +    -   +    -    -    -  |
  SRO       | +   +   +   -    -    +   +    +    -    -  |
  SNW       | +   +   +   -    -    -   +    -    -    -  |
  SNRW      | +   +   -   -    -    -   -    -    -    -  |
  X         | -   -   -   -    -    -   -    -    -    -  |

  The priority matrix specified by it looks like:
  Request   |  Pending requests for lock                 |
  type      | S  SH  SR  SW  SWLP  SU  SRO  SNW  SNRW  X |
  ----------+--------------------------------------------+
  S         | +   +   +   +    +    +   +    +    +    - |
  SH        | +   +   +   +    +    +   +    +    +    + |
  SR        | +   +   +   +    +    +   +    +    -    - |
  SW        | +   +   +   +    +    +   +    -    -    - |
  SWLP      | +   +   +   +    +    +   -    -    -    - |
  SU        | +   +   +   +    +    +   +    +    +    - |
  SRO       | +   +   +   -    +    +   +    +    -    - |
  SNW       | +   +   +   +    +    +   +    +    +    - |
  SNRW      | +   +   +   +    +    +   +    +    +    - |
  X         | +   +   +   +    +    +   +    +    +    + |
