
MySQL: Analysis of a Production Deadlock Case

Time: 2021-07-01 10:21:17

tb_elc_forecast set capital = capital - 1.0, outbound = outbound + 1.0 where id = 1131766 and capital >= 1.0
Transaction 2: update tb_elc_forecast set capital = capital - 1.0, outbound = outbound + 1.0 where id = 1164727 and capital >= 1.0

2. Both SQL statements use record locks (locks rec but not gap), that is, plain record locks rather than gap locks.

The table definition shows that tb_elc_forecast declares the id column as its primary key. The table structure (simplified) is as follows:

CREATE TABLE `tb_elc_forecast` (
  `id` int(10) NOT NULL AUTO_INCREMENT COMMENT 'auto-increment id',
  `sku` varchar(50) NOT NULL COMMENT 'product',
  `customerid` varchar(50) DEFAULT NULL COMMENT 'cargo owner',
  `capital` float(10,2) DEFAULT 0.00 COMMENT 'reserved',
  `outbound` float(10,2) DEFAULT 0.00 COMMENT 'outbound',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

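From the information above it is easy to see that each offending SQL statement takes an exclusive record lock on the primary-key clustered index, not a gap lock. In theory it looks as if a deadlock could never occur, because there seems to be only one critical resource: the clustered index. For a deadlock to happen there must be at least two critical resources and two independent transactions whose executions interleave, so that each one locks the resource the other needs. A schematic diagram: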

[Figure: deadlock schematic]
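The circular wait described above can be reproduced in miniature without a database. The following is a minimal sketch, not MySQL code: two in-process `ReentrantLock` objects stand in for the two critical resources, and the class name `CircularWaitDemo` and the 200 ms timeout are assumptions made for illustration. Each worker takes one lock and then asks for the other's; a timed `tryLock` is used so the demo terminates instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative only: two in-process locks stand in for two critical resources.
class CircularWaitDemo {

    /** Runs two workers that take the locks in opposite order; returns how many got stuck. */
    static int countBlockedWorkers() {
        ReentrantLock resource1 = new ReentrantLock();
        ReentrantLock resource2 = new ReentrantLock();
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        CountDownLatch bothTriedSecond = new CountDownLatch(2);
        AtomicInteger blocked = new AtomicInteger();

        Thread a = new Thread(() -> work(resource1, resource2, bothHoldFirst, bothTriedSecond, blocked));
        Thread b = new Thread(() -> work(resource2, resource1, bothHoldFirst, bothTriedSecond, blocked));
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return blocked.get();
    }

    private static void work(ReentrantLock first, ReentrantLock second,
                             CountDownLatch bothHoldFirst, CountDownLatch bothTriedSecond,
                             AtomicInteger blocked) {
        first.lock();
        try {
            bothHoldFirst.countDown();
            bothHoldFirst.await();      // wait until the other worker holds its first lock
            // A plain lock() here would hang forever; the timeout makes the cycle observable.
            if (second.tryLock(200, TimeUnit.MILLISECONDS)) {
                second.unlock();
            } else {
                blocked.incrementAndGet();
            }
            bothTriedSecond.countDown();
            bothTriedSecond.await();    // keep the first lock until both attempts have finished
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            first.unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println("workers blocked: " + countBlockedWorkers());
    }
}
```

With only one lock in play, or with both workers taking the locks in the same order, neither worker would time out. That is why two updates on two different rows, taken alone, do not look like enough to deadlock.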

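Judged against this deadlock theory, the two offending SQL statements alone do not seem to satisfy the conditions for a deadlock. So why did one occur? The problem cannot be analyzed from the deadlock log in isolation. We have to go to the application code and find the transaction that contains the offending SQL. The deadlock log only reports the two statements involved at the moment of the deadlock, and that is not necessarily enough to reach a conclusion.

Based on the situation so far, I made the following hypotheses: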

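1. Before transaction A executes update tb_elc_forecast set capital = capital - 1.0, outbound = outbound + 1.0 where id = 1131766 and capital >= 1.0, the row with id=1131766 is queried by transaction B, which places a shared lock on it under the REPEATABLE READ isolation level. Likewise, before transaction B executes update tb_elc_forecast set capital = capital - 1.0, outbound = outbound + 1.0 where id = 1164727 and capital >= 1.0, the row with id=1164727 is queried by transaction A, which places a shared lock on it under REPEATABLE READ. In the end neither update can obtain the exclusive lock on its row, which causes the deadlock.

   Is that really what happens? A simulation test says no.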

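   In fact, a plain SELECT in InnoDB is a consistent nonlocking read under both READ COMMITTED and REPEATABLE READ: it reads from an MVCC snapshot and takes no shared row lock, so nothing is left behind to block a later update. Only locking reads (SELECT ... LOCK IN SHARE MODE or SELECT ... FOR UPDATE) and reads under the SERIALIZABLE level take row locks. This case can therefore be ruled out.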

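2. At time T1, transaction A executes update tb_elc_forecast set capital = capital - 1.0, outbound = outbound + 1.0 where id = 1131766 and capital >= 1.0, and transaction B executes update tb_elc_forecast set capital = capital - 1.0, outbound = outbound + 1.0 where id = 1164727 and capital >= 1.0.

   At time T2, transaction A executes update tb_elc_forecast set capital = capital - 1.0, outbound = outbound + 1.0 where id = 1164727 and capital >= 1.0, and transaction B executes update tb_elc_forecast set capital = capital - 1.0, outbound = outbound + 1.0 where id = 1131766 and capital >= 1.0. Because each transaction runs both updates, the conditions for a deadlock are met.

   Is that really what happens? A simulation test says yes.

   In terms of the deadlock diagram, critical resource 1 is the record lock on id=1164727 and critical resource 2 is the record lock on id=1131766. Each update takes the record lock of the row it touches and holds it until its transaction ends.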
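The T1/T2 interleaving in hypothesis 2 can be traced with a small wait-for graph. The sketch below is a conceptual model only and is not how InnoDB implements its deadlock detector; the class name `WaitForGraphSketch` and its method are hypothetical. It records which transaction owns each row lock and which transaction each blocked transaction is waiting for, then reports a deadlock when a new wait closes a cycle.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Conceptual model only; this is not InnoDB's actual deadlock detector.
class WaitForGraphSketch {
    private final Map<Integer, String> ownerByRow = new HashMap<>(); // row id -> transaction holding its X lock
    private final Map<String, String> waitsFor = new HashMap<>();    // blocked transaction -> transaction it waits for

    /** Requests a row lock; returns true if the request closes a wait cycle (a deadlock). */
    boolean requestCausesDeadlock(String trx, int rowId) {
        String owner = ownerByRow.get(rowId);
        if (owner == null || owner.equals(trx)) {
            ownerByRow.put(rowId, trx);   // lock granted immediately
            return false;
        }
        waitsFor.put(trx, owner);         // trx now waits for the current owner
        // Follow the wait chain from the owner; reaching trx again means a cycle.
        Set<String> seen = new HashSet<>();
        String current = owner;
        while (current != null && seen.add(current)) {
            if (current.equals(trx)) {
                return true;
            }
            current = waitsFor.get(current);
        }
        return false;
    }

    public static void main(String[] args) {
        WaitForGraphSketch g = new WaitForGraphSketch();
        System.out.println(g.requestCausesDeadlock("A", 1131766)); // T1: A locks its first row
        System.out.println(g.requestCausesDeadlock("B", 1164727)); // T1: B locks its first row
        System.out.println(g.requestCausesDeadlock("A", 1164727)); // T2: A starts waiting for B
        System.out.println(g.requestCausesDeadlock("B", 1131766)); // T2: B waits for A, closing the cycle
    }
}
```

Only the fourth request reports a deadlock. The third one is an ordinary lock wait, which would resolve on its own if B committed instead of asking for A's row.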

 

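After some searching, it was fairly easy to find the corresponding transactional code in the application (the business logic that issues the offending SQL). Part of it is shown here: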

for (ElcForecastTran entity : insertList) {
    ElcForecast elcForecast = new ElcForecast();
    String period = new SimpleDateFormat("yyyy-MM").format(request.getOrderTime());
    elcForecast.setCustomerid(entity.getCustomerid());
    elcForecast.setSku(entity.getSku());
    elcForecast.setPeriod(period);
    elcForecast.setTp(entity.getTp());
    elcForecast.setChannel(entity.getChannel());
    elcForecast.setDepth(2); // sub-agent
    elcForecast.setIsActive(1); // published
    Integer id = elcForecastMapper.selectId(elcForecast);
    if (id == null) {
        isCapitalForecastFail = true;
        builder.append(entity.getSku()).append(",");
        logger.error("updateCapital fail,update forecast:{},request:{}", elcForecast, request);
        continue;
    }
    elcForecast.setId(id);
    elcForecast.setCapital(entity.getCapital()); // reserved quota
    // Row locks are taken in list order and held until the enclosing transaction ends.
    cnt = elcForecastMapper.updateCapitalOutbound(elcForecast);
}

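From the surrounding business logic, this transactional method can be called by several threads at the same time, and the call elcForecastMapper.updateCapitalOutbound(elcForecast); sits inside a loop. The rows updated by different threads can overlap, which produces exactly the conditions of hypothesis 2 above.

So how can this be fixed? There are several possible approaches. One is to split the transaction into smaller ones instead of wrapping the whole loop in a single transaction. Another is to guard this transaction with a distributed lock so that only one thread can run it at a time. Which one to choose depends on the business requirements.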
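A common complement to the two options above, and not something the original author describes, is to make every transaction touch rows in the same order, for example ascending primary key. With a single global order no circular wait can form, because a transaction never waits for a row that sorts before one it already holds. The sketch below assumes the ids can be resolved before any update is issued; `LockOrderSketch` and `orderedIds` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of the article's code base.
class LockOrderSketch {

    /** Returns the ids in ascending order, the order in which the updates would be issued. */
    static List<Integer> orderedIds(Collection<Integer> ids) {
        List<Integer> sorted = new ArrayList<>(ids);
        Collections.sort(sorted);
        return sorted;
    }

    public static void main(String[] args) {
        // Two threads that received the same rows in opposite order...
        List<Integer> threadA = orderedIds(Arrays.asList(1131766, 1164727));
        List<Integer> threadB = orderedIds(Arrays.asList(1164727, 1131766));
        // ...end up with the same update order, so both start with id 1131766.
        System.out.println(threadA + " " + threadB);
    }
}
```

In the loop shown earlier this would mean collecting the resolved ids first and issuing the updates in a second pass. Whether that fits depends on the same business considerations as the other two options.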

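Next, I walk through a concrete experiment to show how the deadlock above arises.

First, open two MySQL clients and run set SESSION autocommit=0; in each to turn off automatic commit.

Prepare two test statements: update tb_elc_forecast set capital = capital - 1.0 where id=5133; and update tb_elc_forecast set capital = capital - 1.0 where id=5137;

Start a transaction in each client: START TRANSACTION;

Client A runs: update tb_elc_forecast set capital = capital - 1.0 where id=5133;

Client B runs: update tb_elc_forecast set capital = capital - 1.0 where id=5137;

Client A runs: update tb_elc_forecast set capital = capital - 1.0 where id=5137;

Client B runs: update tb_elc_forecast set capital = capital - 1.0 where id=5133;

At this point you should see client B report a deadlock: its transaction is chosen as the victim and rolled back, which lets client A's transaction survive.

Client A runs: commit;

Client B runs: commit;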
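In application code, the victim's failure arrives as an exception rather than as a message in a console. MySQL reports it as error 1213 with SQLSTATE 40001 ("Deadlock found when trying to get lock; try restarting transaction"), and since the victim's transaction has been rolled back, the usual response is to run the whole transaction again. The following is a minimal JDBC-flavored sketch under those assumptions; `DeadlockRetrySketch`, `TxWork`, and `runWithRetry` are hypothetical names, and the failure in `main` is simulated rather than produced by a real connection.

```java
import java.sql.SQLException;

// Sketch only: retries a unit of work when MySQL reports it as the deadlock victim.
class DeadlockRetrySketch {

    /** A transactional unit of work; hypothetical interface for this example. */
    interface TxWork {
        void run() throws SQLException;
    }

    /** MySQL signals a deadlock victim with vendor error 1213 and SQLSTATE 40001. */
    static boolean isDeadlock(SQLException e) {
        return e.getErrorCode() == 1213 || "40001".equals(e.getSQLState());
    }

    /** Runs work up to maxAttempts times; returns the number of attempts used. */
    static int runWithRetry(TxWork work, int maxAttempts) throws SQLException {
        for (int attempt = 1; ; attempt++) {
            try {
                work.run();
                return attempt;
            } catch (SQLException e) {
                if (!isDeadlock(e) || attempt >= maxAttempts) {
                    throw e;   // not a deadlock, or out of attempts
                }
                // The victim was already rolled back, so the whole transaction is re-run.
            }
        }
    }

    public static void main(String[] args) throws SQLException {
        int[] calls = {0};
        int attempts = runWithRetry(() -> {
            if (calls[0]++ == 0) {
                // Simulated failure standing in for a real deadlock on the first attempt.
                throw new SQLException("Deadlock found when trying to get lock; try restarting transaction",
                        "40001", 1213);
            }
        }, 3);
        System.out.println("attempts used: " + attempts);
    }
}
```

A retry only hides the symptom. It is a safety net next to the structural fixes discussed earlier, not a replacement for them.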

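The experiment is shown in the figures below, which are screenshots of client A and client B respectively:

[Figures: screenshots of client A and client B]

Run on a client: SHOW ENGINE INNODB STATUS;

The most recent deadlock recorded by MySQL appears as follows: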

=====================================
2019-09-25 19:32:10 0x3a88 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 10 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 44 srv_active, 0 srv_shutdown, 4905 srv_idle
srv_master_thread log flush and writes: 4949
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 40
OS WAIT ARRAY INFO: signal count 41
RW-shared spins 0, rounds 78, OS waits 35
RW-excl spins 0, rounds 39, OS waits 1
RW-sx spins 2, rounds 60, OS waits 2
Spin rounds per wait: 78.00 RW-shared, 39.00 RW-excl, 30.00 RW-sx
------------------------
LATEST DETECTED DEADLOCK
------------------------
2019-09-25 19:31:55 0x52c4
*** (1) TRANSACTION:
TRANSACTION 304429, ACTIVE 37 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1136, 2 row lock(s), undo log entries 1
MySQL thread id 13, OS thread handle 27608, query id 12406 localhost ::1 root updating
update tb_elc_forecast set capital = capital - 1.0 where id=5137
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 467 page no 6 n bits 120 index PRIMARY of table `test`.`tb_elc_forecast` trx id 304429 lock_mode X locks rec but not gap waiting
Record lock, heap no 6 PHYSICAL RECORD: n_fields 44; compact format; info bits 0
 0: len 4; hex 80001411; asc     ;;
 1: len 6; hex 00000004a52e; asc      .;;
 2: len 7; hex 22000002f72374; asc "    #t;;
 3: len 30; hex 39643231346264652d656532312d343034612d393862632d396261356232; asc 9d214bde-ee21-404a-98bc-9ba5b2; (total 36 bytes);
 4: len 10; hex 304d4d30303130303030; asc 0MM0010000;;
 5: len 7; hex 59534c44303031; asc YSLD001;;
 6: len 4; hex 0000a041; asc    A;;
 7: len 4; hex 000080bf; asc     ;;
 8: len 4; hex 000040c0; asc   @ ;;
 9: len 4; hex 00000000; asc     ;;
 10: len 4; hex 00000000; asc     ;;
 11: len 7; hex 323031392d3033; asc 2019-03;;
 12: len 9; hex 4252414e4453495445; asc BRANDSITE;;
 13: len 14; hex 776d735f656c635f62616f7a756e; asc wms_elc_baozun;;
 14: len 4; hex 80000002; asc     ;;
 15: len 4; hex 80000000; asc     ;;
 16: len 4; hex 80000000; asc     ;;
 17: len 5; hex 99a282e9bb; asc      ;;
 18: SQL NULL;
 19: len 6; hex 363739303539; asc 679059;;
 20: SQL NULL;
 21: len 4; hex 8000140d; asc     ;;
 22: SQL NULL;
 23: SQL NULL;
 24: SQL NULL;
 25: SQL NULL;
 26: SQL NULL;
 27: len 2; hex 3031; asc 01;;
 28: len 12; hex 4252414e44534954455f3031; asc BRANDSITE_01;;
 29: SQL NULL;
 30: len 30; hex 4c4f43204332204242205741544552e280bbe580a9e7a2a7e6b481e99da2; asc LOC C2 BB WATER               ; (total 45 bytes);
 31: len 9; hex 4252414e4453495445; asc BRANDSITE;;
 32: len 1; hex 59; asc Y;;
 33: len 1; hex 59; asc Y;;
 34: len 4; hex 80000000; asc     ;;
 35: SQL NULL;
 36: len 4; hex 00000000; asc     ;;
 37: len 4; hex 00000000; asc     ;;
 38: len 4; hex 80000000; asc     ;;
 39: len 4; hex 80000000; asc     ;;
 40: SQL NULL;
 41: SQL NULL;
 42: len 4; hex 00000000; asc     ;;
 43: len 4; hex 00000000; asc     ;;

*** (2) TRANSACTION:
TRANSACTION 304430, ACTIVE 30 sec starting index read, thread declared inside InnoDB 5000
mysql tables in use 1, locked 1
3 lock struct(s), heap size 1136, 2 row lock(s), undo log entries 1
MySQL thread id 14, OS thread handle 21188, query id 12407 localhost ::1 root updating
update tb_elc_forecast set capital = capital - 1.0 where id=5133
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 467 page no 6 n bits 120 index PRIMARY of table `test`.`tb_elc_forecast` trx id 304430 lock_mode X locks rec but not gap
Record lock, heap no 6 PHYSICAL RECORD: n_fields 44; compact format; info bits 0
 0: len 4; hex 80001411; asc     ;;
 1: len 6; hex 00000004a52e; asc      .;;
 2: len 7; hex 22000002f72374; asc "    #t;;
 3: len 30; hex 39643231346264652d656532312d343034612d393862632d396261356232; asc 9d214bde-ee21-404a-98bc-9ba5b2; (total 36 bytes);
 4: len 10; hex 304d4d30303130303030; asc 0MM0010000;;
 5: len 7; hex 59534c44303031; asc YSLD001;;
 6: len 4; hex 0000a041; asc    A;;
 7: len 4; hex 000080bf; asc     ;;
 8: len 4; hex 000040c0; asc   @ ;;
 9: len 4; hex 00000000; asc     ;;
 10: len 4; hex 00000000; asc     ;;
 11: len 7; hex 323031392d3033; asc 2019-03;;
 12: len 9; hex 4252414e4453495445; asc BRANDSITE;;
 13: len 14; hex 776d735f656c635f62616f7a756e; asc wms_elc_baozun;;
 14: len 4; hex 80000002; asc     ;;
 15: len 4; hex 80000000; asc     ;;
 16: len 4; hex 80000000; asc     ;;
 17: len 5; hex 99a282e9bb; asc      ;;
 18: SQL NULL;
 19: len 6; hex 363739303539; asc 679059;;
 20: SQL NULL;
 21: len 4; hex 8000140d; asc     ;;
 22: SQL NULL;
 23: SQL NULL;
 24: SQL NULL;
 25: SQL NULL;
 26: SQL NULL;
 27: len 2; hex 3031; asc 01;;
 28: len 12; hex 4252414e44534954455f3031; asc BRANDSITE_01;;
 29: SQL NULL;
 30: len 30; hex 4c4f43204332204242205741544552e280bbe580a9e7a2a7e6b481e99da2; asc LOC C2 BB WATER               ; (total 45 bytes);
 31: len 9; hex 4252414e4453495445; asc BRANDSITE;;
 32: len 1; hex 59; asc Y;;
 33: len 1; hex 59; asc Y;;
 34: len 4; hex 80000000; asc     ;;
 35: SQL NULL;
 36: len 4; hex 00000000; asc     ;;
 37: len 4; hex 00000000; asc     ;;
 38: len 4; hex 80000000; asc     ;;
 39: len 4; hex 80000000; asc     ;;
 40: SQL NULL;
 41: SQL NULL;
 42: len 4; hex 00000000; asc     ;;
 43: len 4; hex 00000000; asc     ;;

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 467 page no 6 n bits 120 index PRIMARY of table `test`.`tb_elc_forecast` trx id 304430 lock_mode X locks rec but not gap waiting
Record lock, heap no 54 PHYSICAL RECORD: n_fields 44; compact format; info bits 0
 0: len 4; hex 8000140d; asc     ;;
 1: len 6; hex 00000004a52d; asc      -;;
 2: len 7; hex 21000002d522ea; asc !    " ;;
 3: len 30; hex 34326237346466332d383133372d346639372d393435382d613939613266; asc 42b74df3-8137-4f97-9458-a99a2f; (total 36 bytes);
 4: len 10; hex 304d4d30303130303030; asc 0MM0010000;;
 5: len 7; hex 59534c44303031; asc YSLD001;;
 6: len 4; hex 00004842; asc   HB;;
 7: len 4; hex 000080bf; asc     ;;
 8: len 4; hex 000040c0; asc   @ ;;
 9: SQL NULL;
 10: len 4; hex 00004842; asc   HB;;
 11: len 7; hex 323031392d3033; asc 2019-03;;
 12: SQL NULL;
 13: SQL NULL;
 14: len 4; hex 80000001; asc     ;;
 15: len 4; hex 80000002; asc     ;;
 16: len 4; hex 80000001; asc     ;;
 17: len 5; hex 99a282e9bb; asc      ;;
 18: len 5; hex 99a288b39e; asc      ;;
 19: len 6; hex 363739303539; asc 679059;;
 20: SQL NULL;
 21: SQL NULL;
 22: SQL NULL;
 23: SQL NULL;
 24: SQL NULL;
 25: SQL NULL;
 26: SQL NULL;
 27: len 2; hex 3031; asc 01;;
 28: len 12; hex 4252414e44534954455f3031; asc BRANDSITE_01;;
 29: SQL NULL;
 30: len 30; hex 4c4f43204332204242205741544552e280bbe580a9e7a2a7e6b481e99da2; asc LOC C2 BB WATER               ; (total 45 bytes);
 31: len 9; hex 4252414e4453495445; asc BRANDSITE;;
 32: len 1; hex 59; asc Y;;
 33: len 1; hex 59; asc Y;;
 34: len 4; hex 80000000; asc     ;;
 35: SQL NULL;
 36: len 4; hex 00006041; asc   `A;;
 37: len 4; hex 00000000; asc     ;;
 38: len 4; hex 80000002; asc     ;;
 39: len 4; hex 80000001; asc     ;;
 40: len 21; hex 4f4d53e5ba93e5ad98e69fa5e8afa2e68890e58a9f; asc OMS                  ;;
 41: len 30; hex 61303730393438612d633531642d346331362d616433392d393434393038; asc a070948a-c51d-4c16-ad39-944908; (total 36 bytes);
 42: len 4; hex 00000000; asc     ;;
 43: len 4; hex 00000000; asc     ;;

*** WE ROLL BACK TRANSACTION (2)
------------
TRANSACTIONS
------------
Trx id counter 304432
Purge done for trx's n:o < 304432 undo n:o < 0 state: running but idle
History list length 13
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 283332952971320, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 283332952973936, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 283332952973064, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 283332952972192, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 283332952968704, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 283332952967832, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 283332952966960, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 304429, ACTIVE 52 sec
3 lock struct(s), heap size 1136, 2 row lock(s), undo log entries 2
MySQL thread id 13, OS thread handle 27608, query id 12406 localhost ::1 root
---TRANSACTION 304408, ACTIVE 4442 sec
2 lock struct(s), heap size 1136, 1 row lock(s)
MySQL thread id 9, OS thread handle 14984, query id 12411 localhost 127.0.0.1 root starting
show ENGINE INNODB STATUS
---TRANSACTION 304407, ACTIVE 4450 sec
2 lock struct(s), heap size 1136, 1 row lock(s)
MySQL thread id 8, OS thread handle 25856, query id 12278 localhost 127.0.0.1 root
--------
FILE I/O
--------
I/O thread 0 state: wait Windows aio (insert buffer thread)
I/O thread 1 state: wait Windows aio (log thread)
I/O thread 2 state: wait Windows aio (read thread)
I/O thread 3 state: wait Windows aio (read thread)
I/O thread 4 state: wait Windows aio (read thread)
I/O thread 5 state: wait Windows aio (read thread)
I/O thread 6 state: wait Windows aio (write thread)
I/O thread 7 state: wait Windows aio (write thread)
I/O thread 8 state: wait Windows aio (write thread)
I/O thread 9 state: wait Windows aio (write thread)
Pending normal aio reads: [0, 0, 0, 0] , aio writes: [0, 0, 0, 0] ,
 ibuf aio reads:, log i/o's:, sync i/o's:
Pending flushes (fsync) log: 0; buffer pool: 0
609 OS file reads, 1233 OS file writes, 226 OS fsyncs
0.00 reads/s, 0 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 1, free list len 207, seg size 209, 0 merges
merged operations:
 insert 0, delete mark 0, delete 0
discarded operations:
 insert 0, delete mark 0, delete 0
Hash table size 2267, node heap has 0 buffer(s)
Hash table size 2267, node heap has 1 buffer(s)
Hash table size 2267, node heap has 0 buffer(s)
Hash table size 2267, node heap has 0 buffer(s)
Hash table size 2267, node heap has 0 buffer(s)
Hash table size 2267, node heap has 0 buffer(s)
Hash table size 2267, node heap has 0 buffer(s)
Hash table size 2267, node heap has 0 buffer(s)
0.00 hash searches/s, 0.00 non-hash searches/s
---
LOG
---
Log sequence number 412469041
Log flushed up to   412469041
Pages flushed up to 412469041
Last checkpoint at  412469032
0 pending log flushes, 0 pending chkp writes
118 log i/o's done, 0.00 log i/o's/second
----------------------
BUFFER POOL AND MEMORY
----------------------
Total large memory allocated 8585216
Dictionary memory allocated 128725
Buffer pool size   512
Free buffers       255
Database pages     256
Old database pages 0
Modified db pages  0
Pending reads      0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 0, not young 0
0.00 youngs/s, 0.00 non-youngs/s
Pages read 580, created 672, written 1023
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
No buffer pool page gets since the last printout
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 256, unzip_LRU len: 0
I/O sum[10]:cur[0], unzip sum[0]:cur[0]
--------------
ROW OPERATIONS
--------------
0 queries inside InnoDB, 0 queries in queue
0 read views open inside InnoDB
Process ID=27356, Main thread ID=21352, state: sleeping
Number of rows inserted 13258, updated 11, deleted 0, read 108651
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
----------------------------
END OF INNODB MONITOR OUTPUT
============================
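The record dumps in the LATEST DETECTED DEADLOCK section can be tied back to the test statements. Field 0 of each locked record is the primary key, and InnoDB stores a signed INT big-endian with its sign bit flipped, so the value shown after hex is not the id itself. The helper below is a small sketch for decoding such a 4-byte field; `InnoDbIntDecoder` and `decodeInt` are hypothetical names.

```java
// Sketch: decodes the hex value of a 4-byte signed INT field from an InnoDB record dump.
class InnoDbIntDecoder {

    /** InnoDB stores signed integers big-endian with the sign bit flipped; flip it back. */
    static int decodeInt(String hex) {
        long raw = Long.parseLong(hex, 16);   // e.g. 0x80001411
        return (int) (raw ^ 0x80000000L);     // undo the sign-bit flip
    }

    public static void main(String[] args) {
        System.out.println(decodeInt("80001411")); // field 0 of heap no 6:  5137
        System.out.println(decodeInt("8000140d")); // field 0 of heap no 54: 5133
    }
}
```

Read this way, the log matches the experiment: transaction (1) is waiting for the row with id 5137, while transaction (2) holds that row and is waiting for id 5133, and it is transaction (2) that gets rolled back.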

 

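For deadlock tests, it is best to use the command-line client that ships with MySQL rather than a third-party GUI tool, which tends not to work well for this kind of experiment.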
