Bug #69406
bluestore thrashers/simple: hangs with "waiting on" (due to can't lock)
Status: Closed · % Done: 0%
Description
https://pulpito.ceph.com/teuthology-2024-12-28_20:56:02-crimson-rados-main-distro-crimson-smithi/8055126
https://pulpito.ceph.com/matan-2024-12-24_17:01:33-crimson-rados-wip-matanb-crimson-only-testing-24-dec-distro-crimson-smithi/8052145
https://pulpito.ceph.com/matan-2024-12-24_17:01:33-crimson-rados-wip-matanb-crimson-only-testing-24-dec-distro-crimson-smithi/8052105
While there is no obvious failure, the thrash test seems to hang until the job dies.
Looking at the logs (job 8055126), the last thrash operations are:
2024-12-29T03:08:02.013 INFO:tasks.rados.rados.0.smithi043.stdout:2219: writing smithi04333938-29 from 568334 to 1218350 tid 1
2024-12-29T03:08:02.019 INFO:tasks.rados.rados.0.smithi043.stdout:2219: writing smithi04333938-29 from 1860854 to 2168334 tid 2
2024-12-29T03:08:02.019 INFO:tasks.rados.rados.0.smithi043.stdout:2220: snap_create
2024-12-29T03:08:02.019 INFO:tasks.rados.rados.0.smithi043.stdout: waiting on 2
2024-12-29T03:08:02.066 INFO:tasks.rados.rados.0.smithi043.stdout:2219: finishing write tid 1 to smithi04333938-29
2024-12-29T03:08:02.079 INFO:tasks.rados.rados.0.smithi043.stdout:2219: finishing write tid 2 to smithi04333938-29
2024-12-29T03:08:02.084 INFO:tasks.rados.rados.0.smithi043.stdout:2219: finishing write tid 3 to smithi04333938-29
2024-12-29T03:08:02.087 INFO:tasks.rados.rados.0.smithi043.stdout:2219: finishing write tid 4 to smithi04333938-29
2024-12-29T03:08:02.088 INFO:tasks.rados.rados.0.smithi043.stdout:2219: oid 29 updating version 0 to 670
2024-12-29T03:08:02.088 INFO:tasks.rados.rados.0.smithi043.stdout:2219: oid 29 version 670 is already newer than 670
2024-12-29T03:08:02.088 INFO:tasks.rados.rados.0.smithi043.stdout:2219: oid 29 version 670 is already newer than 670
2024-12-29T03:08:02.088 INFO:tasks.rados.rados.0.smithi043.stdout:update_object_version oid 29 v 670 (ObjNum 962 snap 303 seq_num 962) dirty exists
2024-12-29T03:08:02.088 INFO:tasks.rados.rados.0.smithi043.stdout:2219: left oid 29 (ObjNum 962 snap 303 seq_num 962)
2024-12-29T03:08:02.088 INFO:tasks.rados.rados.0.smithi043.stdout:2219: done (1 left)
2024-12-29T03:08:02.088 INFO:tasks.rados.rados.0.smithi043.stdout: waiting on 1
Tracking smithi04333938-29 from osd.2, we seem to hit a loop where the "RETRY" count of the MOSDOp keeps increasing:
DEBUG 2024-12-29 03:37:54,835 [shard 1:main] osd - pg_epoch 769 pg[3.4( v 663'374 (0'0,663'374] local-lis/les=16/17 n=1 ec=16/16 lis/c=16/16 les/c/f=17/17/0 sis=16) [2,3,0] r=0 lpr=16 lua=0'0 crt=663'374 mlcod 663'374 active+clean ClientRequest::with_pg_process: client_request(id=33564866, detail=m=[osd_op(client.4248.0:5874 3.4 3:22ce8713:::smithi04333938-38:head [read 0~0,checksum in=4b] snapc 0=[] RETRY=1 ondisk+retry+read+known_if_redirected+supports_pool_eio e769)]) start DEBUG 2024-12-29 03:37:54,835 [shard 1:main] osd - pg_epoch 769 pg[3.4( v 663'374 (0'0,663'374] local-lis/les=16/17 n=1 ec=16/16 lis/c=16/16 les/c/f=17/17/0 sis=16) [2,3,0] r=0 lpr=16 lua=0'0 crt=663'374 mlcod 663'374 active+clean ClientRequest::with_pg_process: client_request(id=33564867, detail=m=[osd_op(client.4248.0:5875 3.4 3:22ce8713:::smithi04333938-38:head [sparse-read 0~0] snapc 0=[] RETRY=1 ondisk+retry+read+known_if_redirected+supports_pool_eio e769)]) start DEBUG 2024-12-29 03:52:54,844 [shard 1:main] osd - pg_epoch 821 pg[3.4( v 663'374 (0'0,663'374] local-lis/les=16/17 n=1 ec=16/16 lis/c=16/16 les/c/f=17/17/0 sis=16) [2,3,0] r=0 lpr=16 lua=0'0 crt=663'374 mlcod 663'374 active+clean ClientRequest::with_pg_process: client_request(id=13246, detail=m=[osd_op(client.4248.0:5873 3.4 3:22ce8713:::smithi04333938-38:head [sparse-read 0~0,omap-get-vals-by-keys in=4b,omap-get-keys in=12b,omap-get-vals in=16b,omap-get-header,getxattrs] snapc 0=[] RETRY=2 ondisk+retry+read+known_if_redirected+supports_pool_eio e821)]) start DEBUG 2024-12-29 03:52:54,844 [shard 1:main] osd - pg_epoch 821 pg[3.4( v 663'374 (0'0,663'374] local-lis/les=16/17 n=1 ec=16/16 lis/c=16/16 les/c/f=17/17/0 sis=16) [2,3,0] r=0 lpr=16 lua=0'0 crt=663'374 mlcod 663'374 active+clean ClientRequest::with_pg_process: client_request(id=13247, detail=m=[osd_op(client.4248.0:5874 3.4 3:22ce8713:::smithi04333938-38:head [read 0~0,checksum in=4b] snapc 0=[] RETRY=2 ondisk+retry+read+known_if_redirected+supports_pool_eio e821)]) start DEBUG 2024-12-29 03:52:54,845 [shard 1:main] osd - pg_epoch 821 pg[3.4( v 663'374 (0'0,663'374] local-lis/les=16/17 n=1 ec=16/16 lis/c=16/16 les/c/f=17/17/0 sis=16) [2,3,0] r=0 lpr=16 lua=0'0 crt=663'374 mlcod 663'374 active+clean ClientRequest::with_pg_process: client_request(id=13248, detail=m=[osd_op(client.4248.0:5875 3.4 3:22ce8713:::smithi04333938-38:head [sparse-read 0~0] snapc 0=[] RETRY=2 ondisk+retry+read+known_if_redirected+supports_pool_eio e821)]) start DEBUG 2024-12-29 04:07:54,852 [shard 1:main] osd - pg_epoch 861 pg[3.4( v 663'374 (0'0,663'374] local-lis/les=16/17 n=1 ec=16/16 lis/c=16/16 les/c/f=17/17/0 sis=16) [2,3,0] r=0 lpr=16 lua=0'0 crt=663'374 mlcod 663'374 active+clean ClientRequest::with_pg_process: client_request(id=16793317, detail=m=[osd_op(client.4248.0:5873 3.4 3:22ce8713:::smithi04333938-38:head [sparse-read 0~0,omap-get-vals-by-keys in=4b,omap-get-keys in=12b,omap-get-vals in=16b,omap-get-header,getxattrs] snapc 0=[] RETRY=3 ondisk+retry+read+known_if_redirected+supports_pool_eio e861)]) start DEBUG 2024-12-29 04:07:54,853 [shard 1:main] osd - pg_epoch 861 pg[3.4( v 663'374 (0'0,663'374] local-lis/les=16/17 n=1 ec=16/16 lis/c=16/16 les/c/f=17/17/0 sis=16) [2,3,0] r=0 lpr=16 lua=0'0 crt=663'374 mlcod 663'374 active+clean ClientRequest::with_pg_process: client_request(id=16793318, detail=m=[osd_op(client.4248.0:5874 3.4 3:22ce8713:::smithi04333938-38:head [read 0~0,checksum in=4b] snapc 0=[] RETRY=3 ondisk+retry+read+known_if_redirected+supports_pool_eio e861)]) start DEBUG 2024-12-29 04:07:54,855 
[shard 1:main] osd - pg_epoch 861 pg[3.4( v 663'374 (0'0,663'374] local-lis/les=16/17 n=1 ec=16/16 lis/c=16/16 les/c/f=17/17/0 sis=16) [2,3,0] r=0 lpr=16 lua=0'0 crt=663'374 mlcod 663'374 active+clean ClientRequest::with_pg_process: client_request(id=16793319, detail=m=[osd_op(client.4248.0:5875 3.4 3:22ce8713:::smithi04333938-38:head [sparse-read 0~0] snapc 0=[] RETRY=3 ondisk+retry+read+known_if_redirected+supports_pool_eio e861)]) start DEBUG 2024-12-29 04:22:54,861 [shard 1:main] osd - pg_epoch 925 pg[3.4( v 663'374 (0'0,663'374] local-lis/les=16/17 n=1 ec=16/16 lis/c=16/16 les/c/f=17/17/0 sis=16) [2,3,0] r=0 lpr=16 lua=0'0 crt=663'374 mlcod 663'374 active+clean ClientRequest::with_pg_process: client_request(id=16795000, detail=m=[osd_op(client.4248.0:5873 3.4 3:22ce8713:::smithi04333938-38:head [sparse-read 0~0,omap-get-vals-by-keys in=4b,omap-get-keys in=12b,omap-get-vals in=16b,omap-get-header,getxattrs] snapc 0=[] RETRY=4 ondisk+retry+read+known_if_redirected+supports_pool_eio e925)]) start DEBUG 2024-12-29 04:22:54,862 [shard 1:main] osd - pg_epoch 925 pg[3.4( v 663'374 (0'0,663'374] local-lis/les=16/17 n=1 ec=16/16 lis/c=16/16 les/c/f=17/17/0 sis=16) [2,3,0] r=0 lpr=16 lua=0'0 crt=663'374 mlcod 663'374 active+clean ClientRequest::with_pg_process: client_request(id=16795001, detail=m=[osd_op(client.4248.0:5874 3.4 3:22ce8713:::smithi04333938-38:head [read 0~0,checksum in=4b] snapc 0=[] RETRY=4 ondisk+retry+read+known_if_redirected+supports_pool_eio e925)]) start DEBUG 2024-12-29 04:22:54,863 [shard 1:main] osd - pg_epoch 925 pg[3.4( v 663'374 (0'0,663'374] local-lis/les=16/17 n=1 ec=16/16 lis/c=16/16 les/c/f=17/17/0 sis=16) [2,3,0] r=0 lpr=16 lua=0'0 crt=663'374 mlcod 663'374 active+clean ClientRequest::with_pg_process: client_request(id=16795002, detail=m=[osd_op(client.4248.0:5875 3.4 3:22ce8713:::smithi04333938-38:head [sparse-read 0~0] snapc 0=[] RETRY=4 ondisk+retry+read+known_if_redirected+supports_pool_eio e925)]) start DEBUG 2024-12-29 04:37:54,870 [shard 1:main] osd - pg_epoch 953 pg[3.4( v 663'374 (0'0,663'374] local-lis/les=16/17 n=1 ec=16/16 lis/c=16/16 les/c/f=17/17/0 sis=16) [2,3,0] r=0 lpr=16 lua=0'0 crt=663'374 mlcod 663'374 active+clean ClientRequest::with_pg_process: client_request(id=17890, detail=m=[osd_op(client.4248.0:5873 3.4 3:22ce8713:::smithi04333938-38:head [sparse-read 0~0,omap-get-vals-by-keys in=4b,omap-get-keys in=12b,omap-get-vals in=16b,omap-get-header,getxattrs] snapc 0=[] RETRY=5 ondisk+retry+read+known_if_redirected+supports_pool_eio e953)]) start DEBUG 2024-12-29 04:37:54,870 [shard 1:main] osd - pg_epoch 953 pg[3.4( v 663'374 (0'0,663'374] local-lis/les=16/17 n=1 ec=16/16 lis/c=16/16 les/c/f=17/17/0 sis=16) [2,3,0] r=0 lpr=16 lua=0'0 crt=663'374 mlcod 663'374 active+clean ClientRequest::with_pg_process: client_request(id=17891, detail=m=[osd_op(client.4248.0:5874 3.4 3:22ce8713:::smithi04333938-38:head [read 0~0,checksum in=4b] snapc 0=[] RETRY=5 ondisk+retry+read+known_if_redirected+supports_pool_eio e953)]) start
smithi04333938-38 was unable to lock_for_read and was added to waiters and never got the lock.
DEBUG 2024-12-29 03:08:02,038 [shard 1:main] osd - tri_mutex 0000000000000003.44371E8C.head.smithi04333938-38.. writers 1 readers 0 exclusively_used false waiters: 0 tri_mutex::lock_for_read():
DEBUG 2024-12-29 03:08:02,039 [shard 1:main] osd - tri_mutex 0000000000000003.44371E8C.head.smithi04333938-38.. writers 1 readers 0 exclusively_used false waiters: 0 tri_mutex::try_lock_for_read():
DEBUG 2024-12-29 03:08:02,039 [shard 1:main] osd - tri_mutex 0000000000000003.44371E8C.head.smithi04333938-38.. writers 1 readers 0 exclusively_used false waiters: 0 tri_mutex::lock_for_read(): can't lock_for_read, adding to waiters
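For readers unfamiliar with crimson's tri_mutex, the following is a minimal, simplified model of the lock semantics visible in the log above (an editorial illustration only, not the actual crimson implementation): a read lock is granted only while there are no writers and the lock is not held exclusively; otherwise the request is parked on the waiter list, which is exactly the "can't lock_for_read, adding to waiters" path.

#include <cassert>
#include <cstdint>

// Simplified model of the three-mode lock counters seen in the log:
// writers / readers / exclusively_used / waiters.
struct TriMutexModel {
  uint32_t writers = 0;
  uint32_t readers = 0;
  bool exclusively_used = false;
  uint32_t waiters = 0;

  // A read lock can only be taken while nobody is writing or holding it
  // exclusively.
  bool try_lock_for_read() {
    if (writers == 0 && !exclusively_used) {
      ++readers;
      return true;
    }
    return false;
  }

  // Mirrors the "can't lock_for_read, adding to waiters" path: with a
  // non-zero writer count the reader is queued and only runs once the
  // writers drop to zero and the waiters are woken.
  void lock_for_read_or_wait() {
    if (!try_lock_for_read()) {
      ++waiters;  // parked until a later unlock_for_write()/wake()
    }
  }
};

int main() {
  TriMutexModel m;
  m.writers = 1;              // state seen in the log: writers 1 readers 0
  m.lock_for_read_or_wait();
  assert(m.readers == 0 && m.waiters == 1);  // the reader is stuck waiting
  return 0;
}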
Updated by Matan Breizman about 1 year ago
- Description updated (diff)
- Priority changed from Normal to High
Updated by Matan Breizman about 1 year ago
Similar story in - https://pulpito.ceph.com/matan-2024-12-24_17:01:33-crimson-rados-wip-matanb-crimson-only-testing-24-dec-distro-crimson-smithi/8052145/
osd.3:
DEBUG 2024-12-24 18:40:48,827 [shard 2:main] osd - tri_mutex 0000000000000003.CEF3B505.head.smithi07333615-13.. writers 5 readers 0 exclusively_used false waiters: 0 tri_mutex::lock_for_excl():
DEBUG 2024-12-24 18:40:48,827 [shard 2:main] osd - tri_mutex 0000000000000003.CEF3B505.head.smithi07333615-13.. writers 5 readers 0 exclusively_used false waiters: 0 tri_mutex::try_lock_for_excl():
DEBUG 2024-12-24 18:40:48,827 [shard 2:main] osd - tri_mutex 0000000000000003.CEF3B505.head.smithi07333615-13.. writers 5 readers 0 exclusively_used false waiters: 0 tri_mutex::lock_for_excl(): can't lock_for_excl, adding to waiters
And the MOSDOp is retried endlessly; see RETRY=27:
DEBUG 2024-12-25 01:40:35,518 [shard 2:main] osd - pg_epoch 2736 pg[3.c( v 1158'1498 (0'0,1158'1498] local-lis/les=16/17 n=2 ec=16/16 lis/c=16/16 les/c/f=17/17/0 sis=16) [3,0,1] r=0 lpr=16 lua=0'0 crt=1158'1497 mlcod 1158'1498 active+clean ClientRequest::with_pg_process: client_request(id=166548, detail=m=[osd_op(client.4237.0:11039 3.c 3:37fcda0a:::smithi07333615-13:head [read 0~1] snapc 0=[] RETRY=27 ondisk+retry+read+rwordered+known_if_redirected+supports_pool_eio e2736)]) start
Updated by Matan Breizman about 1 year ago
From case 1:
DEBUG 2024-12-29 02:47:43,884 [shard 1:main] osd - tri_mutex 0000000000000003.44371E8C.head.smithi04333938-38.. writers 0 readers 1 exclusively_used false waiters: 0 tri_mutex::lock_for_read(): lock_for_read successfully
DEBUG 2024-12-29 02:47:43,884 [shard 1:main] osd - tri_mutex 0000000000000003.44371E8C.head.smithi04333938-38.. writers 0 readers 1 exclusively_used false waiters: 0 tri_mutex::unlock_for_read():
DEBUG 2024-12-29 02:47:43,884 [shard 1:main] osd - tri_mutex 0000000000000003.44371E8C.head.smithi04333938-38.. writers 0 readers 0 exclusively_used false waiters: 0 tri_mutex::wake():
DEBUG 2024-12-29 02:47:43,884 [shard 1:main] osd - tri_mutex 0000000000000003.44371E8C.head.smithi04333938-38.. writers 0 readers 0 exclusively_used false waiters: 0 tri_mutex::wake(): no waiters
DEBUG 2024-12-29 02:47:48,804 [shard 1:main] osd - tri_mutex 0000000000000003.44371E8C.head.smithi04333938-38.. writers 0 readers 0 exclusively_used false waiters: 0 tri_mutex::lock_for_write():
DEBUG 2024-12-29 02:47:48,804 [shard 1:main] osd - tri_mutex 0000000000000003.44371E8C.head.smithi04333938-38.. writers 0 readers 0 exclusively_used false waiters: 0 tri_mutex::lock_for_write():
DEBUG 2024-12-29 02:47:48,804 [shard 1:main] osd - tri_mutex 0000000000000003.44371E8C.head.smithi04333938-38.. writers 0 readers 0 exclusively_used false waiters: 0 tri_mutex::try_lock_for_write():
DEBUG 2024-12-29 02:47:48,804 [shard 1:main] osd - tri_mutex 0000000000000003.44371E8C.head.smithi04333938-38.. writers 0 readers 0 exclusively_used false waiters: 0 tri_mutex::try_lock_for_write():
DEBUG 2024-12-29 02:47:48,805 [shard 1:main] osd - tri_mutex 0000000000000003.44371E8C.head.smithi04333938-38.. writers 0 readers 0 exclusively_used false waiters: 0 tri_mutex::~tri_mutex():
DEBUG 2024-12-29 02:47:48,805 [shard 1:main] osd - tri_mutex 0000000000000003.44371E8C.head.smithi04333938-38.. writers 0 readers 0 exclusively_used false waiters: 0 tri_mutex::is_acquired():
DEBUG 2024-12-29 02:47:48,912 [shard 1:main] osd - tri_mutex 0000000000000003.44371E8C.head.smithi04333938-38.. writers 1 readers 0 exclusively_used false waiters: 0 tri_mutex::unlock_for_write():
DEBUG 2024-12-29 02:47:48,912 [shard 1:main] osd - tri_mutex 0000000000000003.44371E8C.head.smithi04333938-38.. writers 0 readers 0 exclusively_used false waiters: 0 tri_mutex::wake():
DEBUG 2024-12-29 02:47:48,912 [shard 1:main] osd - tri_mutex 0000000000000003.44371E8C.head.smithi04333938-38.. writers 0 readers 0 exclusively_used false waiters: 0 tri_mutex::wake(): no waiters
DEBUG 2024-12-29 02:47:53,804 [shard 1:main] osd - tri_mutex 0000000000000003.44371E8C.head.smithi04333938-38.. writers 0 readers 0 exclusively_used false waiters: 0 tri_mutex::lock_for_write():
DEBUG 2024-12-29 02:47:53,804 [shard 1:main] osd - tri_mutex 0000000000000003.44371E8C.head.smithi04333938-38.. writers 0 readers 0 exclusively_used false waiters: 0 tri_mutex::try_lock_for_write():
DEBUG 2024-12-29 02:47:53,804 [shard 1:main] osd - tri_mutex 0000000000000003.44371E8C.head.smithi04333938-38.. writers 1 readers 0 exclusively_used false waiters: 0 tri_mutex::lock_for_write(): lock_for_write successfully
DEBUG 2024-12-29 02:47:53,805 [shard 1:main] osd - tri_mutex 0000000000000003.44371E8C.head.smithi04333938-38.. writers 0 readers 0 exclusively_used false waiters: 0 tri_mutex::~tri_mutex():
DEBUG 2024-12-29 02:47:53,805 [shard 1:main] osd - tri_mutex 0000000000000003.44371E8C.head.smithi04333938-38.. writers 0 readers 0 exclusively_used false waiters: 0 tri_mutex::is_acquired():
DEBUG 2024-12-29 03:08:02,038 [shard 1:main] osd - tri_mutex 0000000000000003.44371E8C.head.smithi04333938-38.. writers 1 readers 0 exclusively_used false waiters: 0 tri_mutex::lock_for_read():
DEBUG 2024-12-29 03:08:02,039 [shard 1:main] osd - tri_mutex 0000000000000003.44371E8C.head.smithi04333938-38.. writers 1 readers 0 exclusively_used false waiters: 0 tri_mutex::try_lock_for_read():
DEBUG 2024-12-29 03:08:02,039 [shard 1:main] osd - tri_mutex 0000000000000003.44371E8C.head.smithi04333938-38.. writers 1 readers 0 exclusively_used false waiters: 0 tri_mutex::lock_for_read(): can't lock_for_read, adding to waiters
Case 2:
DEBUG 2024-12-24 18:40:48,787 [shard 2:main] osd - tri_mutex 0000000000000003.CEF3B505.head.smithi07333615-13.. writers 2 readers 0 exclusively_used false waiters: 0 tri_mutex::lock_for_write():
DEBUG 2024-12-24 18:40:48,787 [shard 2:main] osd - tri_mutex 0000000000000003.CEF3B505.head.smithi07333615-13.. writers 2 readers 0 exclusively_used false waiters: 0 tri_mutex::try_lock_for_write():
DEBUG 2024-12-24 18:40:48,787 [shard 2:main] osd - tri_mutex 0000000000000003.CEF3B505.head.smithi07333615-13.. writers 3 readers 0 exclusively_used false waiters: 0 tri_mutex::lock_for_write(): lock_for_write successfully
DEBUG 2024-12-24 18:40:48,788 [shard 2:main] osd - tri_mutex 0000000000000003.CEF3B505.head.smithi07333615-13.. writers 0 readers 0 exclusively_used false waiters: 0 tri_mutex::~tri_mutex():
DEBUG 2024-12-24 18:40:48,788 [shard 2:main] osd - tri_mutex 0000000000000003.CEF3B505.head.smithi07333615-13.. writers 0 readers 0 exclusively_used false waiters: 0 tri_mutex::~tri_mutex():
DEBUG 2024-12-24 18:40:48,788 [shard 2:main] osd - tri_mutex 0000000000000003.CEF3B505.head.smithi07333615-13.. writers 0 readers 0 exclusively_used false waiters: 0 tri_mutex::is_acquired():
DEBUG 2024-12-24 18:40:48,800 [shard 2:main] osd - tri_mutex 0000000000000003.CEF3B505.head.smithi07333615-13.. writers 3 readers 0 exclusively_used false waiters: 0 tri_mutex::lock_for_write():
DEBUG 2024-12-24 18:40:48,800 [shard 2:main] osd - tri_mutex 0000000000000003.CEF3B505.head.smithi07333615-13.. writers 3 readers 0 exclusively_used false waiters: 0 tri_mutex::try_lock_for_write():
DEBUG 2024-12-24 18:40:48,800 [shard 2:main] osd - tri_mutex 0000000000000003.CEF3B505.head.smithi07333615-13.. writers 4 readers 0 exclusively_used false waiters: 0 tri_mutex::lock_for_write(): lock_for_write successfully
DEBUG 2024-12-24 18:40:48,801 [shard 2:main] osd - tri_mutex 0000000000000003.CEF3B505.head.smithi07333615-13.. writers 0 readers 0 exclusively_used false waiters: 0 tri_mutex::~tri_mutex():
DEBUG 2024-12-24 18:40:48,801 [shard 2:main] osd - tri_mutex 0000000000000003.CEF3B505.head.smithi07333615-13.. writers 0 readers 0 exclusively_used false waiters: 0 tri_mutex::is_acquired():
DEBUG 2024-12-24 18:40:48,812 [shard 2:main] osd - tri_mutex 0000000000000003.CEF3B505.head.smithi07333615-13.. writers 4 readers 0 exclusively_used false waiters: 0 tri_mutex::lock_for_write():
DEBUG 2024-12-24 18:40:48,812 [shard 2:main] osd - tri_mutex 0000000000000003.CEF3B505.head.smithi07333615-13.. writers 4 readers 0 exclusively_used false waiters: 0 tri_mutex::try_lock_for_write():
DEBUG 2024-12-24 18:40:48,812 [shard 2:main] osd - tri_mutex 0000000000000003.CEF3B505.head.smithi07333615-13.. writers 4 readers 0 exclusively_used false waiters: 0 tri_mutex::try_lock_for_write():
DEBUG 2024-12-24 18:40:48,812 [shard 2:main] osd - tri_mutex 0000000000000003.CEF3B505.head.smithi07333615-13.. writers 5 readers 0 exclusively_used false waiters: 0 tri_mutex::lock_for_write(): lock_for_write successfully
DEBUG 2024-12-24 18:40:48,813 [shard 2:main] osd - tri_mutex 0000000000000003.CEF3B505.head.smithi07333615-13.. writers 0 readers 0 exclusively_used false waiters: 0 tri_mutex::~tri_mutex():
DEBUG 2024-12-24 18:40:48,813 [shard 2:main] osd - tri_mutex 0000000000000003.CEF3B505.head.smithi07333615-13.. writers 0 readers 0 exclusively_used false waiters: 0 tri_mutex::is_acquired():
DEBUG 2024-12-24 18:40:48,827 [shard 2:main] osd - tri_mutex 0000000000000003.CEF3B505.head.smithi07333615-13.. writers 5 readers 0 exclusively_used false waiters: 0 tri_mutex::lock_for_excl():
DEBUG 2024-12-24 18:40:48,827 [shard 2:main] osd - tri_mutex 0000000000000003.CEF3B505.head.smithi07333615-13.. writers 5 readers 0 exclusively_used false waiters: 0 tri_mutex::try_lock_for_excl():
DEBUG 2024-12-24 18:40:48,827 [shard 2:main] osd - tri_mutex 0000000000000003.CEF3B505.head.smithi07333615-13.. writers 5 readers 0 exclusively_used false waiters: 0 tri_mutex::lock_for_excl(): can't lock_for_excl, adding to waiters
The tri_mutex seems to destruct, but the counters are not updated?
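If that suspicion is right, the failure mode would look roughly like the following hypothetical sketch (building on the simplified model above, not the real crimson code): whatever acquired the write lock goes away without releasing it, the writer count stays non-zero, and every later lock_for_read()/lock_for_excl() is parked with nothing left to wake it, matching the endless RETRY loop in the logs.

#include <cassert>
#include <cstdint>

// Shared counters that outlive any single lock holder.
struct Counters {
  uint32_t writers = 0;
  uint32_t readers = 0;
  uint32_t waiters = 0;
};

struct WriteGuard {
  Counters* c;
  explicit WriteGuard(Counters* counters) : c(counters) { ++c->writers; }
  // The bug being illustrated: the destructor does not decrement writers or
  // wake waiters, so the shared counters keep reporting "writers 1".
  ~WriteGuard() {}
};

int main() {
  Counters c;
  {
    WriteGuard g(&c);        // a writer acquires the lock
  }                          // guard destroyed without releasing
  assert(c.writers == 1);    // stale counter: later readers wait forever
  return 0;
}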
Updated by Matan Breizman about 1 year ago
- Status changed from New to In Progress
- Assignee set to Matan Breizman
Updated by Matan Breizman about 1 year ago
Matan Breizman wrote in #note-4:
From case 1:
[...]
Case 2:
[...]
The tri_mutex seems to destruct, but the counters are not updated?
This should explain this behavior: https://github.com/ceph/ceph/pull/61253
Updated by Matan Breizman about 1 year ago
DEBUG 2025-01-06 14:40:09,528 [shard 1:main] osd - tri_mutex 0000000000000003.A3A726F3.head.smithi12233187-28.. writers 1 readers 0 exclusively_used false waiters: 0 tri_mutex::try_lock_for_read():
DEBUG 2025-01-06 14:40:09,528 [shard 1:main] osd - tri_mutex 0000000000000003.A3A726F3.head.smithi12233187-28.. writers 1 readers 0 exclusively_used false waiters: 0 tri_mutex::try_lock_for_read():
DEBUG 2025-01-06 14:40:09,528 [shard 1:main] osd - tri_mutex 0000000000000003.A3A726F3.head.smithi12233187-28.. writers 1 readers 0 exclusively_used false waiters: 0 tri_mutex::lock_for_read(): can't lock_for_read, adding to waiters
DEBUG 2025-01-06 14:40:09,528 [shard 1:main] osd - tri_mutex 0000000000000003.A3A726F3.head.smithi12233187-28.. writers 1 readers 0 exclusively_used false waiters: 0 tri_mutex::lock_for_read(): can't lock_for_read, adding to waiters
DEBUG 2025-01-06 17:40:05,150 [shard 1:main] osd - pg_epoch 1762 pg[3.a( v 163'38 (52'29,163'38] local-lis/les=16/17 n=2 ec=16/16 lis/c=16/16 les/c/f=17/17/0 sis=16) [0,1,2] r=0 lpr=16 lua=0'0 crt=163'38 mlcod 163'38 active+clean ClientRequest::with_pg_process: client_request(id=47193, detail=m=[osd_op(client.4245.0:687 3.a 3:5c5e46fc:::smithi12233187-28:head [read 0~0,checksum in=4b,omap-get-vals-by-keys in=4b,omap-get-keys in=12b,omap-get-vals in=16b,omap-get-header,getxattrs] snapc 0=[] RETRY=10 ondisk+retry+read+balance_reads+known_if_redirected+supports_pool_eio e1762)]) start
Updated by Matan Breizman about 1 year ago
- Subject changed from bluestore thrashers/simple: pool-snaps-few-objects-* hangs to bluestore thrashers/simple: *-snaps-few-objects-* hangs (due to can't lock)
Updated by Matan Breizman about 1 year ago · Edited
2025-01-08T03:25:22.628 INFO:tasks.rados.rados.0.smithi110.stdout:3997: write (excl) oid 36 current snap is 549
2025-01-08T03:25:22.629 INFO:tasks.rados.rados.0.smithi110.stdout:3997: seq_num 1757 ranges {729899=690036,1998627=471271,3166316=363583}
2025-01-08T03:25:22.636 INFO:tasks.rados.rados.0.smithi110.stdout:3997: writing smithi11034285-36 from 729899 to 1419935 tid 1
2025-01-08T03:25:22.642 INFO:tasks.rados.rados.0.smithi110.stdout:3997: writing smithi11034285-36 from 1998627 to 2469898 tid 2
2025-01-08T03:25:22.649 INFO:tasks.rados.rados.0.smithi110.stdout:3997: writing smithi11034285-36 from 3166316 to 3529899 tid 3
2025-01-08T03:25:22.688 INFO:tasks.rados.rados.0.smithi110.stdout:3997: finishing write tid 1 to smithi11034285-36
2025-01-08T03:25:22.966 INFO:tasks.rados.rados.0.smithi110.stdout:3997: finishing write tid 2 to smithi11034285-36
2025-01-08T03:25:22.967 INFO:tasks.rados.rados.0.smithi110.stdout:3997: finishing write tid 3 to smithi11034285-36
2025-01-08T03:25:22.967 INFO:tasks.rados.rados.0.smithi110.stdout:3997: finishing write tid 4 to smithi11034285-36
2025-01-08T03:25:22.967 INFO:tasks.rados.rados.0.smithi110.stdout:3997: finishing write tid 5 to smithi11034285-36
2025-01-08T03:25:22.967 INFO:tasks.rados.rados.0.smithi110.stdout:3997: oid 36 updating version 0 to 409
2025-01-08T03:25:22.967 INFO:tasks.rados.rados.0.smithi110.stdout:3997: oid 36 version 409 is already newer than 406
2025-01-08T03:25:22.967 INFO:tasks.rados.rados.0.smithi110.stdout:3997: oid 36 version 409 is already newer than 409
2025-01-08T03:25:22.967 INFO:tasks.rados.rados.0.smithi110.stdout:3997: oid 36 version 409 is already newer than 409
2025-01-08T03:25:22.967 INFO:tasks.rados.rados.0.smithi110.stdout:3997: left oid 36 (ObjNum 1757 snap 549 seq_num 1757)
2025-01-08T03:25:22.967 INFO:tasks.rados.rados.0.smithi110.stdout:3997: done (5 left)
2025-01-08T03:25:22.967 INFO:tasks.rados.rados.0.smithi110.stdout:3998: done (4 left)
2025-01-08T03:25:22.967 INFO:tasks.rados.rados.0.smithi110.stdout:3999: done (3 left)
2025-01-08T03:25:22.967 INFO:tasks.rados.rados.0.smithi110.stdout: waiting on 3
osd.2:
One of the objects with a large RETRY count - smithi11034285-18:head:
DEBUG 2025-01-08 05:21:49,215 [shard 2:main] osd - pg_epoch 1616 pg[3.6( v 941'723 (0'0,941'723] local-lis/les=16/17 n=1 ec=16/16 lis/c=16/16 les/c/f=17/17/0 sis=16) [2,3,1] r=0 lpr=16 lua=0'0 crt=941'723 mlcod 941'723 active+clean ClientRequest::with_pg_process: client_request(id=33567273, detail=m=[osd_op(client.4248.0:9004 3.6 3:6dc439a8:::smithi11034285-18:head [write 699554~762089 in=762089b,stat] snapc 1b4=[1b4,1b3,1b2,1b1,1b0] RETRY=7 ondisk+retry+write+known_if_redirected+supports_pool_eio e1616)]) start
Never received lock:
DEBUG 2025-01-08 03:21:49,266 [shard 2:main] osd - tri_mutex 0000000000000003.6B32C951.head.smithi11034285-18.. writers 0 readers 0 exclusively_used true waiters: 0 tri_mutex::lock_for_excl():
DEBUG 2025-01-08 03:21:49,266 [shard 2:main] osd - tri_mutex 0000000000000003.6B32C951.head.smithi11034285-18.. writers 0 readers 0 exclusively_used true waiters: 0 tri_mutex::try_lock_for_excl():
DEBUG 2025-01-08 03:21:49,266 [shard 2:main] osd - tri_mutex 0000000000000003.6B32C951.head.smithi11034285-18.. writers 0 readers 0 exclusively_used true waiters: 0 tri_mutex::lock_for_excl(): can't lock_for_excl, adding to waiters
Updated by Matan Breizman about 1 year ago
- Related to Bug #69460: bluestore/thrash_simple: snaps-few-objects hangs after excl write added
Updated by Matan Breizman about 1 year ago · Edited
grep -E '3811: ' | tail -n 30
2025-01-08T16:53:11.450 INFO:tasks.rados.rados.0.smithi079.stdout:3811: write (excl) oid 30 current snap is 546
2025-01-08T16:53:11.450 INFO:tasks.rados.rados.0.smithi079.stdout:3811: seq_num 1650 ranges {531550=609451,1937289=598568,2944644=602233}
2025-01-08T16:53:11.455 INFO:tasks.rados.rados.0.smithi079.stdout:3811: writing smithi07934291-30 from 531550 to 1141001 tid 1
2025-01-08T16:53:11.465 INFO:tasks.rados.rados.0.smithi079.stdout:3811: writing smithi07934291-30 from 1937289 to 2535857 tid 2
2025-01-08T16:53:11.471 INFO:tasks.rados.rados.0.smithi079.stdout:3811: writing smithi07934291-30 from 2944644 to 3546877 tid 3
2025-01-08T16:53:11.687 INFO:tasks.rados.rados.0.smithi079.stdout:3811: finishing write tid 1 to smithi07934291-30
2025-01-08T16:53:11.688 INFO:tasks.rados.rados.0.smithi079.stdout:3811: finishing write tid 2 to smithi07934291-30
2025-01-08T16:53:11.688 INFO:tasks.rados.rados.0.smithi079.stdout:3811: finishing write tid 3 to smithi07934291-30
2025-01-08T16:53:11.688 INFO:tasks.rados.rados.0.smithi079.stdout:3811: finishing write tid 4 to smithi07934291-30
2025-01-08T16:53:11.688 INFO:tasks.rados.rados.0.smithi079.stdout:3811: finishing write tid 5 to smithi07934291-30
2025-01-08T16:53:11.688 INFO:tasks.rados.rados.0.smithi079.stdout:3811: oid 30 updating version 0 to 2431
2025-01-08T16:53:11.688 INFO:tasks.rados.rados.0.smithi079.stdout:3811: oid 30 updating version 2431 to 2434
2025-01-08T16:53:11.688 INFO:tasks.rados.rados.0.smithi079.stdout:3811: oid 30 version 2434 is already newer than 2434
2025-01-08T16:53:11.688 INFO:tasks.rados.rados.0.smithi079.stdout:3811: oid 30 version 2434 is already newer than 2434
2025-01-08T16:53:11.688 INFO:tasks.rados.rados.0.smithi079.stdout:3811: left oid 30 (ObjNum 1650 snap 546 seq_num 1650)
2025-01-08T16:53:11.688 INFO:tasks.rados.rados.0.smithi079.stdout:3811: done (2 left)
Updated by Matan Breizman about 1 year ago
2025-01-17T10:00:19.206 INFO:tasks.rados.rados.0.smithi072.stdout:3993: finishing write tid 3 to smithi07234081-15 2025-01-17T10:00:19.206 INFO:tasks.rados.rados.0.smithi072.stdout:3988: done (7 left) 2025-01-17T10:00:19.206 INFO:tasks.rados.rados.0.smithi072.stdout: waiting on 7 2025-01-17T10:00:19.206 INFO:tasks.rados.rados.0.smithi072.stdout:3993: finishing write tid 4 to smithi07234081-15 2025-01-17T10:00:19.206 INFO:tasks.rados.rados.0.smithi072.stdout:3993: finishing write tid 5 to smithi07234081-15 2025-01-17T10:00:19.206 INFO:tasks.rados.rados.0.smithi072.stdout:3993: oid 15 updating version 0 to 1727 2025-01-17T10:00:19.206 INFO:tasks.rados.rados.0.smithi072.stdout:3993: oid 15 version 1727 is already newer than 1724 2025-01-17T10:00:19.206 INFO:tasks.rados.rados.0.smithi072.stdout:3993: oid 15 version 1727 is already newer than 1727 2025-01-17T10:00:19.206 INFO:tasks.rados.rados.0.smithi072.stdout:3993: oid 15 version 1727 is already newer than 1727 2025-01-17T10:00:19.206 INFO:tasks.rados.rados.0.smithi072.stdout:update_object_version oid 15 v 1727 (ObjNum 1767 snap 560 seq_num 1767) dirty exists 2025-01-17T10:00:19.206 INFO:tasks.rados.rados.0.smithi072.stdout:3993: left oid 15 (ObjNum 1767 snap 560 seq_num 1767) 2025-01-17T10:00:19.207 INFO:tasks.rados.rados.0.smithi072.stdout:3993: done (6 left) 2025-01-17T10:00:19.207 INFO:tasks.rados.rados.0.smithi072.stdout: waiting on 6 2025-01-17T10:00:19.207 INFO:tasks.rados.rados.0.smithi072.stdout:3999: expect (ObjNum 1717 snap 535 seq_num 1717) 2025-01-17T10:00:19.302 INFO:tasks.rados.rados.0.smithi072.stdout:seeking to 0 2025-01-17T10:00:19.486 INFO:tasks.rados.rados.0.smithi072.stdout:seeking to 2491972 2025-01-17T10:00:19.495 INFO:tasks.rados.rados.0.smithi072.stdout:3998: finishing write tid 1 to smithi07234081-5 2025-01-17T10:00:19.495 INFO:tasks.rados.rados.0.smithi072.stdout:3998: finishing write tid 2 to smithi07234081-5 2025-01-17T10:00:19.495 INFO:tasks.rados.rados.0.smithi072.stdout:3998: finishing write tid 3 to smithi07234081-5 2025-01-17T10:00:19.495 INFO:tasks.rados.rados.0.smithi072.stdout:3998: finishing write tid 4 to smithi07234081-5 2025-01-17T10:00:19.495 INFO:tasks.rados.rados.0.smithi072.stdout:3998: finishing write tid 5 to smithi07234081-5 2025-01-17T10:00:19.496 INFO:tasks.rados.rados.0.smithi072.stdout:3998: oid 5 updating version 0 to 770 2025-01-17T10:00:19.496 INFO:tasks.rados.rados.0.smithi072.stdout:3998: oid 5 version 770 is already newer than 770 2025-01-17T10:00:19.496 INFO:tasks.rados.rados.0.smithi072.stdout:3998: oid 5 version 770 is already newer than 767 2025-01-17T10:00:19.496 INFO:tasks.rados.rados.0.smithi072.stdout:3998: oid 5 version 770 is already newer than 770 2025-01-17T10:00:19.496 INFO:tasks.rados.rados.0.smithi072.stdout:update_object_version oid 5 v 770 (ObjNum 1769 snap 560 seq_num 1769) dirty exists 2025-01-17T10:00:19.496 INFO:tasks.rados.rados.0.smithi072.stdout:3998: left oid 5 (ObjNum 1769 snap 560 seq_num 1769) 2025-01-17T10:00:19.496 INFO:tasks.rados.rados.0.smithi072.stdout:3998: done (5 left) 2025-01-17T10:00:19.496 INFO:tasks.rados.rados.0.smithi072.stdout:3999: done (4 left) 2025-01-17T10:00:19.496 INFO:tasks.rados.rados.0.smithi072.stdout: waiting on 4
Updated by Matan Breizman about 1 year ago
- Subject changed from bluestore thrashers/simple: *-snaps-few-objects-* hangs (due to can't lock) to bluestore thrashers/simple: hangs with "waiting on" (due to can't lock)
Updated by Matan Breizman about 1 year ago
2025-02-19T18:48:20.894 INFO:tasks.rados.rados.0.smithi096.stdout:update_object_version oid 21 v 180 (ObjNum 360 snap 92 seq_num 360) dirty exists
2025-02-19T18:48:20.894 INFO:tasks.rados.rados.0.smithi096.stdout:742: left oid 21 (ObjNum 360 snap 92 seq_num 360)
2025-02-19T18:48:20.954 INFO:tasks.rados.rados.0.smithi096.stdout:736: done (5 left)
2025-02-19T18:48:20.955 INFO:tasks.rados.rados.0.smithi096.stdout:738: done (4 left)
2025-02-19T18:48:20.955 INFO:tasks.rados.rados.0.smithi096.stdout:739: done (3 left)
2025-02-19T18:48:20.955 INFO:tasks.rados.rados.0.smithi096.stdout:742: done (2 left)
2025-02-19T18:48:20.955 INFO:tasks.rados.rados.0.smithi096.stdout:744: done (1 left)
2025-02-19T18:48:20.955 INFO:tasks.rados.rados.0.smithi096.stdout:745: snap_remove snap 89
2025-02-19T18:48:21.894 INFO:tasks.rados.rados.0.smithi096.stdout:745: done (1 left)
2025-02-19T18:48:21.895 INFO:tasks.rados.rados.0.smithi096.stdout:746: delete oid 12 current snap is 92
2025-02-19T18:48:21.969 INFO:tasks.rados.rados.0.smithi096.stdout:746: done (1 left)
2025-02-19T18:48:21.970 INFO:tasks.rados.rados.0.smithi096.stdout:747: snap_create
2025-02-19T18:48:21.970 INFO:tasks.rados.rados.0.smithi096.stdout: waiting on 1
Updated by Matan Breizman about 1 year ago
Updated by Josh Durgin about 1 year ago
Looking at the logs from osd.1 in matan-2025-03-09_12:30:44-crimson-rados-wip-matanb-crimson-only-testing-6-mar-distro-crimson-smithi/8176595, it appears one of the stuck ops may be due to a ms bug - this reply is sent to the messenger but never makes it onto the wire:
DEBUG 2025-03-09 13:47:16,584 [shard 1:main] ms - [0x5110001d6b80 osd.1(client) v2:172.21.15.17:6804/4139576993 >> client.4244 172.21.15.17:0/2218821415] send 29 send() to core 2 -- osd_op_reply(279 smithi01736657-10 [read 0~1 out=1b] v23'20 uv20 ondisk = 0) v8
DEBUG 2025-03-09 13:47:16,585 [shard 2:main] ms - [0x5110001d6b80 osd.1(client) v2:172.21.15.17:6804/4139576993 >> client.4244 172.21.15.17:0/2218821415] got 29 do_send() from core 1 -- osd_op_reply(279 smithi01736657-10 [read 0~1 out=1b] v23'20 uv20 ondisk = 0) v8
I'm not positive this is the problem because there aren't client side logs to corroborate, and the osd logs have some overlapping lines. Will look at other instances and update (this run had 5 ops getting stuck).
Updated by Yingxin Cheng about 1 year ago
Looking at the logs from osd.1 in matan-2025-03-09_12:30:44-crimson-rados-wip-matanb-crimson-only-testing-6-mar-distro-crimson-smithi/8176595,
it appears one of the stuck ops may be due to a ms bug - this reply is sent to the messenger but never makes it onto the wire:
The cross-core id (from shard 1 to 2) kept increasing (29, 30, ...) in the follow-up logs, and the connection doesn't break at the moment:
...
79986:DEBUG 2025-03-09 13:47:16,584 [shard 1:main] ms - [0x5110001d6b80 osd.1(client) v2:172.21.15.17:6804/4139576993 >> client.4244 172.21.15.17:0/2218821415] send 29 send() to core 2 -- osd_op_reply(279 smithi01736657-10 [read 0~1 out=1b] v23'20 uv20 ondisk = 0) v8
79992:DEBUG 2025-03-09 13:47:16,585 [shard 2:main] ms - [0x5110001d6b80 osd.1(client) v2:172.21.15.17:6804/4139576993 >> client.4244 172.21.15.17:0/2218821415] got 29 do_send() from core 1 -- osd_op_reply(279 smithi01736657-10 [read 0~1 out=1b] v23'20 uv20 ondisk = 0) v8
...
95764:DEBUG 2025-03-09 13:47:18,746 [shard 1:main] ms - [0x5110001d6b80 osd.1(client) v2:172.21.15.17:6804/4139576993 >> client.4244 172.21.15.17:0/2218821415] send 30 send() to core 2 -- osd_op_reply(315 smithi01736657-50 [write 742876~447756] v25'27 uv27 ondisk = 0) v8
95805:DEBUG 2025-03-09 13:47:18,747 [shard 2:main] ms - [0x5110001d6b80 osd.1(client) v2:172.21.15.17:6804/4139576993 >> client.4244 172.21.15.17:0/2218821415] got 30 do_send() from core 1 -- osd_op_reply(315 smithi01736657-50 [write 742876~447756] v25'27 uv27 ondisk = 0) v8
...
96310:DEBUG 2025-03-09 13:47:18,760 [shard 1:main] ms - [0x5110001d6b80 osd.1(client) v2:172.21.15.17:6804/4139576993 >> client.4244 172.21.15.17:0/2218821415] send 31 send() to core 2 -- osd_op_reply(316 smithi01736657-50 [write 1729599~575687] v25'27 uv27 ondisk = 0) v8
...
96358:DEBUG 2025-03-09 13:47:18,762 [shard 2:main] ms - [0x5110001d6b80 osd.1(client) v2:172.21.15.17:6804/4139576993 >> client.4244 172.21.15.17:0/2218821415] got 31 do_send() from core 1 -- osd_op_reply(316 smithi01736657-50 [write 1729599~575687] v25'27 uv27 ondisk = 0) v8
...
The logs by themselves looked normal to me.
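For context, the "send N send() to core X" / "got N do_send() from core Y" pair reflects the reply being handed from the shard that processed the op to the shard that owns the connection, with the per-connection cross-core id tying the two lines together. A generic Seastar sketch of such a cross-core hand-off (an illustration only, using plain seastar::smp::submit_to rather than the actual crimson messenger code; header paths may vary between Seastar versions):

#include <seastar/core/app-template.hh>
#include <seastar/core/future.hh>
#include <seastar/core/smp.hh>
#include <fmt/core.h>

int main(int argc, char** argv) {
  seastar::app_template app;
  return app.run(argc, argv, [] {
    // Hypothetical "connection owner" core and cross-core sequence number.
    const unsigned conn_core = 1 % seastar::smp::count;
    const unsigned seq = 29;
    fmt::print("shard {}: send {} send() to core {}\n",
               seastar::this_shard_id(), seq, conn_core);
    // Forward the work to the shard owning the connection, mirroring the
    // "got N do_send() from core ..." side of the log pair.
    return seastar::smp::submit_to(conn_core, [seq] {
      fmt::print("shard {}: got {} do_send()\n",
                 seastar::this_shard_id(), seq);
    });
  });
}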
Updated by Josh Durgin about 1 year ago
Reproduced with client side logging to make lost lines less of an issue:
In this case it looks to be related to not waking waiters or finishing recovery correctly - all the unanswered ops are hung waiting for recovery for their objects to be readable on the primary.
For one object, -917, there is an exception in what I believe is loading the obc during handle_pull_response():
DEBUG 2025-03-12 22:55:50,785 [shard 0:main] osd - pg_epoch 294 pg[3.0( v 280'385 lc 245'366 (0'0,280'385] local-lis/les=293/294 n=74 ec=18/18 lis/c=293/212 les/c/f=294/213/0 sis=293) [3,2,1] r=0 lpr=293 pi=[212,293)/2 crt=280'385 lcod 0'0 mlcod 0'0 active+recovering+degraded ReplicatedRecoveryBackend::handle_pull: ObjectRecoveryInfo(3:0b68fe79:::smithi09435673-917:56@0'0, size: 3395212, copy_subset: [(0, 3395212)], clone_subset: {}, snapset: 56=[]:{56: [55, 53, 52, 51, 50, 4f, 4e, 4d, 4b, 4a, 49, 48, 45, 41, 40, 3f, 39, 31]}, object_exist: false) ObjectRecoveryProgress(!first, data_recovered_to: 2987076, data_complete: false, omap_recovered_to: , omap_complete: true, error: false) data.size() is 1048576 data_included: [(595212, 490444), (1868800, 458988), (2887932, 99144)]
DEBUG 2025-03-12 22:55:50,785 [shard 0:main] osd - pg_epoch 294 pg[3.0( v 280'385 lc 245'366 (0'0,280'385] local-lis/les=293/294 n=74 ec=18/18 lis/c=293/212 les/c/f=294/213/0 sis=293) [3,2,1] r=0 lpr=293 pi=[212,293)/2 crt=280'385 lcod 0'0 mlcod 0'0 active+recovering+degraded ObjectContextLoader::load_and_lock: 3:0b68fe79:::smithi09435673-917:56 0
DEBUG 2025-03-12 22:55:50,785 [shard 0:main] osd - pg_epoch 294 pg[3.0( v 280'385 lc 245'366 (0'0,280'385] local-lis/les=293/294 n=74 ec=18/18 lis/c=293/212 les/c/f=294/213/0 sis=293) [3,2,1] r=0 lpr=293 pi=[212,293)/2 crt=280'385 lcod 0'0 mlcod 0'0 active+recovering+degraded ObjectContextLoader::load_and_lock_clone: 3:0b68fe79:::smithi09435673-917:56 0
DEBUG 2025-03-12 22:55:50,785 [shard 0:main] osd - tri_mutex 0000000000000003.0D61F7E9.head.smithi09435673-917.. writers 0 readers 0 exclusively_used false waiters: 0 address 0x5190000da2c8 tri_mutex::lock_for_read():
DEBUG 2025-03-12 22:55:50,785 [shard 0:main] osd - tri_mutex 0000000000000003.0D61F7E9.head.smithi09435673-917.. writers 0 readers 0 exclusively_used false waiters: 0 address 0x5190000da2c8 tri_mutex::try_lock_for_read():
DEBUG 2025-03-12 22:55:50,785 [shard 0:main] osd - tri_mutex 0000000000000003.0D61F7E9.head.smithi09435673-917.. writers 0 readers 1 exclusively_used false waiters: 0 address 0x5190000da2c8 tri_mutex::lock_for_read(): lock_for_read successfully
DEBUG 2025-03-12 22:55:50,785 [shard 0:main] osd - resolve_oid oid.snap=56,head snapset.seq=56
DEBUG 2025-03-12 22:55:50,785 [shard 0:main] osd - resolve_oid {56: [55, 53, 52, 51, 50, 4f, 4e, 4d, 4b, 4a, 49, 48, 45, 41, 40, 3f, 39, 31]} does not contain 56 -- DNE
ERROR 2025-03-12 22:55:50,785 [shard 0:main] osd - pg_epoch 294 pg[3.0( v 280'385 lc 245'366 (0'0,280'385] local-lis/les=293/294 n=74 ec=18/18 lis/c=293/212 les/c/f=294/213/0 sis=293) [3,2,1] r=0 lpr=293 pi=[212,293)/2 crt=280'385 lcod 0'0 mlcod 0'0 active+recovering+degraded ObjectContextLoader::load_and_lock_clone: clone 3:0b68fe79:::smithi09435673-917:56 not found
DEBUG 2025-03-12 22:55:50,785 [shard 0:main] ms - [0x51100044a240 osd.3(cluster) v2:172.21.15.148:6810/4122060133@56070 >> osd.1 v2:172.21.15.94:6801/2595703340] GOT AckFrame: seq=63
DEBUG 2025-03-12 22:55:50,785 [shard 0:main] osd - tri_mutex 0000000000000003.0D61F7E9.head.smithi09435673-917.. writers 0 readers 1 exclusively_used false waiters: 0 address 0x5190000da2c8 tri_mutex::unlock_for_read():
DEBUG 2025-03-12 22:55:50,786 [shard 0:main] osd - tri_mutex 0000000000000003.0D61F7E9.head.smithi09435673-917.. writers 0 readers 0 exclusively_used false waiters: 0 address 0x5190000da2c8 tri_mutex::wake():
DEBUG 2025-03-12 22:55:50,786 [shard 0:main] osd - tri_mutex 0000000000000003.0D61F7E9.head.smithi09435673-917.. writers 0 readers 0 exclusively_used false waiters: 0 address 0x5190000da2c8 tri_mutex::wake(): no waiters
DEBUG 2025-03-12 22:55:50,786 [shard 0:main] osd - pg_epoch 294 pg[3.0( v 280'385 lc 245'366 (0'0,280'385] local-lis/les=293/294 n=74 ec=18/18 lis/c=293/212 les/c/f=294/213/0 sis=293) [3,2,1] r=0 lpr=293 pi=[212,293)/2 crt=280'385 lcod 0'0 mlcod 0'0 active+recovering+degraded ObjectContextLoader::release_state: releasing obc ObjectContext(0x5190000da280, oid=3:0b68fe79:::smithi09435673-917:head, refcount=2), exists true oi 3:0b68fe79:::smithi09435673-917:head(257'370 client.4234.0:6024 s 3143206 uv 370 alloc_hint [0 0 0])
DEBUG 2025-03-12 22:55:50,786 [shard 0:main] osd - background_recovery_sub(id=226, detail=MOSDPGPush(3.0 294/293 [PushOp(3:0b68fe79:::smithi09435673-917:56, version: 257'367, data_included: [595212~490444,1868800~458988,2887932~99144], data_size: 1048576, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(3:0b68fe79:::smithi09435673-917:56@0'0, size: 3395212, copy_subset: [(0, 3395212)], clone_subset: {}, snapset: 56=[]:{56: [55, 53, 52, 51, 50, 4f, 4e, 4d, 4b, 4a, 49, 48, 45, 41, 40, 3f, 39, 31]}, object_exist: false), after_progress: ObjectRecoveryProgress(!first, data_recovered_to: 2987076, data_complete: false, omap_recovered_to: , omap_complete: true, error: false), before_progress: ObjectRecoveryProgress(first, data_recovered_to: 0, data_complete: false, omap_recovered_to: , omap_complete: false, error: false))])): exit
DEBUG 2025-03-12 22:55:50,786 [shard 0:main] bluestore - bluestore(/var/lib/ceph/osd/ceph-3) queue_transactions ch 0x5170000c3600 3.8_head
WARN 2025-03-12 22:55:50,786 [shard 0:main] seastar - Exceptional future ignored: crimson::unthrowable_wrapper<std::error_code const&, crimson::ec<2> >::throwable_carrier, backtrace: /lib64/libasan.so.8+0x82cbf 0xc6f5312 0xc6f5af0 0xc6f6be1 0xc6f75c6 0xbbeb7bb 0xbbebb38 0x36c406f 0x37de436 0x386cff3 0x38a45fd 0xbdee883 0xbdf9bfb 0xbeb3053 0xbeb4529 0xbb6c34f 0xbb6cb87 0x38b3bec /lib64/libc.so.6+0x295cf /lib64/libc.so.6+0x2967f 0x36979f4
For object smithi09435673-432, the ops get stuck at waiting for active, despite the pg already being active, making client ops 6371-6375 hang:
DEBUG 2025-03-12 23:25:51,166 [shard 0:main] osd - pg_epoch 318 pg[3.0( v 280'385 lc 245'366 (0'0,280'385] local-lis/les=311/312 n=74 ec=18/18 lis/c=311/212 les/c/f=312/213/0 sis=311) [3,2,1] r=0 lpr=311 pi=[212,311)/2 crt=280'385 lcod 0'0 mlcod 0'0 active+recovering+degraded ClientRequest::with_pg_process: client_request(id=16777675, detail=m=[osd_op(client.4234.0:6374 3.0 3:0d132c38:::smithi09435673-432:head [setxattr _header (59) in=66b,truncate 3916328] snapc 69=[68,67,66,65,64,63,62,61,60,5e,5d,5c,58,57,55,4e,4a,39] RETRY=1 ondisk+retry+write+known_if_redirected+supports_pool_eio e318)]).0: waited for epoch 318, waiting for active
Will continue digging to see why exactly this is occurring.
Updated by Matan Breizman about 1 year ago · Edited
Josh Durgin wrote in #note-22:
Reproduced with client side logging to make lost lines less of an issue:
In this case it looks to be related to not waking waiters or finishing recovery correctly - all the unanswered ops are hung waiting for recovery for their objects to be readable on the primary.
For one object, -917, there is an exception in what I believe is loading the obc during handle_pull_response():
[...]
For object smithi09435673-432, the ops get stuck at waiting for active, despite the pg already being active, making client ops 6371-6375 hang:
[...]
Will continue digging to see why exactly this is occurring.
Reading your last comment, a few things come to mind:
1) The "Exceptional future ignored" looks worrying to me; guessing, I would say that we are not handling an exception that we threw.
Two possible ReplicatedRecoveryBackend suspects are:
WARNDPP("unable to decode SnapSet", pg);
throw crimson::osd::invalid_argument();
and, possibly more related, from "handle_pull_response":
if (push_op.version == eversion_t()) {
  // replica doesn't have it!
  pg.get_recovery_handler()->on_failed_recover({ m->from }, push_op.soid,
    get_recovering(push_op.soid).pull_info->recovery_info.version);
  throw std::runtime_error(
    fmt::format(
      "Error on pushing side {} when pulling obj {}",
      m->from, push_op.soid));
}
2) Probably less related but worth mentioning, Sam has recently pushed a PR in which we take the lock on the replica side in the "final" push - to prevent reads on uncommitted objects.
I do not see a direct relation here yet but will have to take a closer look; attaching the PR: https://github.com/ceph/ceph/pull/62288.
edit: I also found a possible undefined behavior in handle_pull_response which is fixed here (commented in 62288):
https://github.com/ceph/ceph-ci/commit/263737bc4cef4795a88d0b3464a9c98821f886a2
Updated by Josh Durgin about 1 year ago
Matan Breizman wrote in #note-23:
Josh Durgin wrote in #note-22:
Reproduced with client side logging to make lost lines less of an issue:
In this case it looks to be related to not waking waiters or finishing recovery correctly - all the unanswered ops are hung waiting for recovery for their objects to be readable on the primary.
For one object, -917, there is an exception in what I believe is loading the obc during handle_pull_response():
[...]
For object smithi09435673-432, the ops get stuck at waiting for active, despite the pg already being active, making client ops 6371-6375 hang:
[...]
Will continue digging to see why exactly this is occurring.
By reading your last comment few things come to mind:
1) The " Exceptional future ignored " looks worrying to me, by guessing I would say that we are not handling an exception that we threw.
Two possible ReplicatedRecoveryBackend suspects are:
[...]and, possibly more related, from "handle_pull_response":
[...]
It looks to me like the error is enoent coming from obc_loader.load_and_lock()->load_and_lock_clone(), since this is where the 'ERROR' log line comes from.
But this should be handled by this assert:
I'm not clear why this is not handled - can this be intercepted?
Updated by Matan Breizman about 1 year ago
Sanitized logs:
DEBUG 2025-03-12 22:55:50,785 [shard 0:main] osd - pg_epoch 294 pg[3.0( v 280'385 lc 245'366 (0'0,280'385] local-lis/les=293/294 n=74 ec=18/18 lis/c=293/212 les/c/f=294/213/0 sis=293) [3,2,1] r=0 lpr=293 pi=[212,293)/2 crt=280'385 lcod 0'0 mlcod 0'0 active+recovering+degraded ReplicatedRecoveryBackend::handle_pull_response: MOSDPGPush(3.0 294/293 [PushOp(3:0b68fe79:::smithi09435673-917:56, version: 257'367, data_included: [595212~490444,1868800~458988,2887932~99144], data_size: 1048576, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(3:0b68fe79:::smithi09435673-917:56@0'0, size: 3395212, copy_subset: [(0, 3395212)], clone_subset: {}, snapset: 56=[]:{56: [55, 53, 52, 51, 50, 4f, 4e, 4d, 4b, 4a, 49, 48, 45, 41, 40, 3f, 39, 31]}, object_exist: false), after_progress: ObjectRecoveryProgress(!first, data_recovered_to: 2987076, data_complete: false, omap_recovered_to: , omap_complete: true, error: false), before_progress: ObjectRecoveryProgress(first, data_recovered_to: 0, data_complete: false, omap_recovered_to: , omap_complete: false, error: false))]) v4
DEBUG 2025-03-12 22:55:50,785 [shard 0:main] osd - pg_epoch 294 pg[3.0( v 280'385 lc 245'366 (0'0,280'385] local-lis/les=293/294 n=74 ec=18/18 lis/c=293/212 les/c/f=294/213/0 sis=293) [3,2,1] r=0 lpr=293 pi=[212,293)/2 crt=280'385 lcod 0'0 mlcod 0'0 active+recovering+degraded ReplicatedRecoveryBackend::handle_pull: ObjectRecoveryInfo(3:0b68fe79:::smithi09435673-917:56@0'0, size: 3395212, copy_subset: [(0, 3395212)], clone_subset: {}, snapset: 56=[]:{56: [55, 53, 52, 51, 50, 4f, 4e, 4d, 4b, 4a, 49, 48, 45, 41, 40, 3f, 39, 31]}, object_exist: false) ObjectRecoveryProgress(!first, data_recovered_to: 2987076, data_complete: false, omap_recovered_to: , omap_complete: true, error: false) data.size() is 1048576 data_included: [(595212, 490444), (1868800, 458988), (2887932, 99144)]
DEBUG 2025-03-12 22:55:50,785 [shard 0:main] osd - pg_epoch 294 pg[3.0( v 280'385 lc 245'366 (0'0,280'385] local-lis/les=293/294 n=74 ec=18/18 lis/c=293/212 les/c/f=294/213/0 sis=293) [3,2,1] r=0 lpr=293 pi=[212,293)/2 crt=280'385 lcod 0'0 mlcod 0'0 active+recovering+degraded ObjectContextLoader::load_and_lock_clone: 3:0b68fe79:::smithi09435673-917:56 0
DEBUG 2025-03-12 22:55:50,785 [shard 0:main] osd - resolve_oid oid.snap=56,head snapset.seq=56
DEBUG 2025-03-12 22:55:50,785 [shard 0:main] osd - resolve_oid {56: [55, 53, 52, 51, 50, 4f, 4e, 4d, 4b, 4a, 49, 48, 45, 41, 40, 3f, 39, 31]} does not contain 56 -- DNE
ERROR 2025-03-12 22:55:50,785 [shard 0:main] osd - pg_epoch 294 pg[3.0( v 280'385 lc 245'366 (0'0,280'385] local-lis/les=293/294 n=74 ec=18/18 lis/c=293/212 les/c/f=294/213/0 sis=293) [3,2,1] r=0 lpr=293 pi=[212,293)/2 crt=280'385 lcod 0'0 mlcod 0'0 active+recovering+degraded ObjectContextLoader::load_and_lock_clone: clone 3:0b68fe79:::smithi09435673-917:56 not found
WARN 2025-03-12 22:55:50,786 [shard 0:main] seastar - Exceptional future ignored: crimson::unthrowable_wrapper<std::error_code const&, crimson::ec<2> >::throwable_carrier, backtrace: /lib64/libasan.so.8+0x82cbf 0xc6f5312 0xc6f5af0 0xc6f6be1 0xc6f75c6 0xbbeb7bb 0xbbebb38 0x36c406f 0x37de436 0x386cff3 0x38a45fd 0xbdee883 0xbdf9bfb 0xbeb3053 0xbeb4529 0xbb6c34f 0xbb6cb87 0x38b3bec /lib64/libc.so.6+0x295cf /lib64/libc.so.6+0x2967f 0x36979f4
_handle_pull_response uses load_and_lock_clone, which tries to `resolve_oid` but is unable to, and returns "not found" with an error.
if (manager.options.resolve_clone) {
  auto resolved_oid = resolve_oid(
    manager.head_state.obc->get_head_ss(),
    manager.target);
  if (!resolved_oid) {
    ERRORDPP("clone {} not found", dpp, manager.target);
    co_await load_obc_iertr::future<>(
      crimson::ct_error::enoent::make()   <---
    );
  }
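As context for the "does not contain 56 -- DNE" log line, resolve_oid here is essentially mapping the requested clone snap against the head's SnapSet (clone id -> snaps covered by that clone). A minimal, simplified sketch of that resolution step (illustrative only, not the actual crimson function, which also handles the head/seq cases):

#include <algorithm>
#include <cstdint>
#include <iostream>
#include <map>
#include <optional>
#include <vector>

using snapid = uint64_t;

// clone_snaps maps clone id -> snaps that clone covers (clones ascending).
// A request for snap s resolves to the first clone at or above s, but only
// if that clone actually covers s; otherwise the object does not exist at
// that snap ("DNE").
std::optional<snapid> resolve_clone(
    const std::map<snapid, std::vector<snapid>>& clone_snaps, snapid s) {
  for (const auto& [clone, snaps] : clone_snaps) {
    if (clone < s) {
      continue;  // this clone predates snap s, keep scanning
    }
    if (std::find(snaps.begin(), snaps.end(), s) != snaps.end()) {
      return clone;        // clone covers snap s
    }
    return std::nullopt;   // candidate clone does not cover s -- DNE
  }
  return std::nullopt;     // no clone at or above s
}

int main() {
  // Mirrors the failing log line: clone 0x56 exists but its snaps
  // [0x55, 0x53, ...] do not include snap 0x56.
  const std::map<snapid, std::vector<snapid>> clone_snaps{
      {0x56, {0x55, 0x53, 0x52, 0x51}}};
  std::cout << (resolve_clone(clone_snaps, 0x56) ? "found" : "DNE") << "\n";
  return 0;
}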
This error should be handled in the line that was attached above and cause the following assertion/abort:
if (pull_info.recovery_progress.first) {
  auto obc_manager = pg.obc_loader.get_obc_manager(pull_info.recovery_info.soid);
  co_await pg.obc_loader.load_and_lock(
    obc_manager, RWState::RWNONE
  ).handle_error_interruptible(
    crimson::ct_error::assert_all("unexpected error")   <---
  );
The assertion should have resulted in a ceph_abort, but I'm not sure why this is not the case.
Essentially we have 2(+1) issues here:
1) Why was the clone not found
2) Why we didn't actually ceph_abort
3) nit: _handle_pull_response LOG_PREFIX is wrong and is using "ReplicatedRecoveryBackend::handle_pull" instead
https://github.com/ceph/ceph-ci/blob/wip-matanb-crimson-only-backfill-scan/src/crimson/osd/replicated_recovery_backend.cc#L820
For 1)
I'm still looking into this; I want to check whether the fact that we allowed reads on uncommitted objects on the replica side could somehow explain this (e.g. sending "corrupted" data back to the primary?).
For 2)
I'm not clear why this is not handled - can this be intercepted?
I'm also not clear here. We do usually abort successfully on ceph_abort() - e.g. https://tracker.ceph.com/issues/69439.
Perhaps something went wrong with:
a) "crimson::ct_error::assert_all"
b) our coroutine + handle_error_interruptible usage? (See test_crimson_coroutine.cc for how we would expect to handle exceptional futures, and the sketch below.)
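For reference, Seastar emits the "Exceptional future ignored" warning when a failed future is destroyed without anything consuming its exception. A minimal sketch in plain Seastar (an illustration only, outside crimson's errorator/interruptor wrappers; the analogy is the errorator error being dropped before it reaches assert_all):

#include <seastar/core/app-template.hh>
#include <seastar/core/future.hh>
#include <stdexcept>

int main(int argc, char** argv) {
  seastar::app_template app;
  return app.run(argc, argv, [] {
    {
      // This future carries an exception that nothing ever handles...
      auto f = seastar::make_exception_future<>(std::runtime_error("oops"));
      (void)f;
      // ...and it is destroyed here, which is when the reactor logs the
      // "Exceptional future ignored" warning with a backtrace.
    }
    return seastar::make_ready_future<>();
  });
}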
I've found an earlier instance of the same issue (same object / same pg), this time without even the ignored-exception warning (I'll keep looking into this):
DEBUG 2025-03-12 22:55:05,241 [shard 0:main] osd - pg_epoch 269 pg[3.0( v 261'375 lc 245'366 (0'0,261'375] local-lis/les=267/268 n=72 ec=18/18 lis/c=267/212 les/c/f=268/213/0 sis=267) [3,2,1] r=0 lpr=267 pi=[212,267)/1 crt=261'375 mlcod 0'0 active+recovering+degraded ReplicatedRecoveryBackend::handle_pull_response: MOSDPGPush(3.0 269/267 [PushOp(3:0b68fe79:::smithi09435673-917:56, version: 257'367, data_included: [595212~490444,1868800~458988,2887932~99144], data_size: 1048576, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(3:0b68fe79:::smithi09435673-917:56@0'0, size: 3395212, copy_subset: [(0, 3395212)], clone_subset: {}, snapset: 56=[]:{56: [55, 53, 52, 51, 50, 4f, 4e, 4d, 4b, 4a, 49, 48, 45, 41, 40, 3f, 39, 31]}, object_exist: false), after_progress: ObjectRecoveryProgress(!first, data_recovered_to: 2987076, data_complete: false, omap_recovered_to: , omap_complete: true, error: false), before_progress: ObjectRecoveryProgress(first, data_recovered_to: 0, data_complete: false, omap_recovered_to: , omap_complete: false, error: false))]) v4
DEBUG 2025-03-12 22:55:05,242 [shard 0:main] osd - pg_epoch 269 pg[3.0( v 261'375 lc 245'366 (0'0,261'375] local-lis/les=267/268 n=72 ec=18/18 lis/c=267/212 les/c/f=268/213/0 sis=267) [3,2,1] r=0 lpr=267 pi=[212,267)/1 crt=261'375 mlcod 0'0 active+recovering+degraded ObjectContextLoader::load_and_lock_clone: 3:0b68fe79:::smithi09435673-917:56 0
DEBUG 2025-03-12 22:55:05,244 [shard 0:main] osd - resolve_oid oid.snap=56,head snapset.seq=56
DEBUG 2025-03-12 22:55:05,244 [shard 0:main] osd - resolve_oid {56: [55, 53, 52, 51, 50, 4f, 4e, 4d, 4b, 4a, 49, 48, 45, 41, 40, 3f, 39, 31]} does not contain 56 -- DNE
ERROR 2025-03-12 22:55:05,244 [shard 0:main] osd - pg_epoch 269 pg[3.0( v 261'375 lc 245'366 (0'0,261'375] local-lis/les=267/268 n=72 ec=18/18 lis/c=267/212 les/c/f=268/213/0 sis=267) [3,2,1] r=0 lpr=267 pi=[212,267)/1 crt=261'375 mlcod 0'0 active+recovering+degraded ObjectContextLoader::load_and_lock_clone: clone 3:0b68fe79:::smithi09435673-917:56 not found
Updated by Matan Breizman 12 months ago
A possible explanation for not being able to resolve the clone object was found in https://tracker.ceph.com/issues/70639#note-2.
However, AFAICT, this could explain the replica side only, while here it is the primary that could not resolve the clone above.
Updated by Matan Breizman 12 months ago
- Related to Bug #70639: crimson: check for missing clones on replica read as well added
Updated by Matan Breizman 12 months ago · Edited
I think that https://github.com/ceph/ceph/pull/62288/commits/b214117050d77059b612374528ac44b05e84bfbf actually fixed the issue here:
According to the logs above, the primary got a push - meaning, the primary tried to pull the clone object.
Later on in _handle_pull_response, this primary tried to load_and_lock_clone - even though it's missing on this primary.
With the fix above, we handle non-head objects differently: the ssc is taken from pull_info.head_ctx->ssc, and we skip the "load_and_lock" call that was problematic due to its resolve_oid dependency.
Updated by Matan Breizman 12 months ago · Edited
Matan Breizman wrote in #note-25:
The assertion should have been resulted in a ceph_abort but I'm not sure why this is not the case.
Essentially we have 2(+1) issues here: 1) Why was the clone not found
2) Why we didn't actually ceph_abort
1) solved above
2) should be explained here: https://github.com/ceph/ceph/pull/62635
Keeping this tracker open to see if we have new occurrences
Updated by Matan Breizman 12 months ago · Edited
Matan Breizman wrote in #note-30:
Keeping this tracker open to see if we have new occurrences
Note: the new instances are of thrashers/simple - hence, this is not (and shouldn't be) related to recovery, as these tests are not killing/reviving OSDs.
8226342
2025-04-06T14:51:03.811 INFO:tasks.rados.rados.0.smithi064.stdout:3997: finishing write tid 4 to smithi06435826-16
2025-04-06T14:51:03.813 INFO:tasks.rados.rados.0.smithi064.stdout:3997: finishing write tid 5 to smithi06435826-16
2025-04-06T14:51:03.813 INFO:tasks.rados.rados.0.smithi064.stdout:3997: oid 16 updating version 0 to 856
2025-04-06T14:51:03.813 INFO:tasks.rados.rados.0.smithi064.stdout:3997: oid 16 version 856 is already newer than 856
2025-04-06T14:51:03.813 INFO:tasks.rados.rados.0.smithi064.stdout:3997: oid 16 version 856 is already newer than 853
2025-04-06T14:51:03.814 INFO:tasks.rados.rados.0.smithi064.stdout:3997: oid 16 version 856 is already newer than 856
2025-04-06T14:51:03.814 INFO:tasks.rados.rados.0.smithi064.stdout:update_object_version oid 16 v 856 (ObjNum 1756 snap 562 seq_num 1756) dirty exists
2025-04-06T14:51:03.814 INFO:tasks.rados.rados.0.smithi064.stdout:3997: left oid 16 (ObjNum 1756 snap 562 seq_num 1756)
2025-04-06T14:51:03.814 INFO:tasks.rados.rados.0.smithi064.stdout:3997: done (5 left)
2025-04-06T14:51:03.814 INFO:tasks.rados.rados.0.smithi064.stdout: waiting on 5
8226378
2025-04-06T15:03:46.674 INFO:tasks.rados.rados.0.smithi071.stdout:1456: finishing write tid 4 to smithi07135955-40
2025-04-06T15:03:46.674 INFO:tasks.rados.rados.0.smithi071.stdout:1456: oid 40 updating version 0 to 330
2025-04-06T15:03:46.674 INFO:tasks.rados.rados.0.smithi071.stdout:1456: oid 40 version 330 is already newer than 328
2025-04-06T15:03:46.674 INFO:tasks.rados.rados.0.smithi071.stdout:1456: oid 40 version 330 is already newer than 330
2025-04-06T15:03:46.675 INFO:tasks.rados.rados.0.smithi071.stdout:update_object_version oid 40 v 330 (ObjNum 653 snap 195 seq_num 653) dirty exists
2025-04-06T15:03:46.675 INFO:tasks.rados.rados.0.smithi071.stdout:1456: left oid 40 (ObjNum 653 snap 195 seq_num 653)
2025-04-06T15:03:46.675 INFO:tasks.rados.rados.0.smithi071.stdout:1456: done (2 left)
2025-04-06T15:03:46.675 INFO:tasks.rados.rados.0.smithi071.stdout: waiting on 2
Note: hypothesis: in an attempt to read from a replica, because https://github.com/ceph/ceph/pull/62535/commits/95255f7d80f3e2eb3d19f08bd23617c41dbda2a7 is not merged yet, we fail because of the incorrectly served object?
We should still probably be able to jump back to the primary in this scenario (by returning eagain).
Updated by Matan Breizman 12 months ago
Matan Breizman wrote in #note-31:
Note: hypothesis: in an attempt to read from a replica, because https://github.com/ceph/ceph/pull/62535/commits/95255f7d80f3e2eb3d19f08bd23617c41dbda2a7 is not merged yet, we fail because of the incorrectly served object?
We should still probably be able to jump back to the primary in this scenario (by returning eagain).
Now that https://github.com/ceph/ceph/pull/62535 is merged, testing the relevant tests only (-N 3 and 4):
https://pulpito.ceph.com/matan-2025-04-08_08:34:23-crimson-rados-wip-matanb-crimson-replicated-missing-distro-crimson-smithi
https://pulpito.ceph.com/matan-2025-04-08_08:35:23-crimson-rados-wip-matanb-crimson-replicated-missing-distro-crimson-smithi
Updated by Matan Breizman 11 months ago
- Status changed from In Progress to Resolved
https://github.com/ceph/ceph/pull/62837 contained fixes that should have resolved this tracker.
The missing clone issue was fixed in https://tracker.ceph.com/issues/70935.
The ignored error - https://tracker.ceph.com/issues/70936.
Moreover, assert_all not aborting properly - https://github.com/ceph/ceph/pull/62699.
Closing.