Bug #66316
AsyncReserver crash - !queue_pointers.count(item) && !in_progress.count(item)
Status: Closed
Description
...
ERROR 2024-05-31 03:39:33,444 [shard 0:main] none - /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/19.0.0-4013-g149e0d82/rpm/el9/BUILD/ceph-19.0.0-4013-g149e0d82/src/common/AsyncReserver.h:264 : In function 'void AsyncReserver<T, F>::request_reservation(T, Context*, unsigned int, Context*) [with T = spg_t; F = crimson::osd::OSDSingletonState::DirectFinisher]', ceph_assert(%s)
!queue_pointers.count(item) && !in_progress.count(item)
INFO 2024-05-31 03:39:33,444 [shard 1:main] osd - pg_epoch 300 pg[3.c( v 280'496 (0'0,280'496] local-lis/les=290/291 n=5 ec=16/16 lis/c=290/290 les/c/f=291/291/0 sis=297) [] r=-1 lpr=300 pi=[290,297)/1 crt=280'496 lcod 0'0 mlcod 0'0 unknown NOTIFY exit Started/Stray 0.000149 0 0.000000
Aborting on shard 0.
Backtrace:
...
0# 0x00007FC15748B94C in /lib64/libc.so.6
1# raise in /lib64/libc.so.6
2# abort in /lib64/libc.so.6
3# ceph::__ceph_assert_fail(ceph::assert_data const&) in ceph-osd
4# AsyncReserver<spg_t, crimson::osd::OSDSingletonState::DirectFinisher>::request_reservation(spg_t, Context*, unsigned int, Context*) in ceph-osd
5# crimson::osd::ShardServices::local_request_reservation(spg_t, Context*, unsigned int, Context*)::{lambda(crimson::osd::OSDSingletonState&, Context*, Context*)#1}::operator()(crimson::osd::OSDSingletonState&, Context*, Context*) const in ceph-osd
6# void std::__invoke_impl<void, crimson::osd::ShardServices::local_request_reservation(spg_t, Context*, unsigned int, Context*)::{lambda(crimson::osd::OSDSingletonState&, Context*, Context*)#1}, crimson::osd::OSDSingletonState&, Context*, Context*>(std::__invoke_other, crimson::osd::ShardServices::local_request_reservation(spg_t, Context*, unsigned int, Context*)::{lambda(crimson::osd::OSDSingletonState&, Context*, Context*)#1}&&, crimson::osd::OSDSingletonState&, Context*&&, Context*&&) in ceph-osd
7# decltype(auto) std::__apply_impl<crimson::osd::ShardServices::local_request_reservation(spg_t, Context*, unsigned int, Context*)::{lambda(crimson::osd::OSDSingletonState&, Context*, Context*)#1}, std::tuple<crimson::osd::OSDSingletonState&, Context*, Context*>, 0ul, 1ul, 2ul>(crimson::osd::ShardServices::local_request_reservation(spg_t, Context*, unsigned int, Context*)::{lambda(crimson::osd::OSDSingletonState&, Context*, Context*)#1}&&, std::tuple<crimson::osd::OSDSingletonState&, Context*, Context*>&&, std::integer_sequence<unsigned long, 0ul, 1ul, 2ul>) in ceph-osd
8# seastar::sharded<crimson::osd::OSDSingletonState>::invoke_on<crimson::osd::ShardServices::local_request_reservation(spg_t, Context*, unsigned int, Context*)::{lambda(crimson::osd::OSDSingletonState&, Context*, Context*)#1}, Context*, Context*, seastar::future<void> >(unsigned int, seastar::smp_submit_to_options, crimson::osd::ShardServices::local_request_reservation(spg_t, Context*, unsigned int, Context*)::{lambda(crimson::osd::OSDSingletonState&, Context*, Context*)#1}&&, Context*&&, Context*&&)::{lambda()#1}::operator()() in ceph-osd
9# seastar::future<void> seastar::futurize<void>::invoke<seastar::sharded<crimson::osd::OSDSingletonState>::invoke_on<crimson::osd::ShardServices::local_request_reservation(spg_t, Context*, unsigned int, Context*)::{lambda(crimson::osd::OSDSingletonState&, Context*, Context*)#1}, Context*, Context*, seastar::future<void> >(unsigned int, seastar::smp_submit_to_options, crimson::osd::ShardServices::local_request_reservation(spg_t, Context*, unsigned int, Context*)::{lambda(crimson::osd::OSDSingletonState&, Context*, Context*)#1}&&, Context*&&, Context*&&)::{lambda()#1}&>(crimson::osd::ShardServices::local_request_reservation(spg_t, Context*, unsigned int, Context*)::{lambda(crimson::osd::OSDSingletonState&, Context*, Context*)#1}&&) in ceph-osd
10# seastar::smp_message_queue::async_work_item<seastar::sharded<crimson::osd::OSDSingletonState>::invoke_on<crimson::osd::ShardServices::local_request_reservation(spg_t, Context*, unsigned int, Context*)::{lambda(crimson::osd::OSDSingletonState&, Context*, Context*)#1}, Context*, Context*, seastar::future<void> >(unsigned int, seastar::smp_submit_to_options, crimson::osd::ShardServices::local_request_reservation(spg_t, Context*, unsigned int, Context*)::{lambda(crimson::osd::OSDSingletonState&, Context*, Context*)#1}&&, Context*&&, Context*&&)::{lambda()#1}>::run_and_dispose() in ceph-osd
11# 0x000000000B7CA260 in ceph-osd
12# 0x000000000B7E44FA in ceph-osd
13# 0x000000000B8854F8 in ceph-osd
14# 0x000000000B886B46 in ceph-osd
15# 0x000000000B5637C2 in ceph-osd
16# 0x000000000B56413E in ceph-osd
17# main in ceph-osd
Updated by Samuel Just almost 2 years ago
- Subject changed from crimson: AsyncReserver crash on interval change to crimson: AsyncReserver crash
Updated by Samuel Just almost 2 years ago
This is probably happening because PG::request_local_background_io_reservation and friends do not wait for the future to resolve. Fortunately, these days we handle peering events under seastar::thread (see PG::do_peering_event), so it should be ok to simply block.
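To make the failure mode concrete, here is a minimal stdlib-only model (not Ceph's actual AsyncReserver, which is templated and far richer) of the invariant the assert enforces: an item may be requested only while it is neither queued nor in progress. If a cross-shard cancel is fired without awaiting its future, a re-request can reach the reserver first and trip the assert; awaiting (blocking under seastar::thread) guarantees the cancel lands before the new request.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical mini-model of the AsyncReserver invariant.
struct MiniReserver {
  std::set<std::string> queued;       // stands in for queue_pointers
  std::set<std::string> in_progress;

  void request_reservation(const std::string& item) {
    // Mirrors ceph_assert(!queue_pointers.count(item) && !in_progress.count(item))
    assert(!queued.count(item) && !in_progress.count(item));
    queued.insert(item);
  }

  void cancel_reservation(const std::string& item) {
    queued.erase(item);
    in_progress.erase(item);
  }
};
```

In the crash above, the cancel and the re-request for pg 3.c effectively raced; waiting for the cancel's future to resolve serializes them, so request_reservation never sees the item still registered.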
Updated by Samuel Just almost 2 years ago
sjust-2024-06-14_20:06:29-crimson-rados:thrash-wip-sjust-crimson-testing-2024-06-14-distro-default-smithi/7757018/osd_logs
Updated by Matan Breizman almost 2 years ago
- Subject changed from crimson: AsyncReserver crash to AsyncReserver crash - !queue_pointers.count(item) && !in_progress.count(item)
Updated by Matan Breizman over 1 year ago · Edited
Matan Breizman wrote in #note-4:
OSD.1
Updated by Samuel Just over 1 year ago
- Status changed from New to Fix Under Review
Updated by Samuel Just over 1 year ago
The above fix is part of the story, but the other half is that continuations submitted to the singleton instance need to be executed in order (in particular, map advances can cause a reservation cancel and requeue in the same peering event sequence). Will come up with a mechanism to ensure that next.
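A sketch of the kind of ordering mechanism described, reduced to a single-threaded stdlib model (the names and shape are illustrative, not the actual PR's code): continuations submitted to the singleton drain strictly FIFO, so a cancel enqueued before a requeue can never be observed after it.

```cpp
#include <deque>
#include <functional>

// Hypothetical FIFO gate: work submitted while the queue is draining is
// appended, never run out of order.
class OrderedGate {
  std::deque<std::function<void()>> pending;
  bool draining = false;
public:
  void submit(std::function<void()> fn) {
    pending.push_back(std::move(fn));
    if (draining) return;            // an outer submit() is already draining
    draining = true;
    while (!pending.empty()) {       // run strictly in submission order
      auto f = std::move(pending.front());
      pending.pop_front();
      f();
    }
    draining = false;
  }
};
```

In the real cross-shard setting the same effect would come from chaining each submission onto the previous future rather than a reentrancy flag, but the ordering guarantee is the point: cancel-then-requeue within one peering event sequence stays cancel-then-requeue.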
Updated by Samuel Just over 1 year ago
Updated PR with (hopefully) complete fix.
Updated by Matan Breizman over 1 year ago
- Status changed from Fix Under Review to Resolved
No new instances since the fix was merged.
Updated by Upkeep Bot 8 months ago
- Merge Commit set to d8121767596a8ced6cd21ca37d4cc971cc9690b6
- Fixed In set to v19.3.0-3744-gd8121767596
- Upkeep Timestamp set to 2025-07-11T01:38:27+00:00
Updated by Upkeep Bot 8 months ago
- Fixed In changed from v19.3.0-3744-gd8121767596 to v19.3.0-3744-gd812176759
- Upkeep Timestamp changed from 2025-07-11T01:38:27+00:00 to 2025-07-14T22:43:28+00:00
Updated by Upkeep Bot 5 months ago
- Released In set to v20.2.0~2388
- Upkeep Timestamp changed from 2025-07-14T22:43:28+00:00 to 2025-11-01T01:17:55+00:00