Bug #66294

closed

crimson: crash on osd startup while loading pgs (member access within null pointer of type 'struct __node_type' / open_collection)

Added by Samuel Just almost 2 years ago. Updated 5 months ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Tags (freeform):
Fixed In: v19.3.0-3750-gee7c187025
Released In: v20.2.0~2387
Upkeep Timestamp: 2025-11-01T01:18:00+00:00

Description

DEBUG 2024-05-29 19:50:29,407 [shard 0:main] osd - PGShardMapping::get_or_create_pg_mapping: mapping pg 5.1 to core 0 (primary): num_pgs 4
DEBUG 2024-05-29 19:50:29,407 [shard 0:main] osd - PGShardMapping::get_or_create_pg_mapping: calling primary to add mapping for pg 5.c to the expected core 4294967295
DEBUG 2024-05-29 19:50:29,407 [shard 0:main] osd - PGShardMapping::get_or_create_pg_mapping: mapping pg 5.c to core 1 (primary): num_pgs 4
DEBUG 2024-05-29 19:50:29,407 [shard 0:main] osd - PGShardMapping::get_or_create_pg_mapping: calling primary to add mapping for pg 4.f to the expected core 4294967295
DEBUG 2024-05-29 19:50:29,407 [shard 0:main] osd - PGShardMapping::get_or_create_pg_mapping: mapping pg 4.f to core 2 (primary): num_pgs 4
DEBUG 2024-05-29 19:50:29,407 [shard 0:main] osd - PGShardMapping::get_or_create_pg_mapping: calling primary to add mapping for pg 4.8 to the expected core 4294967295
DEBUG 2024-05-29 19:50:29,407 [shard 0:main] osd - PGShardMapping::get_or_create_pg_mapping: mapping pg 4.8 to core 0 (primary): num_pgs 5
DEBUG 2024-05-29 19:50:29,407 [shard 0:main] osd - PGShardMapping::get_or_create_pg_mapping: calling primary to add mapping for pg 4.0 to the expected core 4294967295
DEBUG 2024-05-29 19:50:29,407 [shard 0:main] osd - PGShardMapping::get_or_create_pg_mapping: mapping pg 4.0 to core 1 (primary): num_pgs 5
DEBUG 2024-05-29 19:50:29,407 [shard 0:main] osd - PGShardMapping::get_or_create_pg_mapping: calling primary to add mapping for pg 2.6 to the expected core 4294967295
DEBUG 2024-05-29 19:50:29,407 [shard 0:main] osd - PGShardMapping::get_or_create_pg_mapping: mapping pg 2.6 to core 2 (primary): num_pgs 5
DEBUG 2024-05-29 19:50:29,407 [shard 0:main] osd - PGShardMapping::get_or_create_pg_mapping: calling primary to add mapping for pg 4.a to the expected core 4294967295
DEBUG 2024-05-29 19:50:29,407 [shard 0:main] osd - PGShardMapping::get_or_create_pg_mapping: mapping pg 4.a to core 0 (primary): num_pgs 6
DEBUG 2024-05-29 19:50:29,407 [shard 0:main] osd - PGShardMapping::get_or_create_pg_mapping: calling primary to add mapping for pg 2.5 to the expected core 4294967295
DEBUG 2024-05-29 19:50:29,407 [shard 0:main] osd - PGShardMapping::get_or_create_pg_mapping: mapping pg 2.5 to core 1 (primary): num_pgs 6
DEBUG 2024-05-29 19:50:29,407 [shard 0:main] osd - PGShardMapping::get_or_create_pg_mapping: calling primary to add mapping for pg 2.1 to the expected core 4294967295
DEBUG 2024-05-29 19:50:29,408 [shard 0:main] osd - PGShardMapping::get_or_create_pg_mapping: mapping pg 2.1 to core 2 (primary): num_pgs 6
DEBUG 2024-05-29 19:50:29,408 [shard 0:main] osd - PGShardMapping::get_or_create_pg_mapping: calling primary to add mapping for pg 2.4 to the expected core 4294967295
DEBUG 2024-05-29 19:50:29,408 [shard 0:main] osd - PGShardMapping::get_or_create_pg_mapping: mapping pg 2.4 to core 0 (primary): num_pgs 7
DEBUG 2024-05-29 19:50:29,408 [shard 0:main] osd - PGShardMapping::get_or_create_pg_mapping: calling primary to add mapping for pg 1.0 to the expected core 4294967295
DEBUG 2024-05-29 19:50:29,408 [shard 0:main] osd - PGShardMapping::get_or_create_pg_mapping: mapping pg 1.0 to core 1 (primary): num_pgs 7
DEBUG 2024-05-29 19:50:29,408 [shard 2:main] osd - PGShardMapping::get_or_create_pg_mapping: mapping pg 5.1 to core 0 (others)
DEBUG 2024-05-29 19:50:29,408 [shard 1:main] osd - PGShardMapping::get_or_create_pg_mapping: mapping pg 5.b to core 0 (others)
DEBUG 2024-05-29 19:50:29,408 [shard 2:main] osd - PGShardMapping::get_or_create_pg_mapping: mapping pg 5.e to core 1 (others)
DEBUG 2024-05-29 19:50:29,408 [shard 2:main] osd - PGShardMapping::get_or_create_pg_mapping: mapping pg 5.e to core 1 (others)
DEBUG 2024-05-29 19:50:29,408 [shard 1:main] osd - PGShardMapping::get_or_create_pg_mapping: mapping pg 5.f to core 0 (others)
DEBUG 2024-05-29 19:50:29,408 [shard 1:main] osd - PGShardMapping::get_or_create_pg_mapping: mapping pg 5.3 to core 0 (others)
DEBUG 2024-05-29 19:50:29,408 [shard 2:main] osd - PGShardMapping::get_or_create_pg_mapping: mapping pg 5.6 to core 1 (others)
DEBUG 2024-05-29 19:50:29,408 [shard 1:main] osd - PGShardMapping::get_or_create_pg_mapping: mapping pg 5.1 to core 0 (others)
DEBUG 2024-05-29 19:50:29,408 [shard 2:main] osd - PGShardMapping::get_or_create_pg_mapping: mapping pg 2.1 to core 2 (others)
DEBUG 2024-05-29 19:50:29,408 [shard 2:main] osd - PGShardMapping::get_or_create_pg_mapping: mapping pg 2.1 to core 2 (others)
DEBUG 2024-05-29 19:50:29,408 [shard 1:main] osd - PGShardMapping::get_or_create_pg_mapping: mapping pg 2.5 to core 1 (others)
DEBUG 2024-05-29 19:50:29,408 [shard 1:main] osd - PGShardMapping::get_or_create_pg_mapping: mapping pg 5.a to core 1 (others)
DEBUG 2024-05-29 19:50:29,408 [shard 2:main] osd - PGShardMapping::get_or_create_pg_mapping: mapping pg 5.a to core 1 (others)
DEBUG 2024-05-29 19:50:29,408 [shard 1:main] osd - PGShardMapping::get_or_create_pg_mapping: mapping pg 1.0 to core 1 (others)
DEBUG 2024-05-29 19:50:29,408 [shard 2:main] osd - PGShardMapping::get_or_create_pg_mapping: mapping pg 4.0 to core 1 (others)
DEBUG 2024-05-29 19:50:29,408 [shard 1:main] osd - PGShardMapping::get_or_create_pg_mapping: mapping pg 2.1 to core 2 (others)
DEBUG 2024-05-29 19:50:29,408 [shard 2:main] osd - PGShardMapping::get_or_create_pg_mapping: mapping pg 4.8 to core 0 (others)
DEBUG 2024-05-29 19:50:29,408 [shard 1:main] osd - PGShardMapping::get_or_create_pg_mapping: mapping pg 5.d to core 2 (others)
DEBUG 2024-05-29 19:50:29,408 [shard 2:main] osd - PGShardMapping::get_or_create_pg_mapping: mapping pg 5.2 to core 2 (others)
DEBUG 2024-05-29 19:50:29,408 [shard 1:main] osd - PGShardMapping::get_or_create_pg_mapping: mapping pg 2.6 to core 2 (others)
DEBUG 2024-05-29 19:50:29,408 [shard 2:main] osd - PGShardMapping::get_or_create_pg_mapping: mapping pg 2.5 to core 1 (others)
DEBUG 2024-05-29 19:50:29,408 [shard 2:main] osd - PGShardMapping::get_or_create_pg_mapping: mapping pg 4.a to core 0 (others)
DEBUG 2024-05-29 19:50:29,408 [shard 2:main] osd - PGShardMapping::get_or_create_pg_mapping: mapping pg 4.a to core 0 (others)
DEBUG 2024-05-29 19:50:29,408 [shard 1:main] osd - PGShardMapping::get_or_create_pg_mapping: mapping pg 4.f to core 2 (others)
DEBUG 2024-05-29 19:50:29,408 [shard 1:main] osd - PGShardMapping::get_or_create_pg_mapping: mapping pg 4.0 to core 1 (others)
DEBUG 2024-05-29 19:50:29,408 [shard 2:main] osd - PGShardMapping::get_or_create_pg_mapping: mapping pg 5.5 to core 2 (others)
DEBUG 2024-05-29 19:50:29,409 [shard 1:main] osd - PGShardMapping::get_or_create_pg_mapping: mapping pg 5.2 to core 2 (others)
DEBUG 2024-05-29 19:50:29,409 [shard 1:main] osd - PGShardMapping::get_or_create_pg_mapping: mapping pg 5.2 to core 2 (others)
DEBUG 2024-05-29 19:50:29,409 [shard 2:main] osd - PGShardMapping::get_or_create_pg_mapping: mapping pg 1.0 to core 1 (others)
DEBUG 2024-05-29 19:50:29,409 [shard 2:main] osd - PGShardMapping::get_or_create_pg_mapping: mapping pg 5.f to core 0 (others)
DEBUG 2024-05-29 19:50:29,409 [shard 1:main] osd - PGShardMapping::get_or_create_pg_mapping: mapping pg 5.e to core 1 (others)
DEBUG 2024-05-29 19:50:29,409 [shard 2:main] osd - PGShardMapping::get_or_create_pg_mapping: mapping pg 5.c to core 1 (others)
DEBUG 2024-05-29 19:50:29,409 [shard 2:main] osd - PGShardMapping::get_or_create_pg_mapping: mapping pg 5.c to core 1 (others)
DEBUG 2024-05-29 19:50:29,409 [shard 2:main] osd - PGShardMapping::get_or_create_pg_mapping: mapping pg 5.b to core 0 (others)
DEBUG 2024-05-29 19:50:29,409 [shard 1:main] osd - PGShardMapping::get_or_create_pg_mapping: mapping pg 4.a to core 0 (others)
DEBUG 2024-05-29 19:50:29,409 [shard 2:main] osd - PGShardMapping::get_or_create_pg_mapping: mapping pg 5.3 to core 0 (others)
DEBUG 2024-05-29 19:50:29,409 [shard 1:main] osd - PGShardMapping::get_or_create_pg_mapping: mapping pg 2.4 to core 0 (others)
DEBUG 2024-05-29 19:50:29,409 [shard 2:main] osd - PGShardMapping::get_or_create_pg_mapping: mapping pg 4.f to core 2 (others)
DEBUG 2024-05-29 19:50:29,409 [shard 1:main] osd - PGShardMapping::get_or_create_pg_mapping: mapping pg 4.8 to core 0 (others)
DEBUG 2024-05-29 19:50:29,409 [shard 1:main] osd - PGShardMapping::get_or_create_pg_mapping: mapping pg 5.c to core 1 (others)
DEBUG 2024-05-29 19:50:29,409 [shard 1:main] osd - PGShardMapping::get_or_create_pg_mapping: mapping pg 5.c to core 1 (others)
DEBUG 2024-05-29 19:50:29,409 [shard 2:main] osd - PGShardMapping::get_or_create_pg_mapping: mapping pg 2.4 to core 0 (others)
DEBUG 2024-05-29 19:50:29,409 [shard 1:main] osd - PGShardMapping::get_or_create_pg_mapping: mapping pg 5.6 to core 1 (others)
DEBUG 2024-05-29 19:50:29,409 [shard 2:main] osd - PGShardMapping::get_or_create_pg_mapping: mapping pg 5.d to core 2 (others)
DEBUG 2024-05-29 19:50:29,409 [shard 2:main] osd - PGShardMapping::get_or_create_pg_mapping: mapping pg 2.6 to core 2 (others)
DEBUG 2024-05-29 19:50:29,409 [shard 0:main] osd - PGShardMapping::get_or_create_pg_mapping: returning pg 2.1 mapping to core 2 after broadcasted
DEBUG 2024-05-29 19:50:29,409 [shard 0:main] osd - PGShardMapping::get_or_create_pg_mapping: returning pg 2.5 mapping to core 1 after broadcasted
DEBUG 2024-05-29 19:50:29,409 [shard 0:main] osd - PGShardMapping::get_or_create_pg_mapping: returning pg 5.5 mapping to core 2 after broadcasted
DEBUG 2024-05-29 19:50:29,409 [shard 0:main] osd - PGShardMapping::get_or_create_pg_mapping: returning pg 5.a mapping to core 1 after broadcasted
DEBUG 2024-05-29 19:50:29,409 [shard 0:main] osd - PGShardMapping::get_or_create_pg_mapping: returning pg 4.f mapping to core 2 after broadcasted
DEBUG 2024-05-29 19:50:29,409 [shard 0:main] osd - PGShardMapping::get_or_create_pg_mapping: returning pg 4.a mapping to core 0 after broadcasted
DEBUG 2024-05-29 19:50:29,409 [shard 0:main] osd - PGShardMapping::get_or_create_pg_mapping: returning pg 5.b mapping to core 0 after broadcasted
DEBUG 2024-05-29 19:50:29,409 [shard 0:main] osd - PGShardMapping::get_or_create_pg_mapping: returning pg 5.e mapping to core 1 after broadcasted
DEBUG 2024-05-29 19:50:29,409 [shard 0:main] osd - PGShardMapping::get_or_create_pg_mapping: returning pg 5.f mapping to core 0 after broadcasted
DEBUG 2024-05-29 19:50:29,410 [shard 0:main] osd - OSDSingletonState::load_pg: 5.f
DEBUG 2024-05-29 19:50:29,410 [shard 0:main] osd - PGShardMapping::get_or_create_pg_mapping: returning pg 5.c mapping to core 1 after broadcasted
DEBUG 2024-05-29 19:50:29,410 [shard 0:main] osd - PGShardMapping::get_or_create_pg_mapping: returning pg 1.0 mapping to core 1 after broadcasted
DEBUG 2024-05-29 19:50:29,410 [shard 0:main] osd - OSDSingletonState::load_pg: 5.b
DEBUG 2024-05-29 19:50:29,410 [shard 0:main] osd - PGShardMapping::get_or_create_pg_mapping: returning pg 4.0 mapping to core 1 after broadcasted
DEBUG 2024-05-29 19:50:29,410 [shard 0:main] osd - PGShardMapping::get_or_create_pg_mapping: returning pg 5.3 mapping to core 0 after broadcasted
DEBUG 2024-05-29 19:50:29,410 [shard 0:main] osd - PGShardMapping::get_or_create_pg_mapping: returning pg 5.d mapping to core 2 after broadcasted
DEBUG 2024-05-29 19:50:29,410 [shard 0:main] osd - PGShardMapping::get_or_create_pg_mapping: returning pg 5.2 mapping to core 2 after broadcasted
DEBUG 2024-05-29 19:50:29,410 [shard 0:main] osd - OSDSingletonState::load_pg: 5.3
DEBUG 2024-05-29 19:50:29,410 [shard 0:main] osd - PGShardMapping::get_or_create_pg_mapping: returning pg 5.1 mapping to core 0 after broadcasted
DEBUG 2024-05-29 19:50:29,410 [shard 0:main] osd - OSDSingletonState::load_pg: 5.1
DEBUG 2024-05-29 19:50:29,410 [shard 0:main] osd - PGShardMapping::get_or_create_pg_mapping: returning pg 4.8 mapping to core 0 after broadcasted
DEBUG 2024-05-29 19:50:29,410 [shard 0:main] osd - OSDSingletonState::load_pg: 4.8
DEBUG 2024-05-29 19:50:29,410 [shard 0:main] osd - OSDSingletonState::load_pg: 4.a
DEBUG 2024-05-29 19:50:29,411 [shard 0:main] osd - PGShardMapping::get_or_create_pg_mapping: returning pg 2.6 mapping to core 2 after broadcasted
DEBUG 2024-05-29 19:50:29,411 [shard 0:main] osd - PGShardMapping::get_or_create_pg_mapping: returning pg 2.4 mapping to core 0 after broadcasted
DEBUG 2024-05-29 19:50:29,411 [shard 0:main] osd - PGShardMapping::get_or_create_pg_mapping: returning pg 5.6 mapping to core 1 after broadcasted
DEBUG 2024-05-29 19:50:29,411 [shard 0:main] osd - PGShardMapping::get_or_create_pg_mapping: returning pg 5.6 mapping to core 1 after broadcasted
DEBUG 2024-05-29 19:50:29,411 [shard 2:main] osd - OSDSingletonState::load_pg: 4.f
DEBUG 2024-05-29 19:50:29,411 [shard 1:main] osd - OSDSingletonState::load_pg: 5.e
DEBUG 2024-05-29 19:50:29,411 [shard 0:main] osd - OSDSingletonState::load_pg: 2.4
DEBUG 2024-05-29 19:50:29,411 [shard 2:main] osd - OSDSingletonState::load_pg: 2.1
DEBUG 2024-05-29 19:50:29,411 [shard 1:main] osd - OSDSingletonState::load_pg: 1.0
DEBUG 2024-05-29 19:50:29,411 [shard 2:main] osd - OSDSingletonState::load_pg: 5.2
DEBUG 2024-05-29 19:50:29,411 [shard 2:main] osd - OSDSingletonState::load_pg: 5.5
DEBUG 2024-05-29 19:50:29,411 [shard 2:main] osd - OSDSingletonState::load_pg: 2.6
DEBUG 2024-05-29 19:50:29,411 [shard 1:main] osd - OSDSingletonState::load_pg: 4.0
DEBUG 2024-05-29 19:50:29,411 [shard 1:main] osd - OSDSingletonState::load_pg: 5.a
DEBUG 2024-05-29 19:50:29,411 [shard 2:main] osd - OSDSingletonState::load_pg: 5.d
DEBUG 2024-05-29 19:50:29,411 [shard 1:main] osd - OSDSingletonState::load_pg: 2.5
DEBUG 2024-05-29 19:50:29,411 [shard 1:main] osd - OSDSingletonState::load_pg: 5.c
DEBUG 2024-05-29 19:50:29,411 [shard 1:main] osd - OSDSingletonState::load_pg: 5.6
...
/opt/rh/gcc-toolset-13/root/usr/include/c++/13/bits/hashtable.h:1961:23: runtime error: member access within null pointer of type 'struct __node_type'
Segmentation fault on shard 1.
Backtrace:

https://pulpito.ceph.com/sjust-2024-05-29_18:54:15-crimson-rados:thrash-wip-sjust-crimson-testing-2024-05-28-distro-default-smithi/7732776/


Related issues: 1 (0 open, 1 closed)

Related to crimson - Bug #66405: scrub_validator: runtime error: member access within null pointer of type 'const struct object' (Resolved, junxiang mu)

#1

Updated by Samuel Just almost 2 years ago

  • Description updated (diff)
#2

Updated by Matan Breizman almost 2 years ago

  • Related to Bug #66405: scrub_validator: runtime error: member access within null pointer of type 'const struct object' added
#3

Updated by Samuel Just almost 2 years ago

sjust-2024-06-14_20:06:29-crimson-rados:thrash-wip-sjust-crimson-testing-2024-06-14-distro-default-smithi/7757008/

DEBUG 2024-06-14 20:38:24,225 [shard 1:main] osd - OSDSingletonState::load_pg: 2.0
DEBUG 2024-06-14 20:38:24,225 [shard 1:main] osd - OSDSingletonState::load_pg: 2.0
DEBUG 2024-06-14 20:38:24,225 [shard 2:main] osd - OSDSingletonState::load_pg: 3.b
DEBUG 2024-06-14 20:38:24,225 [shard 1:main] osd - OSDSingletonState::load_pg: 2.0
DEBUG 2024-06-14 20:38:24,225 [shard 2:main] osd - OSDSingletonState::load_pg: 3.b
DEBUG 2024-06-14 20:38:24,225 [shard 0:main] bluestore - bluestore(/var/lib/ceph/osd/ceph-0).collection(2.4_head 0x6160000f6980)  r 0 v.len 33
DEBUG 2024-06-14 20:38:24,225 [shard 1:main] osd - OSDSingletonState::load_pg: 2.7
DEBUG 2024-06-14 20:38:24,225 [shard 2:main] osd - OSDSingletonState::load_pg: 3.2
DEBUG 2024-06-14 20:38:24,225 [shard 1:main] osd - OSDSingletonState::load_pg: 3.c
DEBUG 2024-06-14 20:38:24,225 [shard 1:main] osd - OSDSingletonState::load_pg: 3.d
...
Segmentation fault on shard 1.
Backtrace:
...
 0# std::__detail::_Hashtable_base<coll_t, std::pair<coll_t const, boost::intrusive_ptr<crimson::os::FuturizedCollection> >, std::__detail::_Select1st, std::equal_to<coll_t>, std::hash<coll_t>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Hashtable_traits<true, false, true> >::_M_equals(coll_t const&, unsigned long, std::__detail::_Hash_node_value<std::pair<coll_t const, boost::intrusive_ptr<crimson::os::FuturizedCollection> >, true> const&) const in ceph-osd
 1# std::_Hashtable<coll_t, std::pair<coll_t const, boost::intrusive_ptr<crimson::os::FuturizedCollection> >, std::allocator<std::pair<coll_t const, boost::intrusive_ptr<crimson::os::FuturizedCollection> > >, std::__detail::_Select1st, std::equal_to<coll_t>, std::hash<coll_t>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::_M_find_before_node(unsigned long, coll_t const&, unsigned long) const in ceph-osd
 2# std::_Hashtable<coll_t, std::pair<coll_t const, boost::intrusive_ptr<crimson::os::FuturizedCollection> >, std::allocator<std::pair<coll_t const, boost::intrusive_ptr<crimson::os::FuturizedCollection> > >, std::__detail::_Select1st, std::equal_to<coll_t>, std::hash<coll_t>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::_M_find_node(unsigned long, coll_t const&, unsigned long) const in ceph-osd
 3# std::_Hashtable<coll_t, std::pair<coll_t const, boost::intrusive_ptr<crimson::os::FuturizedCollection> >, std::allocator<std::pair<coll_t const, boost::intrusive_ptr<crimson::os::FuturizedCollection> > >, std::__detail::_Select1st, std::equal_to<coll_t>, std::hash<coll_t>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::find(coll_t const&) in ceph-osd
...

Somewhat different details, but also a segfault while loading pgs -- probably the same cause.

#4

Updated by Samuel Just almost 2 years ago

sjust-2024-06-14_20:06:29-crimson-rados:thrash-wip-sjust-crimson-testing-2024-06-14-distro-default-smithi/7757016

DEBUG 2024-06-14 20:50:25,950 [shard 2:main] osd - OSDSingletonState::load_pg: 2.4
DEBUG 2024-06-14 20:50:25,950 [shard 2:main] osd - OSDSingletonState::load_pg: 2.4
DEBUG 2024-06-14 20:50:25,950 [shard 0:main] bluestore - maybe_unpin 0x625000025900 #3:e0000000::::head# unpinned
DEBUG 2024-06-14 20:50:25,950 [shard 2:main] osd - OSDSingletonState::load_pg: 3.5
DEBUG 2024-06-14 20:50:25,950 [shard 2:main] osd - OSDSingletonState::load_pg: 3.5
DEBUG 2024-06-14 20:50:25,950 [shard 1:main] osd - OSDSingletonState::load_pg: 2.1
DEBUG 2024-06-14 20:50:25,950 [shard 2:main] osd - OSDSingletonState::load_pg: 3.b
DEBUG 2024-06-14 20:50:25,951 [shard 1:main] osd - OSDSingletonState::load_pg: 3.6
DEBUG 2024-06-14 20:50:25,951 [shard 1:main] osd - OSDSingletonState::load_pg: 3.e
DEBUG 2024-06-14 20:50:25,951 [shard 0:main] osd - OSDSingletonState::get_local_map: osdmap.257 found in cache
DEBUG 2024-06-14 20:50:25,976 [shard 0:main] bluestore - bluestore(/var/lib/ceph/osd/ceph-3) omap_get_values 3.4_head oid #3:20000000::::head#
=================================================================
==31319==ERROR: AddressSanitizer: heap-use-after-free on address 0x60b000031370 at pc 0x000007e1662a bp 0x7ffeab26f120 sp 0x7ffeab26f110
WRITE of size 8 at 0x60b000031370 thread T0
Reactor stalled for 65 ms on shard 0. Backtrace: 0x6bddd 0xba3861b 0xb90b438 0xb90c9b4 0xb90cbca 0xb90cd20 0xb90d1e9 0x3e6ef 0x12c766 0x130eba 0x1331e6 0x133eba 0x11f834 0x11f955 0x10b7d9 0x10dfec 0x106210 0x1067ad 0x35190 0xe975f 0xe8c91 0xea61d 0x7e16629 0x7e38353 0x7e4e833 0x7d69859 0x7d69d50 0x7d69ec5 0x7d69f8c 0x7d6a06d 0x7e38708 0x7e38812 0x7e388ce 0x7e3898a 0x7e38a46 0x7e38b93 0x7e38d1e 0x7e390df 0x7e9f2f6 0xb91e0bd 0xb938357 0xb9d9355 0xb9da9a3 0xb6b7751 0xb6b80cd 0x363fba0 0x2958f 0x2963f 0x342e404
kernel callstack:
Reactor stalled for 123 ms on shard 0. Backtrace: 0x6bddd 0xba3861b 0xb90b438 0xb90c9b4 0xb90cbca 0xb90cd20 0xb90d1e9 0x3e6ef 0x131300 0x1331e6 0x133eba 0x11f834 0x11f955 0x10b7d9 0x10dfec 0x106210 0x1067ad 0x35190 0xe975f 0xe8c91 0xea61d 0x7e16629 0x7e38353 0x7e4e833 0x7d69859 0x7d69d50 0x7d69ec5 0x7d69f8c 0x7d6a06d 0x7e38708 0x7e38812 0x7e388ce 0x7e3898a 0x7e38a46 0x7e38b93 0x7e38d1e 0x7e390df 0x7e9f2f6 0xb91e0bd 0xb938357 0xb9d9355 0xb9da9a3 0xb6b7751 0xb6b80cd 0x363fba0 0x2958f 0x2963f 0x342e404
kernel callstack:
Reactor stalled for 230 ms on shard 0. Backtrace: 0x6bddd 0xba3861b 0xb90b438 0xb90c9b4 0xb90cbca 0xb90cd20 0xb90d1e9 0x3e6ef 0x119972 0x11ee14 0x1337f0 0x133c6d 0x1563e3 0xb6e9e0c 0x133f14 0x11f834 0x11f955 0x10b7d9 0x10dfec 0x106210 0x1067ad 0x35190 0xe975f 0xe8c91 0xea61d 0x7e16629 0x7e38353 0x7e4e833 0x7d69859 0x7d69d50 0x7d69ec5 0x7d69f8c 0x7d6a06d 0x7e38708 0x7e38812 0x7e388ce 0x7e3898a 0x7e38a46 0x7e38b93 0x7e38d1e 0x7e390df 0x7e9f2f6 0xb91e0bd 0xb938357 0xb9d9355 0xb9da9a3 0xb6b7751 0xb6b80cd 0x363fba0 0x2958f 0x2963f 0x342e404
kernel callstack:
    #0 0x7e16629 in std::_Hashtable<coll_t, std::pair<coll_t const, boost::intrusive_ptr<crimson::os::FuturizedCollection> >, std::allocator<std::pair<coll_t const, boost::intrusive_ptr<crimson::os::FuturizedCollection> > >, std::__detail::_Select1st, std::equal_to<coll_t>, std::hash<coll_t>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::_M_insert_bucket_begin(unsigned long, std::__detail::_Hash_node<std::pair<coll_t const, boost::intrusive_ptr<crimson::os::FuturizedCollection> >, true>*) (/usr/bin/ceph-osd+0x7e16629) (BuildId: f1fbe83829e0954bc4570eb7c127cf8000f0a5c1)
    #1 0x7e38353 in std::_Hashtable<coll_t, std::pair<coll_t const, boost::intrusive_ptr<crimson::os::FuturizedCollection> >, std::allocator<std::pair<coll_t const, boost::intrusive_ptr<crimson::os::FuturizedCollection> > >, std::__detail::_Select1st, std::equal_to<coll_t>, std::hash<coll_t>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::_M_insert_unique_node(unsigned long, unsigned long, std::__detail::_Hash_node<std::pair<coll_t const, boost::intrusive_ptr<crimson::os::FuturizedCollection> >, true>*, unsigned long) (/usr/bin/ceph-osd+0x7e38353) (BuildId: f1fbe83829e0954bc4570eb7c127cf8000f0a5c1)
    #2 0x7e4e833 in std::__detail::_Map_base<coll_t, std::pair<coll_t const, boost::intrusive_ptr<crimson::os::FuturizedCollection> >, std::allocator<std::pair<coll_t const, boost::intrusive_ptr<crimson::os::FuturizedCollection> > >, std::__detail::_Select1st, std::equal_to<coll_t>, std::hash<coll_t>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true>, true>::operator[](coll_t const&) (/usr/bin/ceph-osd+0x7e4e833) (BuildId: f1fbe83829e0954bc4570eb7c127cf8000f0a5c1)
    #3 0x7d69859 in crimson::os::AlienStore::open_collection(coll_t const&)::{lambda(boost::intrusive_ptr<ObjectStore::CollectionImpl>)#1}::operator()(boost::intrusive_ptr<ObjectStore::CollectionImpl>) const (/usr/bin/ceph-osd+0x7d69859) (BuildId: f1fbe83829e0954bc4570eb7c127cf8000f0a5c1)
    #4 0x7d69d50 in seastar::future<boost::intrusive_ptr<crimson::os::FuturizedCollection> > seastar::futurize<seastar::future<boost::intrusive_ptr<crimson::os::FuturizedCollection> > >::invoke<crimson::os::AlienStore::open_collection(coll_t const&)::{lambda(boost::intrusive_ptr<ObjectStore::CollectionImpl>)#1}&, boost::intrusive_ptr<ObjectStore::CollectionImpl> >(crimson::os::AlienStore::open_collection(coll_t const&)::{lambda(boost::intrusive_ptr<ObjectStore::CollectionImpl>)#1}&, boost::intrusive_ptr<ObjectStore::CollectionImpl>&&) (/usr/bin/ceph-osd+0x7d69d50) (BuildId: f1fbe83829e0954bc4570eb7c127cf8000f0a5c1)
    #5 0x7d69ec5 in auto seastar::futurize_invoke<crimson::os::AlienStore::open_collection(coll_t const&)::{lambda(boost::intrusive_ptr<ObjectStore::CollectionImpl>)#1}&, boost::intrusive_ptr<ObjectStore::CollectionImpl> >(crimson::os::AlienStore::open_collection(coll_t const&)::{lambda(boost::intrusive_ptr<ObjectStore::CollectionImpl>)#1}&, boost::intrusive_ptr<ObjectStore::CollectionImpl>&&) (/usr/bin/ceph-osd+0x7d69ec5) (BuildId: f1fbe83829e0954bc4570eb7c127cf8000f0a5c1)

Also probably the same cause.

#5

Updated by Samuel Just over 1 year ago

For sure, we're accessing AlienStore::coll_map from multiple threads without a mutex; that probably explains at least some of the crashes above. I don't actually see what purpose coll_map serves, so I'm going to test a branch that simply wraps the CollectionHandle from BlueStore in a CollectionRef and returns it. The only real behavioral impact coll_map seems to have is that AlienStore::stop breaks the CollectionRef->CollectionHandle reference for all stored collections, but hopefully that isn't actually necessary.

#6

Updated by Matan Breizman over 1 year ago

https://pulpito.ceph.com/matan-2024-07-11_12:17:07-crimson-rados-wip-matanb-seastar-july7-distro-crimson-smithi/

7796501 - osd.2

#8 0x7e5d71b in crimson::os::AlienStore::open_collection(coll_t const&)::{lambda(boost::intrusive_ptr<ObjectStore::CollectionImpl>)#1}::operator()(boost::intrusive_ptr<ObjectStore::CollectionImpl>) const (/usr/bin/ceph-osd+0x7e5d71b) (BuildId: 7f81f429606aeeec6fac7b90618342917ea45fb6)
SUMMARY: AddressSanitizer: double-free (/lib64/libasan.so.8+0xe0ea0) (BuildId: e72832baf1a219a1019b9ecbf8330cba69f7ad33) in operator delete(void*, unsigned long)

7796519 - osd.0

    #8 0x7e5d71b in crimson::os::AlienStore::open_collection(coll_t const&)::{lambda(boost::intrusive_ptr<ObjectStore::CollectionImpl>)#1}::operator()(boost::intrusive_ptr<ObjectStore::CollectionImpl>) const (/usr/bin/ceph-osd+0x7e5d71b) (BuildId: 7f81f429606aeeec6fac7b90618342917ea45fb6)
    SUMMARY: AddressSanitizer: double-free (/lib64/libasan.so.8+0xe0ea0) (BuildId: e72832baf1a219a1019b9ecbf8330cba69f7ad33) in operator delete(void*, unsigned long)
#7

Updated by Matan Breizman over 1 year ago

  • Subject changed from crimson: crash on osd startup while loading pgs (member access within null pointer of type 'struct __node_type') to crimson: crash on osd startup while loading pgs (member access within null pointer of type 'struct __node_type' / open_collection)
#8

Updated by Samuel Just over 1 year ago

  • Status changed from New to Fix Under Review
  • Pull request ID set to 58766
#9

Updated by Matan Breizman over 1 year ago

  • Status changed from Fix Under Review to Resolved
#10

Updated by Upkeep Bot 8 months ago

  • Merge Commit set to ee7c187025807678a7880c5577cdc58c4381a590
  • Fixed In set to v19.3.0-3750-gee7c1870258
  • Upkeep Timestamp set to 2025-07-11T08:43:16+00:00
#11

Updated by Upkeep Bot 8 months ago

  • Fixed In changed from v19.3.0-3750-gee7c1870258 to v19.3.0-3750-gee7c187025
  • Upkeep Timestamp changed from 2025-07-11T08:43:16+00:00 to 2025-07-14T22:43:34+00:00
#12

Updated by Upkeep Bot 5 months ago

  • Released In set to v20.2.0~2387
  • Upkeep Timestamp changed from 2025-07-14T22:43:34+00:00 to 2025-11-01T01:18:00+00:00