Bug #70390


Squid deployed OSDs are crashing.

Added by Igor Fedotov about 1 year ago. Updated about 1 month ago.

Status:
Resolved
Priority:
Immediate
Assignee:
-
Target version:
-
% Done:

0%

Source:
Backport:
squid, tentacle
Regression:
No
Severity:
1 - critical
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Tags (freeform):
backport_processed
Fixed In:
v20.3.0-3593-g5e722409d9
Released In:
Upkeep Timestamp:
2025-10-14T15:08:31+00:00

Description

Two different backtraces look relevant:

"assert_condition": "!ito->is_valid()",
"assert_func": "void BlueStore::Blob::copy_extents_over_empty(ceph::common::CephContext*, const BlueStore::Blob&, uint32_t, uint32_t)",
"assert_file": "/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/19.2.1/rpm/el9/BUILD/ceph-19.2.1/src/os/bluestore/BlueStore.cc",
"assert_line": 2625,
"assert_msg": "/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/19.2.1/rpm/el9/BUILD/ceph-19.2.1/src/os/bluestore/BlueStore.cc: In function 'void BlueStore::Blob::copy_extents_over_empty(ceph::common::CephContext*, const BlueStore::Blob&, uint32_t, uint32_t)' thread 7fe33b572640 time 2025-03-06T23:24:44.900787+0000\n/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/19.2.1/rpm/el9/BUILD/ceph-19.2.1/src/os/bluestore/BlueStore.cc: 2625: FAILED ceph_assert(!ito->is_valid())\n",
"backtrace": [
"/lib64/libc.so.6(+0x3e930) [0x7fe359471930]",
"/lib64/libc.so.6(+0x8bfdc) [0x7fe3594befdc]",
"raise()",
"abort()",
"(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x16c) [0x55a3c1489302]",
"/usr/bin/ceph-osd(+0x3fa463) [0x55a3c1489463]",
"/usr/bin/ceph-osd(+0x3db06e) [0x55a3c146a06e]",
"(BlueStore::Blob::copy_from(ceph::common::CephContext*, BlueStore::Blob const&, unsigned int, unsigned int, unsigned int)+0x13d) [0x55a3c19faaed]",
"(BlueStore::ExtentMap::dup_esb(BlueStore*, BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>&, boost::intrusive_ptr<BlueStore::Onode>&, unsigned long&, unsigned long&, unsigned long&)+0x93e) [0x55a3c1a0be6e]",
"(BlueStore::_do_clone_range(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>&, boost::intrusive_ptr<BlueStore::Onode>&, unsigned long, unsigned long, unsigned long)+0x1b0) [0x55a3c1a6ac20]",
"(BlueStore::_clone_range(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>&, boost::intrusive_ptr<BlueStore::Onode>&, unsigned long, unsigned long, unsigned long)+0x288) [0x55a3c1a79868]",
"(BlueStore::_txc_add_transaction(BlueStore::TransContext*, ceph::os::Transaction*)+0x159d) [0x55a3c1a810dd]",
"(BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ceph::os::Transaction, std::allocator<ceph::os::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x303) [0x55a3c1a62703]",
"(ECBackend::handle_sub_write(pg_shard_t, boost::intrusive_ptr<OpRequest>, ECSubWrite&, ZTracer::Trace const&, ECListener&)+0xac7) [0x55a3c198dfc7]",

---------------------------------------------

 "assert_condition": "diff <= bytes_per_au[pos]",
    "assert_func": "bool bluestore_blob_use_tracker_t::put(uint32_t, uint32_t, PExtentVector*)",
    "assert_file": "/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/19.2.1/rpm/el9/BUILD/ceph-19.2.1/src/os/bluestore/bluestore_types.cc",
    "assert_line": 511,
    "assert_msg": "/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/19.2.1/rpm/el9/BUILD/ceph-19.2.1/src/os/bluestore/bluestore_types.cc: In function 'bool bluestore_blob_use_tracker_t::put(uint32_t, uint32_t, PExtentVector*)' thread 7f22d1e51640 time 2025-03-10T02:54:07.424953+0000\n/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/19.2.1/rpm/el9/BUILD/ceph-19.2.1/src/os/bluestore/bluestore_types.cc: 511: FAILED ceph_assert(diff <= bytes_per_au[pos])\n",
    "backtrace": [
        "/lib64/libc.so.6(+0x3e930) [0x7f22f0551930]",
        "/lib64/libc.so.6(+0x8bfdc) [0x7f22f059efdc]",
        "raise()",
        "abort()",
        "(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x16c) [0x563dcc0c9302]",
        "/usr/bin/ceph-osd(+0x3fa463) [0x563dcc0c9463]",
        "/usr/bin/ceph-osd(+0x3ea70c) [0x563dcc0b970c]",
        "(BlueStore::Blob::put_ref(BlueStore::Collection*, unsigned int, unsigned int, std::vector<bluestore_pextent_t, mempool::pool_allocator<(mempool::pool_index_t)5, bluestore_pextent_t> >*)+0xaa) [0x563dcc63085a]",
        "(BlueStore::OldExtent::create(boost::intrusive_ptr<BlueStore::Collection>, unsigned int, unsigned int, unsigned int, boost::intrusive_ptr<BlueStore::Blob>&)+0x11d) [0x563dcc641e7d]",
        "(BlueStore::ExtentMap::punch_hole(boost::intrusive_ptr<BlueStore::Collection>&, unsigned long, unsigned long, boost::intrusive::list<BlueStore::OldExtent, boost::intrusive::member_hook<BlueStore::OldExtent, boost::intrusive::list_member_hook<>, &BlueStore::OldExtent::old_extent_item> >*)+0x451) [0x563dcc6439c1]",
        "(BlueStore::_do_write_big(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>&, unsigned long, unsigned long, ceph::buffer::v15_2_0::list::iterator&, BlueStore::WriteContext*)+0x458) [0x563dcc6b10c8]",
        "(BlueStore::_do_write_data(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>&, unsigned long, unsigned long, ceph::buffer::v15_2_0::list&, BlueStore::WriteContext*)+0x100) [0x563dcc6b2f20]",
        "(BlueStore::_do_write(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>&, unsigned long, unsigned long, ceph::buffer::v15_2_0::list&, unsigned int)+0x273) [0x563dcc6b85d3]",
        "(BlueStore::_write(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>&, unsigned long, unsigned long, ceph::buffer::v15_2_0::list&, unsigned int)+0x98) [0x563dcc6b91e8]",
        "(BlueStore::_txc_add_transaction(BlueStore::TransContext*, ceph::os::Transaction*)+0x120c) [0x563dcc6c0d4c]",
        "(BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ceph::os::Transaction, std::allocator<ceph::os::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x303) [0x563dcc6a2703]",
        "(ECBackend::handle_sub_write(pg_shard_t, boost::intrusive_ptr<OpRequest>, ECSubWrite&, ZTracer::Trace const&, ECListener&)+0xac7) [0x563dcc5cdfc7]" 


Files

fsck46-reduced.log (75.7 KB) fsck46-reduced.log fsck log for crashing OSD.46 Igor Fedotov, 03/11/2025 06:58 AM
fsck49-reduced.log (393 KB) fsck49-reduced.log fsck log for crashing OSD.49 Igor Fedotov, 03/11/2025 06:58 AM

Related issues 4 (0 open, 4 closed)

Has duplicate bluestore - Bug #71759: os/bluestore/BlueStore.cc: ceph_abort in BlueStore::ExtentMap::encode_some(...) (Duplicate) - Adam Kupczyk

Has duplicate bluestore - Bug #74726: tentacle: ceph_assert(!ito->is_valid()) (Duplicate)

Copied to bluestore - Backport #73536: squid: Squid deployed OSDs are crashing. (Resolved)
Copied to bluestore - Backport #73537: tentacle: Squid deployed OSDs are crashing. (Resolved)
Actions #2

Updated by Igor Fedotov about 1 year ago

  • Severity changed from 3 - minor to 1 - critical
  • Affected Versions v19.0.0 added
Actions #4

Updated by Igor Fedotov about 1 year ago · Edited

A few notes from the field:
  • The issue has been observed only for new OSDs deployed with Squid.
  • Previously deployed OSDs are running fine under Squid.
  • Highly likely the issue is tied to the Elastic Shared Blob implementation brought in by https://github.com/ceph/ceph/pull/53178 and companions.
  • ceph-bluestore-tool's repair command is unable to fix the issue.
  • Both known cases use an EC pool on top of all-flash OSDs.
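For anyone wanting to check whether an OSD is affected, a minimal offline fsck sketch (the OSD id and data path are illustrative, not from this tracker; cephadm deployments keep OSD data under /var/lib/ceph/&lt;fsid&gt;/osd.&lt;id&gt;):

```shell
# Illustrative only: stop the OSD first, since fsck needs exclusive
# access to the BlueStore data directory.
systemctl stop ceph-osd@46

# On affected OSDs, fsck reports errors such as
# "lextent ... overlaps with the previous" or
# "lextent ... spans a shard boundary"; as noted above, the
# repair command cannot fix this particular corruption.
ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-46
```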

Actions #5

Updated by Adam Kupczyk about 1 year ago

  • Description updated (diff)
Actions #6

Updated by Igor Fedotov about 1 year ago

  • Subject changed from Squid deployed OSD are crashing. to Squid deployed OSDs are crashing.
Actions #7

Updated by Igor Fedotov about 1 year ago · Edited

And one more, a bit different, backtrace:

    "assert_condition": "diff <= bytes_per_au[pos]",
    "assert_func": "bool bluestore_blob_use_tracker_t::put(uint32_t, uint32_t, PExtentVector*)",
    "assert_file": "/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/19.2.1/rpm/el9/BUILD/ceph-19.2.1/src/os/bluestore/bluestore_types.cc",
    "assert_line": 511,
    "assert_msg": "/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/19.2.1/rpm/el9/BUILD/ceph-19.2.1/src/os/bluestore/bluestore_types.cc: In function 'bool bluestore_blob_use_tracker_t::put(uint32_t, uint32_t, PExtentVector*)' thread 7fa23a8d6640 time 2025-03-08T18:02:56.716265+0000\n/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/19.2.1/rpm/el9/BUILD/ceph-19.2.1/src/os/bluestore/bluestore_types.cc: 511: FAILED ceph_assert(diff <= bytes_per_au[pos])\n",
    "backtrace": [
        "/lib64/libc.so.6(+0x3e930) [0x7fa25efe2930]",
        "/lib64/libc.so.6(+0x8bfdc) [0x7fa25f02ffdc]",
        "raise()",
        "abort()",
        "(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x16c) [0x555816e86302]",
        "/usr/bin/ceph-osd(+0x3fa463) [0x555816e86463]",
        "/usr/bin/ceph-osd(+0x3ea70c) [0x555816e7670c]",
        "(BlueStore::Blob::put_ref(BlueStore::Collection*, unsigned int, unsigned int, std::vector<bluestore_pextent_t, mempool::pool_allocator<(mempool::pool_index_t)5, bluestore_pextent_t> >*)+0xaa) [0x5558173ed85a]",
        "(BlueStore::OldExtent::create(boost::intrusive_ptr<BlueStore::Collection>, unsigned int, unsigned int, unsigned int, boost::intrusive_ptr<BlueStore::Blob>&)+0x11d) [0x5558173fee7d]",
        "(BlueStore::ExtentMap::punch_hole(boost::intrusive_ptr<BlueStore::Collection>&, unsigned long, unsigned long, boost::intrusive::list<BlueStore::OldExtent, boost::intrusive::member_hook<BlueStore::OldExtent, boost::intrusive::list_member_hook<>, &BlueStore::OldExtent::old_extent_item> >*)+0x451) [0x5558174009c1]",
        "(BlueStore::_do_truncate(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>&, unsigned long, std::set<BlueStore::SharedBlob*, std::less<BlueStore::SharedBlob*>, std::allocator<BlueStore::SharedBlob*> >*)+0x205) [0x555817474395]",
        "(BlueStore::_do_remove(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>&)+0xc1) [0x555817478451]",
        "(BlueStore::_remove(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>&)+0x7f) [0x555817479d7f]",

Actions #8

Updated by Igor Fedotov 12 months ago · Edited

It's been confirmed that setting "bluestore_elastic_shared_blobs" to 0 prior to creating a new OSD works around the issue. Hence I'd recommend running "ceph config set osd bluestore_elastic_shared_blobs 0" on any Squid cluster for now if you are planning to add new OSDs to that cluster.

Unfortunately this doesn't help OSDs which have already been deployed in Squid - the only known way to fix the bug for them is to redeploy.
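The workaround above can be sketched as follows (the redeploy step is illustrative and depends on your tooling; the OSD id is an example):

```shell
# Disable elastic shared blobs for OSDs created from now on;
# this has no effect on already-deployed Squid OSDs.
ceph config set osd bluestore_elastic_shared_blobs 0

# Verify the setting took effect:
ceph config get osd bluestore_elastic_shared_blobs

# Already-deployed Squid OSDs must be redeployed; with cephadm, e.g.:
ceph orch osd rm 46 --replace --zap   # OSD id 46 is an example
```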

Actions #9

Updated by Igor Fedotov 12 months ago

One more upstream case with apparently the same issue: https://ceph-storage.slack.com/archives/C1HFU4JK1/p1737571240721649

Actions #10

Updated by Ponnuvel P 11 months ago

Would it make sense to disable ESB by default until this is resolved?

There's a hotfix release currently in progress (19.2.2) for Squid:
https://tracker.ceph.com/issues/70822

It might be possible to disable bluestore_elastic_shared_blobs in that release.

While this doesn't seem to affect pre-Squid OSDs, it could be particularly painful to recreate all the OSDs in clusters that have been Squid from the start.

Actions #11

Updated by Ponnuvel P 11 months ago

Created PR against main to disable it: https://github.com/ceph/ceph/pull/62724

If others agree that the same needs to be done for Squid (whether in 19.2.2 or not) until this is fixed, I can backport it to Squid too.

Actions #12

Updated by Maxim Sklenář 11 months ago

Hello, this has just happened to me too across three physical servers and I am also facing data loss, is there any way to recover the OSDs? And how can I prevent it happening to other OSDs? My whole cluster was deployed on 19.2.0, so my understanding is that it can happen again.

Actions #13

Updated by Maxim Sklenář 11 months ago

So it took me a day to recover the data from the crashed OSDs. I kept redeploying each one, and about 5% of the time it actually didn't crash and stayed up, so I was able to recover the data. Keeping two of them running was even more challenging: usually when I redeployed the second one, all of the others crashed again. After an hour of trying I got two of the broken ones running at the same time and was finally able to recover all the data. Fast forward a few hours, when remapping was about to finish, a 4th OSD met the same fate. As of right now, I have removed the 4 OSDs and no other failure has occurred. I have migrated as much data off the cluster as possible. Now what, do I recreate all OSDs? As in purge and then recreate? Can I be certain this wouldn't occur again after setting bluestore_elastic_shared_blobs to 0?

Some data:
  • cephadm-deployed cluster on 19.2.0, updated to 19.2.1
  • 72 HDD OSDs across three physical servers
  • I have both EC and normal pools on the affected OSDs

With the exact trigger of the crashes unknown, and potentially every fresh Squid OSD being affected, I personally feel this bug is a ticking bomb and might cause a lot of damage.

Actions #14

Updated by serge smirnov 11 months ago

Greetings gentlemen.

We updated our cluster from Reef to Squid 19.2.1 in Feb 2025; the cluster serves RBD - data on an EC HDD pool, metadata on a replicated SSD pool.

Today we faced a similar problem - three OSDs from different hosts started to crash. Looking into the trace, we found something similar to what is described above:

     0> 2025-04-14T09:23:47.059+0000 71ab4ba00640 -1 *** Caught signal (Aborted) **
 in thread 71ab4ba00640 thread_name:

 ceph version 19.2.1 (58a7fab8be0a062d730ad7da874972fd3fba59fb) squid (stable)
 1: /lib64/libc.so.6(+0x3e930) [0x71ab6c739930]
 2: /lib64/libc.so.6(+0x8bfdc) [0x71ab6c786fdc]
 3: raise()
 4: abort()
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x16c) [0x56262956d302]
 6: /usr/bin/ceph-osd(+0x3fa463) [0x56262956d463]
 7: /usr/bin/ceph-osd(+0x3ea70c) [0x56262955d70c]
 8: (BlueStore::Blob::put_ref(BlueStore::Collection*, unsigned int, unsigned int, std::vector<bluestore_pextent_t, mempool::pool_allocator<(mempool::pool_index_t)5, bluestore_pextent_t> >*)+0xaa) [0x562629ad485a]
 9: (BlueStore::OldExtent::create(boost::intrusive_ptr<BlueStore::Collection>, unsigned int, unsigned int, unsigned int, boost::intrusive_ptr<BlueStore::Blob>&)+0x11d) [0x562629ae5e7d]
 10: (BlueStore::ExtentMap::punch_hole(boost::intrusive_ptr<BlueStore::Collection>&, unsigned long, unsigned long, boost::intrusive::list<BlueStore::OldExtent, boost::intrusive::member_hook<BlueStore::OldExtent, boost::intrusive::list_member_hook<>, &BlueStore::OldExtent::old_extent_item> >*)+0x451) [0x562629ae79c1]
 11: (BlueStore::_do_truncate(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>&, unsigned long, std::set<BlueStore::SharedBlob*, std::less<BlueStore::SharedBlob*>, std::allocator<BlueStore::SharedBlob*> >*)+0x205) [0x562629b5b395]
 12: (BlueStore::_do_remove(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>&)+0xc1) [0x562629b5f451]
 13: (BlueStore::_remove(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>&)+0x7f) [0x562629b60d7f]
 14: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ceph::os::Transaction*)+0x12a6) [0x562629b64de6]
 15: (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ceph::os::Transaction, std::allocator<ceph::os::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x303) [0x562629b46703]
 16: /usr/bin/ceph-osd(+0x51ab08) [0x56262968db08]
 17: (ECBackend::ECRecoveryBackend::commit_txn_send_replies(ceph::os::Transaction&&, std::map<int, MOSDPGPushReply*, std::less<int>, std::allocator<std::pair<int const, MOSDPGPushReply*> > >)+0x13a) [0x562629a68d4a]
 18: (ECBackend::RecoveryBackend::dispatch_recovery_messages(RecoveryMessages&, int)+0x11b7) [0x562629a6c197]
 19: (ECBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x1f7) [0x562629a76317]
 20: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x56) [0x562629886af6]
 21: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x81d) [0x5626297cfecd]
 22: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x197) [0x56262970a987]
 23: (ceph::osd::scheduler::PGRecoveryMsg::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x198) [0x562629954818]
 24: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xcd0) [0x562629724c80]
 25: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x2aa) [0x562629c2524a]
 26: /usr/bin/ceph-osd(+0xab2804) [0x562629c25804]
 27: /lib64/libc.so.6(+0x8a292) [0x71ab6c785292]
 28: /lib64/libc.so.6(+0x10f300) [0x71ab6c80a300]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Do I understand correctly that this is treated only with redeploying of all osds?

What are the triggers of this problem and how big is the scale? Do we need to urgently redeploy all osds where EC is used?

Only EC RBD clusters affected or EC RGW or all EC setups (with a naked RADOS)?

Actions #15

Updated by Maxim Sklenář 11 months ago

Also here is my error log

https://pastebin.com/EB8tWNp6

Actions #16

Updated by Igor Fedotov 11 months ago

serge smirnov wrote in #note-14:

Do I understand correctly that this is treated only with redeploying of all osds?

As far as I know this could happen only to OSDs deployed in Squid. Does this statement match your experience? Or can you see old OSDs failing the same way?

Hence, if my statement is correct, only those new OSDs require redeployment (with the flag disabled).

What are the triggers of this problem and how big is the scale? Do we need to urgently redeploy all osds where EC is used?

As I mentioned the remaining OSDs shouldn't be affected. Please let me know if my vision on this issue is wrong.

Hopefully Adam will share his findings shortly.

Only EC RBD clusters affected or EC RGW or all EC setups (with a naked RADOS)?

We haven't investigated in that direction. Likely any EC setup with data overwrites is affected.

Actions #17

Updated by Adam Kupczyk 11 months ago

  • Status changed from New to Fix Under Review
  • Pull request ID set to 62816
Actions #18

Updated by Adam Kupczyk 11 months ago

List of fsck errors obtained by the algorithm from https://github.com/ceph/ceph/pull/62817 (but without the code changes):

2025-04-14T07:16:58.876+0000 7f33a5422cc0 20 bluestore(dev/osd0) fsck_check_objects_shallow    shard 0x1b0000(0x301 bytes)
2025-04-14T07:16:58.876+0000 7f33a5422cc0 20 bluestore(dev/osd0) fsck_check_objects_shallow    shard 0x1da000(0x228 bytes)
2025-04-14T07:16:58.876+0000 7f33a5422cc0 20 bluestore(dev/osd0) fsck_check_objects_shallow    shard 0x1fb000(0x193 bytes)
.....
2025-04-14T07:16:58.877+0000 7f33a5422cc0 20 bluestore(dev/osd0) fsck_check_objects_shallow    0x209000~3000: 0x6000~3000 Blob(0x55827a701f00 blob([!~6000,0x1d4804f000~1000,0x1d5caeb000~1000,0x1d31df8000~1000] llen=0x9000 csum crc32c/0x1000/36) use_tracker(0x9*0x1000 0x[0,0,0,0,0,0,1000,1000,2000]) (shared_blob=NULL))
2025-04-14T07:16:58.877+0000 7f33a5422cc0 20 bluestore(dev/osd0) fsck_check_objects_shallow    0x20b000~1000: 0x8000~1000 Blob(0x55827a701f00 blob([!~6000,0x1d4804f000~1000,0x1d5caeb000~1000,0x1d31df8000~1000] llen=0x9000 csum crc32c/0x1000/36) use_tracker(0x9*0x1000 0x[0,0,0,0,0,0,1000,1000,2000]) (shared_blob=NULL))
2025-04-14T07:16:58.877+0000 7f33a5422cc0 -1 bluestore(dev/osd0) fsck error: 2#2:43e6ecc7:::stress_obj.4.1:head# lextent at 0x20b000 overlaps with the previous, which ends at 0x20c000
2025-04-14T09:30:20.970+0000 7efcad211cc0 20 bluestore(dev/osd5) fsck_check_objects_shallow    shard 0x183000(0x32c bytes)
2025-04-14T09:30:20.970+0000 7efcad211cc0 20 bluestore(dev/osd5) fsck_check_objects_shallow    shard 0x1b0000(0x37e bytes)
2025-04-14T09:30:20.970+0000 7efcad211cc0 20 bluestore(dev/osd5) fsck_check_objects_shallow    shard 0x1dc000(0x2e7 bytes)
2025-04-14T09:30:20.970+0000 7efcad211cc0 20 bluestore(dev/osd5) fsck_check_objects_shallow    0x0~5000: 0x0~5000 Blob(0x55
....
2025-04-14T09:30:20.970+0000 7efcad211cc0 20 bluestore(dev/osd5) fsck_check_objects_shallow    0x1d1000~4000: 0x1000~4000 Blob(0x55a83b043280 blob([!~1000,0x1a9afbd000~1000,0x2ce866f000~2000,0x2c10811000~1000,!~3000] llen=0x8000 csum+shared crc32c/0x1000/32) use_tracker(0x8*0x1000 0x[0,1000,2000,2000,2000,0,0,0]) SharedBlob(0x55a80e0c16e0 sbid 0xe7016e))
2025-04-14T09:30:20.971+0000 7efcad211cc0 20 bluestore(dev/osd5) fsck_check_objects_shallow    0x1d2000~3000: 0x2000~3000 Blob(0x55a83b043280 blob([!~1000,0x1a9afbd000~1000,0x2ce866f000~2000,0x2c10811000~1000,!~3000] llen=0x8000 csum+shared crc32c/0x1000/32) use_tracker(0x8*0x1000 0x[0,1000,2000,2000,2000,0,0,0]) SharedBlob(0x55a80e0c16e0 sbid 0xe7016e))
2025-04-14T09:30:20.971+0000 7efcad211cc0 -1 bluestore(dev/osd5) fsck error: 0#2:4904c7e8:::stress_obj.4.286:head# lextent at 0x1d2000 overlaps with the previous, which ends at 0x1d5000
2025-04-14T13:42:25.523+0000 7f17353abcc0 20 bluestore(dev/osd5) fsck_check_objects_shallow    shard 0x1a7000(0x277 bytes)
2025-04-14T13:42:25.523+0000 7f17353abcc0 20 bluestore(dev/osd5) fsck_check_objects_shallow    shard 0x1cd000(0x4c bytes)
2025-04-14T13:42:25.523+0000 7f17353abcc0 20 bluestore(dev/osd5) fsck_check_objects_shallow    shard 0x1d7000(0x200 bytes)
2025-04-14T13:42:25.523+0000 7f17353abcc0 20 bluestore(dev/osd5) fsck_check_objects_shallow    shard 0x1f7000(0x61 bytes)
2025-04-14T13:42:25.523+0000 7f17353abcc0 20 bluestore(dev/osd5) fsck_check_objects_shallow    0x0~4000: 0x0~4000 Blo
....
2025-04-14T13:42:25.524+0000 7f17353abcc0 20 bluestore(dev/osd5) fsck_check_objects_shallow    0x1f4000~4000: 0x4000~4000 Blob(0x55d0d787ff00 spanning 0 blob([0x486972000~1000,!~3000,0x146c64000~1000,0x498865000~1000,0x56a625000~1000,0x4b5069000~1000,!~1000] llen=0x9000 csum crc32c/0x1000/36) use_tracker(0x9*0x1000 0x[1000,0,0,0,1000,1000,1000,1000,0]) (shared_blob=NULL))
2025-04-14T13:42:25.524+0000 7f17353abcc0 -1 bluestore(dev/osd5) fsck error: 2#2:712d9f89:::stress_obj.0.2:head# lextent at 0x1f4000~4000 spans a shard boundary
2025-04-14T13:42:25.524+0000 7f17353abcc0 20 bluestore(dev/osd5) fsck_check_objects_shallow    0x1f7000~1000: 0x7000~1000 Blob(0x55d0d787ff00 spanning 0 blob([0x486972000~1000,!~3000,0x146c64000~1000,0x498865000~1000,0x56a625000~1000,0x4b5069000~1000,!~1000] llen=0x9000 csum crc32c/0x1000/36) use_tracker(0x9*0x1000 0x[1000,0,0,0,1000,1000,1000,1000,0]) (shared_blob=NULL))
2025-04-14T13:42:25.524+0000 7f17353abcc0 -1 bluestore(dev/osd5) fsck error: 2#2:712d9f89:::stress_obj.0.2:head# lextent at 0x1f7000 overlaps with the previous, which ends at 0x1f8000
2025-04-14T13:42:25.524+0000 7f17353abcc0 20 bluestore(dev/osd5) fsck_check_objects_shallow    0x1f8000~8000: 0x1000~8000 Blob(0x55d0d75c6800 blob([!~1000,0x41cb58000~1000,0x4ba0ca000~1000,0x4ea99c000~1000,0x151846000~1000,0x3cc48f000~1000,0x55e8e0000~2000,0x55e9e0000~1000] llen=0x9000 csum crc32c/0x1000/36) use_tracker(0x9*0x1000 0x[0,1000,1000,1000,1000,1000,1000,1000,1000]) (shared_blob=NULL))
2025-04-14T13:42:25.524+0000 7f17353abcc0 -1 bluestore(dev/osd5) fsck error: 2#2:712d9f89:::stress_obj.0.2:head# blob Blob(0x55d0d787ff00 spanning 0 blob([0x486972000~1000,!~3000,0x146c64000~1000,0x498865000~1000,0x56a625000~1000,0x4b5069000~1000,!~1000] llen=0x9000 csum crc32c/0x1000/36) use_tracker(0x9*0x1000 0x[1000,0,0,0,1000,1000,1000,1000,0]) (shared_blob=NULL)) doesn't match expected ref_map use_tracker(0x9*0x1000 0x[1000,0,0,0,1000,1000,1000,2000,0])
Actions #19

Updated by Adam Kupczyk 11 months ago

Zoom on problems caught by code in https://github.com/ceph/ceph/pull/62817.

 -3923> 2025-04-14T17:24:40.402+0000 7fab08c25640 -1 bluestore.extentmap(0x556675ec0d88) FAILED ENCODE_SOME spans shard force=1:
0x556675ec0c00 2#2:2fc4df5f:::stress_obj.2.0:head# nid 1250 size 0x200000 (2097152) expected_object_size 0 expected_write_size 0 in 15 shards: 0x0 0x30000 0x50000 0x7f000 0xa4000 0xc3000 0xe0000 0xf7000 0x118000 0x13d000 0x165000 0x18c000 0x1a6000 0x1c0000 0x1cf000, 1 spanning blobs
....
0x1c7000~2000: 0x7000~2000 Blob(enoplil)
0x1c9000~5000: 0x9000~5000 Blob(safuxaf)
0x1ce000~2000: 0xe000~2000 Blob(enoplil) <---- extent sticks out of shard, spanning blob
0x1d0000~8000: 0x0~8000 Blob(wlixtkr)
0x1d8000~3000: 0x8000~3000 Blob(qlacuwe)
0x1db000~3000: 0xb000~3000 Blob(krtenur)
....
Blob(enoplil disk=0x[!7,5b9307~1,635141~1,!5,56844f~1,4e7ca2~1]000 track=16*4K [0,0,0,0,0,0,0,4K,4K,0,0,0,0,0,4K,4K] spanning.id=5)
2025-04-14T17:26:11.443+0000 7f1e55806640 -1 bluestore.extentmap(0x5611fe28bc88) FAILED ENCODE_SOME spans shard force=1:
0x5611fe28bb00 1#2:793267de:::stress_obj.1.0:head# nid 1309 size 0x200000 (2097152) expected_object_size 0 expected_write_size 0 in 17 shards: 0x0 0x10000 0x2b000 0x48000 0x6d000 0x86000 0xa8000 0xd0000 0xe7000 0x107000 0x12c000 0x150000 0x176000 0x190000 0x1b4000 0x1dd000 0x1e8000, 3 spanning blobs
....
0x172000~2000: 0x2000~2000 Blob(umignaj)
0x174000~8000: 0x5000~8000 Blob(govegrt) <---- extent sticks out of shard, spanning blob
0x17c000~2000: 0x0~2000 Blob(kowunof)
0x17e000~2000: 0x0~2000 Blob(owtflix)
....
Blob(govegrt disk=0x[!1,fbb304~1,81d7b6~1,!2,f9c658~1,f9d115~1,240f010~6]000 track=13*4K {4K}[0=0,3=0,4=0] spanning.id=2)
2025-04-14T18:24:26.823+0000 7f914944e640 -1 bluestore.extentmap(0x5558e6cdea88) FAILED ENCODE_SOME spans shard force=1:
0x5558e6cde900 1#2:2fc4df5f:::stress_obj.2.0:head# nid 1362 size 0x200000 (2097152) expected_object_size 0 expected_write_size 0 in 15 shards: 0x0 0x1f000 0x41000 0x6c000 0x8a000 0xa8000 0xce000 0xf9000 0x11f000 0x13d000 0x15b000 0x177000 0x1ab000 0x1d0000 0x1f8000, 1 spanning blobs
...
0x1c9000~6000: 0x8000~6000 Blob(koclmni)
0x1cf000~3000: 0x5000~3000 Blob(ipupesn)  <---- extent sticks out of shard, spanning blob
0x1d2000~4000: 0x2000~4000 Blob(klusop)
0x1d6000~4000: 0x6000~4000 Blob(qezuzupe)
...
Blob(ipupesn disk=0x[!5,9bc9c~1,6d1c4~1,91957~1,!7]000 track=15*4K {0}[5=4K,6=4K,7=4K] spanning.id=1) bufs(0x166000~1000 0x1c7000~2000,writing,nocache)
2025-04-14T18:30:09.936+0000 7f65a69d9640 -1 bluestore.extentmap(0x55dcaa6ea488) FAILED ENCODE_SOME spans shard force=1:
0x55dcaa6ea300 0#2:ba050e7b:::stress_obj.0.0:head# nid 1190 size 0x200000 (2097152) expected_object_size 0 expected_write_size 0 in 17 shards: 0x0 0x20000 0x46000 0x66000 0x80000 0x99000 0xae000 0xd3000 0xfb000 0x11b000 0x132000 0x14f000 0x16b000 0x190000 0x1b0000 0x1d7000 0x1fd000, 4 spanning blobs
...
0x12f000~1000: 0x5000~1000 Blob(yzidrde)
0x130000~8000: 0x0~8000 Blob(qugrthi)  <---- extent sticks out of shard, spanning blob
0x138000~1000: 0x0~1000 Blob(ocuvrli)
0x139000~1000: 0x3000~1000 Blob(gotwitf)
...
Blob(qugrthi disk=0x[6636bc~1,66405d~1,1d0ec07~6,!2,1d0e712~1,649c98~1]000 track=12*4K {4K}[8=0,9=0] spanning.id=1)
2025-04-14T19:15:56.687+0000 7f558af56640 -1 bluestore.extentmap(0x55d1d9e79088) FAILED ENCODE_SOME spans shard force=1:
0x55d1d9e78f00 0#2:2fc4df5f:::stress_obj.2.0:head# nid 1185 size 0x200000 (2097152) expected_object_size 0 expected_write_size 0 in 18 shards: 0x0 0x18000 0x31000 0x58000 0x82000 0xa2000 0xb8000 0xd7000 0xf6000 0x112000 0x130000 0x14f000 0x16d000 0x18a000 0x1af000 0x1cc000 0x1d2000 0x1f0000, 1 spanning blobs
0x0~1000: 0x0~1000 Blob(kebrfuz)
....
0x1cc000~5000: 0x0~5000 Blob(etojus)
0x1d1000~7000: 0x1000~7000 Blob(yekothr)  <---- extent sticks out of shard, spanning blob
0x1d8000~6000: 0x6000~6000 Blob(gelafln)
0x1de000~2000: 0xe000~2000 Blob(yekothr)
...
Blob(yekothr disk=0x[!1,b719b3~1,d01658~5,d0167e~1,!6,d01a75~2]000 track=16*4K [0,4K,4K,4K,4K,4K,4K,4K,0,0,0,0,0,0,4K,4K] spanning.id=0)
Actions #20

Updated by serge smirnov 11 months ago

Igor Fedotov wrote in #note-16:

serge smirnov wrote in #note-14:

Do I understand correctly that this is treated only with redeploying of all osds?

As far as I know this could happen to OSDs deployed in squid only. Does this statement match your experience? Or you can see old OSD failing the same way?

Igor, we see this issue on OSDs that were previously on Reef and then upgraded to Squid;
these aren't Squid-deployed OSDs.

Actions #21

Updated by Adam Kupczyk 11 months ago

@sergesmirnoff
Can you do fsck on that failed OSD and paste errors for the object that caused problems?

Actions #22

Updated by serge smirnov 11 months ago

Adam Kupczyk wrote in #note-21:

@sergesmirnoff
Can you do fsck on that failed OSD and paste errors for the object that caused problems?

please see below

2025-04-15T19:24:56.458+0000 7c4ae4e49b40 -1 bluestore(./osd.38) fsck error: 5#2:dcc98cb5:::rbd_data.3.ca0139e20242a7.000000000002c401:head# lextent at 0x1b000~2000 spans a shard boundary
2025-04-15T19:24:56.458+0000 7c4ae4e49b40 -1 bluestore(./osd.38) fsck error: 5#2:dcc98cb5:::rbd_data.3.ca0139e20242a7.000000000002c401:head# lextent at 0x1c000 overlaps with the previous, which ends at 0x1d000
2025-04-15T19:24:56.458+0000 7c4ae4e49b40 -1 bluestore(./osd.38) fsck error: 5#2:dcc98cb5:::rbd_data.3.ca0139e20242a7.000000000002c401:head# blob Blob(0x5c2e85f8e270 spanning 12 blob([0x80a54a3000~1000,0x23041a46000~1000,!~a000] llen=0xc000 csum crc32c/0x1000/48) use_tracker(0xc*0x1000 0x[1000,1000,0,0,0,0,0,0,0,0,0,0]) (shared_blob=NULL)) doesn't match expected ref_map use_tracker(0xc*0x1000 0x[1000,2000,0,0,0,0,0,0,0,0,0,0])
fsck status: remaining 3 error(s) and warning(s)
Actions #23

Updated by Igor Fedotov 11 months ago

  • Priority changed from Normal to Immediate
Actions #24

Updated by Laura Flores 9 months ago

  • Has duplicate Bug #71759: os/bluestore/BlueStore.cc: ceph_abort in BlueStore::ExtentMap::encode_some(...) added
Actions #25

Updated by Denis Polom 8 months ago

Hi,
I have the same issue on my cluster.
Upgraded from Pacific -> Reef -> Squid 19.2.2 - everything worked well until, after some weeks, I needed to add new OSDs. All new OSDs were crashing randomly with this error, and repair wasn't able to fix it.

Actions #26

Updated by Neha Ojha 7 months ago

  • Backport set to squid, tentacle
Actions #27

Updated by Denis Polom 7 months ago

Hi @Neha Ojha, what is the PR number and when we can expect it to be released? It's quite crucial to us.

thx!

Actions #28

Updated by eric levesque 7 months ago

Hello,

Same here. We added 60 new OSDs to our rook-ceph cluster at the beginning of August, before we knew about the bug :(.

We are now applying the hotfix (ceph config set osd bluestore_elastic_shared_blobs false) and rebuilding the faulty OSDs. It's quite painful.

Also, we now have two PGs in an incomplete state, blocking I/O on one erasure-coded (3+1) pool.

sh-4.4$ ceph health detail
    pg 16.19e is incomplete, acting [4,262,121,276] (reducing pool pool-trappes-ec min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 16.1e4 is creating+incomplete, acting [275,99,276,260] (reducing pool pool-trappes-ec min_size from 3 may help; search ceph.com/docs for 'incomplete')

So far we have not been able to fix the problem.

Kind regards

Actions #29

Updated by Bastien BALAUD 6 months ago

Hello,

We encountered this issue during a PG reduction (256->128) on an EC pool (4+2) with EC overwrites enabled. We decided to disable bluestore_elastic_shared_blobs and rebuild all the impacted OSDs.
Regards

Actions #30

Updated by Maxim Sklenář 6 months ago

Hello,

can this still occur after disabling bluestore_elastic_shared_blobs and rebuilding all impacted OSDs? Can other OSDs created prior to disabling the option crash as well?

Thanks

Actions #31

Updated by Frédéric NASS 6 months ago

Can this still occur after disabling bluestore_elastic_shared_blobs and rebuilding all impacted OSDs?

No.

Can other OSDs created prior to disabling the options crash as well?

Yes.

Please read Igor's comment #8 above. The recommendation is to set 'ceph config set osd bluestore_elastic_shared_blobs 0' and to redeploy any OSDs created with bluestore_elastic_shared_blobs enabled (the default in Squid) to avoid crashing OSDs.
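A minimal sketch of the mitigation, assuming a cephadm-managed cluster (OSD id 38 is a placeholder; on rook or manually deployed clusters the redeploy step differs):

```shell
# Disable elastic shared blobs so newly created OSDs are not affected.
ceph config set osd bluestore_elastic_shared_blobs false

# Verify the setting took effect.
ceph config get osd bluestore_elastic_shared_blobs

# Redeploy each OSD that was created with the option enabled,
# one at a time, waiting for recovery to finish between OSDs.
ceph orch osd rm 38 --replace --zap
```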

Actions #32

Updated by Adam Kupczyk 5 months ago

  • Status changed from Fix Under Review to Pending Backport
  • Pull request ID changed from 62816 to 65065

changed PR from 62816 to 65065

Actions #33

Updated by Upkeep Bot 5 months ago

  • Merge Commit set to 5e722409d9b89d8f38ad072e4ba0e962c75c9231
  • Fixed In set to v20.3.0-3593-g5e722409d9
  • Upkeep Timestamp set to 2025-10-14T15:08:31+00:00
Actions #34

Updated by Upkeep Bot 5 months ago

  • Copied to Backport #73536: squid: Squid deployed OSDs are crashing. added
Actions #35

Updated by Upkeep Bot 5 months ago

  • Copied to Backport #73537: tentacle: Squid deployed OSDs are crashing. added
Actions #36

Updated by Upkeep Bot 5 months ago

  • Tags (freeform) set to backport_processed
Actions #37

Updated by Dan van der Ster 4 months ago

@Igor Fedotov question about the fix in https://github.com/ceph/ceph/pull/65065

Will OSDs created in v19.2.[0-3] now be immune to this crash? Or is it possible that the OSDs are already tainted in some way and can crash later?

Actions #38

Updated by Igor Fedotov 4 months ago

Dan van der Ster wrote in #note-37:

@Igor Fedotov question about the fix in https://github.com/ceph/ceph/pull/65065

Will OSDs created in v19.2.[0-3] now be immune to this crash?

Likely not. The patch prevents new corruptions from appearing, but it doesn't guarantee their absence in existing onodes. One should run fsck on an OSD to make sure it's clean.

Thanks,
Igor

Actions #39

Updated by Marc Gariépy 3 months ago

Please tell me if I'm missing something here.
To be affected, you need:
  • an OSD deployed with Squid,
  • `bluestore_elastic_shared_blobs` enabled for the newly deployed OSD,
  • the pool to be EC with `ec_overwrites` active.

If ec_overwrites is not active on a pool, that pool won't be affected.
Replicated pools also won't be affected.
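To check whether a given setup matches those conditions, something like the following should work (pool name "pool-trappes-ec" and osd.38 are placeholders):

```shell
# Is EC overwrite support enabled on the pool?
ceph osd pool get pool-trappes-ec allow_ec_overwrites

# Which pools are erasure-coded?
ceph osd pool ls detail | grep erasure

# What value of the option is a given OSD actually running with?
ceph config show osd.38 bluestore_elastic_shared_blobs
```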

Am I missing something?

Is there a reason this wasn't mentioned near the top of the release notes page for Squid?

Thanks.
Marc

Actions #40

Updated by Maxim Sklenář 2 months ago

Hello,

is my understanding correct that the fix was merged only after the release of 20.2.0, while bluestore_elastic_shared_blobs still defaults to true? Does this mean that newly created OSDs in 20.2.0 are still susceptible to this issue?

Thanks

Actions #41

Updated by Prashant D about 2 months ago

In some cases, we may still be able to recover data from affected (down/incomplete) PGs by exporting them from crashing OSDs (those hitting the `diff <= bytes_per_au[pos]` assertion) and importing them into healthy OSDs. We successfully helped one customer recover data using this approach. However, to prevent this issue going forward, make sure to redeploy all affected OSDs after setting (as directed in comment #8):

$ ceph config set osd bluestore_elastic_shared_blobs 0
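The export/import recovery described above can be sketched with `ceph-objectstore-tool` roughly as follows. This is a hedged sketch only: the PG id 16.19e, the OSD ids 4 and 7, and the file path are placeholders, and both OSDs must be stopped while the tool operates on them:

```shell
# On the crashing OSD (stopped), export the PG to a file.
systemctl stop ceph-osd@4
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-4 \
    --pgid 16.19e --op export --file /tmp/pg.16.19e.export

# On a healthy OSD (also stopped), import the PG, then restart it.
systemctl stop ceph-osd@7
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-7 \
    --op import --file /tmp/pg.16.19e.export
systemctl start ceph-osd@7
```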

Actions #42

Updated by Igor Fedotov about 1 month ago

  • Status changed from Pending Backport to Resolved
Actions #43

Updated by Igor Fedotov 29 days ago

  • Has duplicate Bug #74726: tentacle: ceph_assert(!ito->is_valid()) added