reef: src/osd/OSDMap.cc: Fix encoder to produce same bytestream#55712
Conversation
Fixes: https://tracker.ceph.com/issues/63389 Signed-off-by: Kamoltat <ksirivad@redhat.com> (cherry picked from commit a3bdffb)
|
make check failed due to https://tracker.ceph.com/issues/62934 |
|
jenkins test make check |
|
We are seeing mon and osd crashes near the code changed in this PR when updating our Ceph cluster from v18.2.1 to v18.2.2. Any chance this change accidentally introduced a regression there? |
|
The following steps appear to allow working around the issue:
|
|
@kamoltat: could you please take a look as well? |
|
Hmm, I read the code around the failed assertion (https://github.com/rzarzynski/ceph/blob/e9880fefda543d9d785dba89fe90c5e5074bb62a/src/osd/OSDMap.cc#L3242):

```cpp
void OSDMap::encode(ceph::buffer::list& bl, uint64_t features) const
{
  // ...
  {
    // NOTE: any new encoding dependencies must be reflected by
    // SIGNIFICANT_FEATURES
    uint8_t v = 10;
    if (!HAVE_FEATURE(features, SERVER_LUMINOUS)) {
      v = 3;
    } else if (!HAVE_FEATURE(features, SERVER_MIMIC)) {
      v = 6;
    } else if (!HAVE_FEATURE(features, SERVER_NAUTILUS)) {
      v = 7;
    } else if (!HAVE_FEATURE(features, SERVER_REEF)) {
      v = 9;
    }
    // ...
    if (v >= 4) {
      encode(pg_upmap, bl);
      encode(pg_upmap_items, bl);
    } else {
      ceph_assert(pg_upmap.empty());
      ceph_assert(pg_upmap_items.empty());
    }
```

My understanding is the encoder was encoding for some very, very old decoder. At the moment I don't see any obvious relationship with the commit. |
|
Sorry, I don't think I have any logs of the troubled situation. I have since applied the workaround and would need to recreate the broken state first to fetch logs. I expect the issue to be reproducible by spinning up a ceph cluster with 18.2.1, turning the balancer on, running operations that cause upmap entries to be generated and then attempting to upgrade the cluster to 18.2.2. You will then witness the 18.2.2 osds and mons to enter a crash loop while the upgrade is being rolled out. Not sure if the crash loop would stop once all nodes have been upgraded as the rollout got stuck half-way on our instance due to the cluster becoming unhealthy. |
|
I think you might be looking at the wrong assertion. |
The change in this PR is what I mean. I guess that while decoding, the feature bitmask is received from the sender, and when the sender is running 18.2.1 it does not have that bit set. Thus 18.2.2 interprets 18.2.1 data as non-Reef and enters the assertion during decoding. |
|
So this basically means that incremental rollouts of upgrades from pre-18.2.2 Reef versions to 18.2.2 cause newly spawned services to enter a crash loop until all cluster OSDs and mons have been upgraded. I recommend posting a notice that such upgrades are not seamless, together with the above workaround. |
|
Yes, you're right! 18.2.0 and 18.2.1 lack that feature bit. However, I don't see the link with:

```cpp
if (v >= 4) {
  encode(pg_upmap, bl);
  encode(pg_upmap_items, bl);
} else {
  ceph_assert(pg_upmap.empty());
  ceph_assert(pg_upmap_items.empty());
}
```

Have you observed any issue with just the upmapped primaries removed? I guess neither erasing the upmap items nor disabling the balancer is needed:
Impossible to disagree 👍. |
|
@DaDummy: just to summarize my perception of the conditions needed to run into the problem:
|
|
Hi @DaDummy, there is a tracker for this issue: https://tracker.ceph.com/issues/61948 Can you please update the tracker when you can with exact steps to reproduce the issue? I understand you:
Please let me know if I am missing any steps in this reproducer. |
|
(sorry I posted this with the wrong account before)
Oh, I see. Yeah, the workaround I posted might involve more steps than actually necessary. I encountered the error, found some suggestions online that seemed to make sense, and after verifying that they looked sane just went with them. To clearly answer the question: no, I have not tried removing only the primaries while leaving the items in place. Though I agree that the implementation looks like that should work.
These steps should reproduce the issue, though I have not verified that, as I don't have a free test cluster available atm:
Based on the conversation here, I expect that the crashes would stop once all Ceph services are upgraded to 18.2.2. |
|
Thanks for looking into this, @DaDummy!
My understanding is there shouldn't be a decoder (in a service or in a client) for which the 18.2.2 encoder would generate a pre-Reef bytestream if somebody wants to use it. This stance isn't particularly useful on its own, so let's try to clarify. With a bunch of extra assumptions to be checked, my thinking goes along these lines:
I will be refining this further. |
|
Looks like the problem may reemerge later even with only v18.2.2 services present. I just observed a mon starting to crash again while I was draining a different node that was also running a mon. Downgrading the mons back to 18.2.1 stopped the crashes. Curiously, there are no upmap primaries listed in the output. Given that 18.2.1 was and is running without issue, I am tempted to stick to that release for now. |
|
I bet that among […]
while […] It should never connect to the cluster per the […]. Sticking to v18.2.1 while not using […] |
backport tracker: https://tracker.ceph.com/issues/64406
backport of #55401
parent tracker: https://tracker.ceph.com/issues/63389
this backport was staged using ceph-backport.sh version 16.0.0.6848
find the latest version at https://github.com/ceph/ceph/blob/main/src/script/ceph-backport.sh