Skip to content

src/osd/OSDMap.cc: Fix encoder to produce same bytestream#55401

Merged
rzarzynski merged 1 commit intoceph:mainfrom
kamoltat:wip-ksirivad-osdmap-encode-bug
Feb 21, 2024
Merged

src/osd/OSDMap.cc: Fix encoder to produce same bytestream#55401
rzarzynski merged 1 commit intoceph:mainfrom
kamoltat:wip-ksirivad-osdmap-encode-bug

Conversation

@kamoltat
Copy link
Member

@kamoltat kamoltat commented Jan 31, 2024

Problem:

The root cause was due to the OSDMap encoder of Reef (or higher) producing different byte streams than that of Quincy's. The problem might not be obvious when looking at just encoding/decoding point of view, however, the CRC value is calculated by looking at the buffer list size of the encoded objects. Therefore, the CRC values between Quincy and Reef are different.

When mons are running different versions, e.g.,
during upgrades, we sometimes will encounter a crc mismatch between the crc that is generated
from the leader MON (version n+1) and
the peon MON (version n). This is due to how the OSDMap encoding leaves a different byte stream between Quincy and Reef encoder.

Solution:

Fix the OSDMap encoding in later version such that
it will produce the same byte stream as version <= Quincy

Fixes: https://tracker.ceph.com/issues/63389

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows
  • jenkins test rook e2e

@kamoltat kamoltat requested a review from a team as a code owner January 31, 2024 15:30
@github-actions github-actions bot added the core label Jan 31, 2024
@kamoltat kamoltat self-assigned this Jan 31, 2024
@athanatos
Copy link
Contributor

Your description of the problem does not seem to match the bug. The bug indicates that the OSDs were sending "failed to encode map X with expected crc". That problem happens due to OSDs with newer binaries encoding with their newest known version rather than the older version and does not seem obviously related to the upgrade state on the mons.

I'd like to see some kind of automated testing to prevent this in the future as well.

@athanatos
Copy link
Contributor

What testing has been done to validate this fix?

@athanatos
Copy link
Contributor

athanatos commented Jan 31, 2024

Apparently, the warning from the OSDs should have triggered a health warn in teuthology, but since we switched the deployment to cephadm, teuthology runs aren't writing the logs to the cluster log file and the grep for cluster warnings isn't working (and is thereby succeeding rather than failing). We need to fix that, confirm that it catches the currently broken implementation, and then use it to validate that this PR fixes the problem.

Bug: https://tracker.ceph.com/issues/63425 PR: #54312

@kamoltat
Copy link
Member Author

kamoltat commented Feb 1, 2024

I ran an initial validation:

https://pulpito.ceph.com/ksirivad-2024-01-31_21:14:43-upgrade:quincy-x-main-distro-default-smithi/7541233/ (without patch)

2024-01-31T21:48:31.745+0000 7f9405310700 -1 mon.c@2(peon).osd e260 update_from_paxos full map CRC mismatch, resetting to canonical

2024-01-31T21:48:31.745+0000 7f9405310700 20 mon.c@2(peon).osd e260 update_from_paxos my (bad) full osdmap:

https://pulpito.ceph.com/ksirivad-2024-01-31_19:49:38-upgrade:quincy-x-wip-ksirivad-osdmap-encode-bug-distro-default-smithi/7541224/ (with patch)
No CRC mismatch is seen

@kamoltat
Copy link
Member Author

kamoltat commented Feb 1, 2024

@athanatos

You're right essentially it is an OSD problem since the root cause is an OSDMap encoder, I'll change the PR description.

I was just talking about the Monitors being in charge of applying the increments when there is an OSDMap change, the problem happens once the peon (running in a different version) fetches the increment from MonDB that was encoded with a different encoder, so once the peon applies the increment to it's OSDMap, it gives a different CRC output triggering the check.

@kamoltat
Copy link
Member Author

kamoltat commented Feb 1, 2024

The next step is to validate that the fix is not breaking other things.

@athanatos
Copy link
Contributor

@athanatos

You're right essentially it is an OSD problem since the root cause is an OSDMap encoder, I'll change the PR description.

I was just talking about the Monitors being in charge of applying the increments when there is an OSDMap change, the problem happens once the peon (running in a different version) fetches the increment from MonDB that was encoded with a different encoder, so once the peon applies the increment to it's OSDMap, it gives a different CRC output triggering the check.

The OSDs also apply incrementals -- most of the time that's how they receive a new map. If you think the bug is related to different monitors applying the incrementals differently, please add evidence for it to the bug.

@kamoltat
Copy link
Member Author

kamoltat commented Feb 5, 2024

@kamoltat
Copy link
Member Author

kamoltat commented Feb 7, 2024

@athanatos
You're right essentially it is an OSD problem since the root cause is an OSDMap encoder, I'll change the PR description.
I was just talking about the Monitors being in charge of applying the increments when there is an OSDMap change, the problem happens once the peon (running in a different version) fetches the increment from MonDB that was encoded with a different encoder, so once the peon applies the increment to it's OSDMap, it gives a different CRC output triggering the check.

The OSDs also apply incrementals -- most of the time that's how they receive a new map. If you think the bug is related to different monitors applying the incrementals differently, please add evidence for it to the bug.

Ah yes, so here is an example of where OSDMonitor paxos service is pulling osdmap from mon_store and uses the encoding features:

ceph/src/mon/OSDMonitor.cc

Lines 696 to 700 in a3bdffb

uint64_t features = newmap.get_encoding_features();
newmap.encode(pending_inc.fullmap,
features | CEPH_FEATURE_RESERVED);
pending_inc.full_crc = newmap.get_crc();
dout(20) << " full crc " << pending_inc.full_crc << dendl;

Which then gives the error:
https://pulpito.ceph.com/ksirivad-2024-01-31_21:14:43-upgrade:quincy-x-main-distro-default-smithi/7541233/

2024-01-31T21:48:31.745+0000 7f9405310700 -1 mon.c@2(peon).osd e260 update_from_paxos full map CRC mismatch, resetting to canonical
…
2024-01-31T21:48:31.745+0000 7f9405310700 20 mon.c@2(peon).osd e260 update_from_paxos my (bad) full osdmap:
…

I'll also add this to tracker for reference

@amathuria
Copy link
Contributor

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants