Skip to content

common/frag: properly convert frag_t to net/store endianness#66199

Merged
vshankar merged 7 commits intoceph:mainfrom
batrick:i73792
Dec 5, 2025
Merged

common/frag: properly convert frag_t to net/store endianness#66199
vshankar merged 7 commits intoceph:mainfrom
batrick:i73792

Conversation

@batrick
Copy link
Member

@batrick batrick commented Nov 11, 2025

The MDS/client are already accidentally doing the right thing unless they are running on a big-endian machine.

Fixes: https://tracker.ceph.com/issues/73792

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands

You must only issue one Jenkins command per-comment. Jenkins does not understand
comments with more than one command.

@batrick batrick requested a review from a team November 11, 2025 15:19
@batrick batrick added cephfs Ceph File System needs-review labels Nov 11, 2025
@batrick batrick requested a review from vshankar November 11, 2025 15:19
@vshankar
Copy link
Contributor

The MDS/client are already accidentally doing the right thing unless they are running on a big-endian machine.

OK, so just for clarity -- this isn't the issue that causes BE kclient hosts to loop on readdir, right?

Its a fix for MDSs running on BE architectures.

@batrick
Copy link
Member Author

batrick commented Nov 11, 2025

The MDS/client are already accidentally doing the right thing unless they are running on a big-endian machine.

OK, so just for clarity -- this isn't the issue that causes BE kclient hosts to loop on readdir, right?

Correct, I do not believe at this time that this is the cause.

Its a fix for MDSs running on BE architectures.

Yes.

@batrick
Copy link
Member Author

batrick commented Nov 13, 2025

This patch can be used for some basic validation:

diff --git a/src/common/frag.cc b/src/common/frag.cc
index 90322c633879..fd92716ea220 100644
--- a/src/common/frag.cc
+++ b/src/common/frag.cc
@@ -14,6 +14,7 @@
  */
 
 #include "include/frag.h"
+#include "include/random.h"
 #include "include/types.h" // for operator<<(std::set)
 #include "common/debug.h"
 #include "common/Formatter.h"
@@ -36,7 +37,12 @@ bool frag_t::parse(const char *s) {
 }
 
 void frag_t::encode(ceph::buffer::list& bl) const {
-  ceph::encode(_enc, bl);
+  auto r = ceph::util::generate_random_number<uint64_t>(0, 10);
+  if ((r % 10) == 1) {
+    ceph::encode(boost::endian::endian_reverse(_enc), bl);
+  } else {
+    ceph::encode(_enc, bl);
+  }
 }
 
 void frag_t::decode(ceph::buffer::list::const_iterator& p) {

which lets us see:

correcting byte swapped frag_t(0x00006b09) to frag_t(0x096b0000) aka 011010110*
correcting byte swapped frag_t(0x00806d09) to frag_t(0x096d8000) aka 011011011*
correcting byte swapped frag_t(0x00006e09) to frag_t(0x096e0000) aka 011011100*
correcting byte swapped frag_t(0x00007209) to frag_t(0x09720000) aka 011100100*
correcting byte swapped frag_t(0x00007a09) to frag_t(0x097a0000) aka 011110100*
correcting byte swapped frag_t(0x00807f09) to frag_t(0x097f8000) aka 011111111*
correcting byte swapped frag_t(0x00808609) to frag_t(0x09868000) aka 100001101*
correcting byte swapped frag_t(0x00008909) to frag_t(0x09890000) aka 100010010*
correcting byte swapped frag_t(0x00809009) to frag_t(0x09908000) aka 100100001*
correcting byte swapped frag_t(0x00809409) to frag_t(0x09948000) aka 100101001*
correcting byte swapped frag_t(0x00809b09) to frag_t(0x099b8000) aka 100110111*
correcting byte swapped frag_t(0x00809e09) to frag_t(0x099e8000) aka 100111101*
correcting byte swapped frag_t(0x0080a509) to frag_t(0x09a58000) aka 101001011*
correcting byte swapped frag_t(0x0000a909) to frag_t(0x09a90000) aka 101010010*
correcting byte swapped frag_t(0x0080b209) to frag_t(0x09b28000) aka 101100101*
correcting byte swapped frag_t(0x0000b309) to frag_t(0x09b30000) aka 101100110*
correcting byte swapped frag_t(0x0000be09) to frag_t(0x09be0000) aka 101111100*
correcting byte swapped frag_t(0x0080ce09) to frag_t(0x09ce8000) aka 110011101*
correcting byte swapped frag_t(0x0080cf09) to frag_t(0x09cf8000) aka 110011111*
correcting byte swapped frag_t(0x0000d709) to frag_t(0x09d70000) aka 110101110*
correcting byte swapped frag_t(0x0000e109) to frag_t(0x09e10000) aka 111000010*

Unfortunately this needs printed to std:cerr. Perhaps we'll want to comment that out for the final merge.

@gregsfortytwo
Copy link
Member

Did you audit our other uses of encode_raw?
Anything in history indicating how this one got set to be wrong?

@batrick
Copy link
Member Author

batrick commented Nov 14, 2025

Did you audit our other uses of encode_raw?

Yes, there are others. Notably mds_gid_t which maybe affects the MDSMap. Leonid added that in 205fd33 but presumably that was for the quiesce DB. I consider it less urgent but obviously it needs fixed. It may require a feature bit.

Anything in history indicating how this one got set to be wrong?

It has been that way since the beginning of the git history (~2007).

@vshankar
Copy link
Contributor

@batrick - not sure about this , so asking -- do we need to do a PSA to our community users regarding this issue?

@batrick
Copy link
Member Author

batrick commented Nov 17, 2025

@batrick - not sure about this , so asking -- do we need to do a PSA to our community users regarding this issue?

Can't hurt. I'll write one up.

@vshankar
Copy link
Contributor

This PR is under test in https://tracker.ceph.com/issues/73938.

@vshankar
Copy link
Contributor

vshankar commented Dec 1, 2025

Need to retest with https://tracker.ceph.com/issues/74026

@batrick batrick force-pushed the i73792 branch 2 times, most recently from e6deca3 to b1bdc63 Compare December 2, 2025 02:13
@github-actions github-actions bot added the tests label Dec 2, 2025
vshankar added a commit to vshankar/ceph that referenced this pull request Dec 2, 2025
* refs/pull/66199/head:

Reviewed-by: Venky Shankar <vshankar@redhat.com>
Of particular interest is the CPU architecture and endianness.

Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
The MDS/client are already accidentally doing the right thing unless
they are running on a big-endian machine.

Credit to Venky Shankar for originally hypothesizing an endianness issue
with the frag_t.

Fixes: https://tracker.ceph.com/issues/73792
Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
There's better tooling for this now and we can avoid magic numbers.

Fixes: https://tracker.ceph.com/issues/73792
Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
If a big-endian MDS writes frag_t values into the metadata pool, these
will persist and confuse the MDS after it tries properly parsing them as
little-endian. Fortunately detecting this situation is fairly easy as we
restrict the number of bits and the number of bits restricts the mask
value.

Fixes: https://tracker.ceph.com/issues/73792
Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
@batrick
Copy link
Member Author

batrick commented Dec 2, 2025

jenkins test api

@ceph ceph deleted a comment from vshankar Dec 2, 2025
@batrick
Copy link
Member Author

batrick commented Dec 2, 2025

@vshankar I've resolved the make check failures. The generated test instances were invalid frag_t values which confused the encode/decode routines.

Copy link
Contributor

@vshankar vshankar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vshankar vshankar merged commit 91acc5f into ceph:main Dec 5, 2025
13 checks passed
@batrick batrick deleted the i73792 branch December 5, 2025 14:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants