Skip to content

osd: recalculate coll_t::_str during decode() to fix stale values#63938

Merged
tchaikov merged 1 commit intoceph:mainfrom
tchaikov:wip-osd-recalc-decode
Aug 14, 2025
Merged

osd: recalculate coll_t::_str during decode() to fix stale values#63938
tchaikov merged 1 commit intoceph:mainfrom
tchaikov:wip-osd-recalc-decode

Conversation

@tchaikov
Copy link
Contributor

@tchaikov tchaikov commented Jun 15, 2025

Fix compatibility issue where coll_t::_str retained stale values after
decoding v1/v2 format blobs, causing confusing debug output and tool
messages.

The _str field was not recalculated when decoding older formats:

  • v1 format (pre-Ceph v0.21, before commit tchaikov@a108774)
  • v2 format (meta/regular PGs)
  • v3 format always includes _str (temp PGs)

This primarily affected debugging scenarios since _str is only used for
log messages and BlueStore low-level tool output. The issue went
undetected because existing tests reused struct instances, preserving
field values across encode/decode cycles.

An upcoming test change will allocate fresh instances for each decode, which would expose these stale values. Since _str can be derived from always-encoded fields, recalculate it during decode() when missing from the encoded data.

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands

@tchaikov
Copy link
Contributor Author

@ceph/core hi core maintainers , could you please help review this change?

3 similar comments
@tchaikov
Copy link
Contributor Author

@ceph/core hi core maintainers , could you please help review this change?

@tchaikov
Copy link
Contributor Author

@ceph/core hi core maintainers , could you please help review this change?

@tchaikov
Copy link
Contributor Author

@ceph/core hi core maintainers , could you please help review this change?

@gregsfortytwo
Copy link
Member

This sounds really bad. Was it not happening often in practice? (I'm confused because there's no ticket attached to it.) What was the real consequence of these values being uninitialized here?

It looks like this may be a common problem in the compatibility checker code. Is the referenced patch to change to newly-allocated instances available?

Fix compatibility issue where coll_t::_str retained stale values after
decoding v1/v2 format blobs, causing confusing debug output and tool
messages.

The _str field was not recalculated when decoding older formats:
- v1 format (pre-Ceph v0.21, before commit a108774)
- v2 format (meta/regular PGs)
- v3 format always includes _str (temp PGs)

This primarily affected debugging scenarios since _str is only used for
log messages and BlueStore low-level tool output. The issue went
undetected because existing tests reused struct instances, preserving
field values across encode/decode cycles.

An upcoming test change will allocate fresh instances for each decode,
which would expose these stale values. Since _str can be derived from
always-encoded fields, recalculate it during decode() when missing
from the encoded data.

Signed-off-by: Kefu Chai <tchaikov@gmail.com>
@tchaikov tchaikov force-pushed the wip-osd-recalc-decode branch from 050d02d to d01003e Compare July 21, 2025 02:55
@tchaikov
Copy link
Contributor Author

tchaikov commented Jul 21, 2025

@gregsfortytwo Hi Greg, thanks for reviewing this change. Instead of recalculating the non-persisted field in AuthTicket, I dropped it entirely. The change was extracted into #64591.

This sounds really bad. Was it not happening often in practice?

It does happen in practice, but it doesn't cause functional issues if we don't rely on comparing logging messages with actual state. See #64591 for more details on the impact of the uninitialized AuthTicket::renew_after.

It looks like this may be a common problem in the compatibility checker code. Is the referenced patch to change to newly-allocated instances available?

Yes, it's available in #63910.

@tchaikov tchaikov changed the title auth,osd: recalculate fields in decode() osd: recalculate coll_t::_str during decode() to fix stale values Jul 21, 2025
@tchaikov
Copy link
Contributor Author

jenkins test make check

@tchaikov
Copy link
Contributor Author

@ceph/core hi core maintainers , could you please help review this change?

@ljflores
Copy link
Member

ljflores commented Aug 1, 2025

We found an issue with another PR in the batch. I'm having @SrinivasaBharath rebuild and retest to unblock this one. Please stand by...

Testing ref: https://tracker.ceph.com/issues/72299

@ljflores
Copy link
Member

@tchaikov tchaikov merged commit 9f1e680 into ceph:main Aug 14, 2025
19 checks passed
@tchaikov tchaikov deleted the wip-osd-recalc-decode branch August 14, 2025 01:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants