crimson/osd: clear peer_missing entries for deleted objects#67488

Open
shraddhaag wants to merge 1 commit intoceph:mainfrom
shraddhaag:wip-shraddhaag-peering-bug

Conversation

@shraddhaag
Contributor

@shraddhaag shraddhaag commented Feb 24, 2026

The Issue

This issue occurs when backfilling of an object coincides with a client request to delete that same object. In such a case, there is a race between backfill and the client delete to acquire the OBC lock.

Consider the scenario where backfill is in progress and has already added the peer_missing entry for the given object. One of two things can then happen:

When the lock is acquired first by backfill, the object is recovered first and then deleted by the client delete request. The corresponding peer_missing entry is cleared when backfill recovers the object, the client delete sends a complete delete op to the backfilling OSD (deleting the object there as well), and we transition to the Recovered state successfully.

When the lock is acquired first by the client delete request, the object is deleted. Backfill then tries to recover the object, finds it deleted, and exits early. The stale peer_missing entry is not cleared. In Recovered::Recovered, needs_recovery() sees this stale entry and calls abort.
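The two orderings above can be sketched with a minimal model. All names here are illustrative stand-ins, not Crimson's actual types; the point is only to show why one ordering leaves a stale peer_missing entry behind.

```cpp
#include <cassert>
#include <set>
#include <string>

// Illustrative stand-in for the state involved in the race; not
// Crimson's actual types.
struct PeerState {
    std::set<std::string> peer_missing;  // objects the peer still needs
    std::set<std::string> store;         // objects present locally
};

// Backfill path (pre-fix behavior): recover the object if it still
// exists, clearing the peer_missing entry; if the object was already
// deleted, exit early, so the stale entry survives.
void backfill_recover(PeerState& ps, const std::string& soid) {
    if (ps.store.count(soid) == 0)
        return;  // object gone: stale peer_missing entry is left behind
    ps.peer_missing.erase(soid);
}

// Client delete path: remove the object.
void client_delete(PeerState& ps, const std::string& soid) {
    ps.store.erase(soid);
}

// Roughly what Recovered::Recovered checks: nothing may still be missing.
bool needs_recovery(const PeerState& ps) {
    return !ps.peer_missing.empty();
}
```

With backfill winning the lock, peer_missing ends up empty and needs_recovery() returns false; with the delete winning, the early exit leaves the entry in place and needs_recovery() returns true, which is the abort described above.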

The image below summarizes the issue seen in https://pulpito.ceph.com/shraddhaag-2026-02-09_14:34:53-crimson-rados-main-distro-debug-trial/41478/.

[Screenshot 2026-02-26: sequence of events leading to the abort]

The Fix

This issue is fixed by clearing the stale peer_missing entries for the deleted object when recover_object() discovers that the object has been deleted.
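Continuing the minimal model from above (illustrative stand-in types, not Crimson's actual code), the fix amounts to clearing the stale entry on the object-deleted path instead of just exiting early:

```cpp
#include <cassert>
#include <set>
#include <string>

// Illustrative stand-in, not Crimson's actual types.
struct PeerState {
    std::set<std::string> peer_missing;  // objects the peer still needs
    std::set<std::string> store;         // objects present locally
};

// Fixed recovery path: when the object is found to be deleted, clear
// the stale peer_missing entry instead of returning early, so
// needs_recovery() no longer sees it in Recovered::Recovered.
void recover_object_fixed(PeerState& ps, const std::string& soid) {
    if (ps.store.count(soid) == 0) {
        ps.peer_missing.erase(soid);  // the fix: drop the stale entry
        return;
    }
    ps.peer_missing.erase(soid);      // normal recovery also clears it
}
```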

The fix was tested across many runs, all of which passed successfully.

Earlier, an alternative fix was to clear the stale peer_missing entries in enqueue_delete_for_backfill(), called by the client delete. This approach was abandoned because it can also lead to a race condition resulting in the same error: the backfill could complete before the entries are cleared in enqueue_delete_for_backfill().

Fixes: https://tracker.ceph.com/issues/70501

QA Analysis

https://tracker.ceph.com/projects/crimson/wiki/MAIN#httpspulpitocephcomshraddhaag-2026-02-26_022256-crimson-rados-main-distro-debug-trial

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)

This commit fixes the abort in Recovered::Recovered.

There is a race to acquire the OBC lock between backfill and
client delete for the same object.

When the lock is acquired first by the backfill, the object is
recovered first, and then deleted by the client delete request.
When recovering the object, the corresponding peer_missing entry
is cleared and we are able to transition to Recovered state
successfully.

When the lock is acquired first by client delete request, the
object is deleted. Then backfill tries to recover the object,
finds it deleted and exits early. The stale peer_missing
entry is not cleared. In Recovered::Recovered, needs_recovery()
sees this stale peer_missing entry and calls abort.

This issue is fixed by clearing out the stale peer_missing
entries for the deleted object when the object is being recovered
by recover_object().

Earlier, an alternative fix was to clear the stale
peer_missing entries in enqueue_delete_for_backfill called by
the client delete. This approach was abandoned as this can also
lead to a race condition which might result in this same error,
if the backfill completes before the entries are cleared in
enqueue_delete_for_backfill().

Fixes: https://tracker.ceph.com/issues/70501
Signed-off-by: Shraddha Agrawal <shraddha.agrawal000@gmail.com>
@shraddhaag shraddhaag force-pushed the wip-shraddhaag-peering-bug branch from 3c87d3d to abffb61 Compare February 26, 2026 02:06
@shraddhaag shraddhaag marked this pull request as ready for review February 26, 2026 13:51
@shraddhaag shraddhaag moved this to Awaits review in Crimson Feb 26, 2026
@shraddhaag shraddhaag requested a review from Matan-B February 26, 2026 13:51
@shraddhaag shraddhaag changed the title crimson/osd: clear peer_missing entried for deleted objects crimson/osd: clear peer_missing entries for deleted objects Feb 26, 2026
@Matan-B Matan-B requested review from a team and amathuria February 26, 2026 14:44
Contributor

@amathuria amathuria left a comment


Looks good, approving.
Thanks for the detailed explanation @shraddhaag!

@shraddhaag
Contributor Author

jenkins test make check arm64

Contributor

@Matan-B Matan-B left a comment


Good progress!
I think this approach hides the issue; instead, we should make sure racing deletes are properly handled.
We can't skip recovery every time an object is not found: there could be cases where the object is not found due to an actual error and was not deleted.

// acquired the OBC lock. Clear stale peer_missing entries that were added
// by prepare_backfill_for_missing() so that needs_recovery() does not
// fail the assertion in Recovered::Recovered.
DEBUGDPP("object does not exist, clearing peer_missing: {}", pg, soid);

We should consider the case where the object is being recovered and can't (for some catastrophic reason) be found without being deleted.
This must result in an error (abort/retry). Otherwise, we always let missing objects be marked as recovered - even on non-deleted objects.

if (pg.get_peering_state().get_peer_missing(peer).is_missing(soid, &item)) {
DEBUGDPP("clear stale peer_missing entry {} v {} for peer {}",
pg, soid, item.need, peer);
pg.get_peering_state().on_peer_recover(peer, soid, item.need);

ReplicatedRecoveryBackend::recover_delete already pushes MOSDPGRecoveryDelete to all peers that still have the object marked as missing.

The above would be responsible for letting peer handle call on_local_recover with is_delete=true and handling the reply from this peer (via handle_recovery_delete_reply) and eventually call on_peer_recover.

I think it's wrong to skip the logic above, by calling on_peer_recover here. We trick peering_state into thinking "That peer recovered this object" - while this is not the case.

The issue here (prior to the changes in the PR) seems to be that we thought that we "Abort the recovery in this case .." (l. 46), but we didn't actually abort it, and later on PeeringState "Recovered" would fail on ceph_assert(!ps->needs_recovery()); since some peers still think they have missing objects.


I would try to understand how we can let MOSDPGRecoveryDelete logic execute properly and/or dropping/aborting recovery for this object.
Alternatively, we could try making maybe_pull_missing_obj smarter and by not pulling deleted objects? Currently we only skip pulls if the object is not missing, perhaps this case could be further expanded to detect deletes.

ReplicatedRecoveryBackend::maybe_pull_missing_obj(...)
{
  if (!local_missing.is_missing(soid)) {
    // object is not missing, don't pull
    return seastar::make_ready_future<>();
  }

@Matan-B Matan-B moved this from Awaits review to In Progress in Crimson Mar 2, 2026