Skip to content

crimson: take obc lock during push commit on primary#61561

Merged
athanatos merged 14 commits intoceph:mainfrom
athanatos:sjust/wip-crimson-recovery-69412
Jan 31, 2025
Merged

crimson: take obc lock during push commit on primary#61561
athanatos merged 14 commits intoceph:mainfrom
athanatos:sjust/wip-crimson-recovery-69412

Conversation

@athanatos
Copy link
Contributor

See https://tracker.ceph.com/issues/69412

Follow up PR will address replica side.

Test run looks good, failures listed below:
https://pulpito.ceph.com/sjust-2025-01-28_06:26:29-crimson-rados-wip-sjust-crimson-testing-2025-01-27-distro-default-smithi/

Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows
  • jenkins test rook e2e

Should make it a bit easier to find the two kinds of
message handlers.

Signed-off-by: Samuel Just <sjust@redhat.com>
@athanatos athanatos requested a review from a team as a code owner January 29, 2025 00:15
@athanatos athanatos requested a review from Matan-B January 29, 2025 00:15
…ome formatting changes

Signed-off-by: Samuel Just <sjust@redhat.com>
…ing changes

Signed-off-by: Samuel Just <sjust@redhat.com>
…coroutine

Signed-off-by: Samuel Just <sjust@redhat.com>
…coroutine

Signed-off-by: Samuel Just <sjust@redhat.com>
…e to coroutine

Signed-off-by: Samuel Just <sjust@redhat.com>
… to coroutine

Signed-off-by: Samuel Just <sjust@redhat.com>
…object if complete

Signed-off-by: Samuel Just <sjust@redhat.com>
…le_pull_response

Signed-off-by: Samuel Just <sjust@redhat.com>
Avoids extra lookup.

Signed-off-by: Samuel Just <sjust@redhat.com>
Signed-off-by: Samuel Just <sjust@redhat.com>
Already checked in handle_recovery_op.

Signed-off-by: Samuel Just <sjust@redhat.com>
@athanatos athanatos force-pushed the sjust/wip-crimson-recovery-69412 branch from 1a68b82 to 4e70fbb Compare January 29, 2025 05:07
@athanatos
Copy link
Contributor Author

jenkins test make check

Comment on lines -1033 to -1036
if (pg.can_discard_replica_op(*m)) {
DEBUGDPP("discarding {}", pg, *m);
return seastar::now();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same can be applied for pull/pull_response, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call, but it can be a future PR

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to #61536

Comment on lines +1348 to +1356
if (pg.is_primary()) {
return handle_pull_response(
boost::static_pointer_cast<MOSDPGPush>(m));
} else {
return handle_push(boost::static_pointer_cast<MOSDPGPush>(m));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

crimson/.../replicated_recovery_backend: route pull_response earlier

Let ReplicatedRecoveryBackend::handle_recovery_op route
pull_responses instead of ReplicatedRecoveryBackend::handle_push.

pull_info.stat.num_objects_recovered++;
auto manager = pg.obc_loader.get_obc_manager(
recovery_waiter.obc);
manager.lock_excl_sync(); /* cannot already be locked */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This obc cannot already be locked since we've just recovered it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we just recovered it so it can't be locked.

seastar::make_ready_future<>());
if (pull_info.recovery_progress.first) {
prepare_waiter = pg.obc_loader.with_obc<RWState::RWNONE>(
auto fut = pg.obc_loader.with_obc<RWState::RWNONE>(
Copy link
Contributor

@Matan-B Matan-B Jan 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we move this bit to the manager as well? (creation of the WaitForObjectRecovery::obc that will be the one taking excl lock later on)

    auto obc_orderer = pg.obc_loader.get_obc_orderer(pull_info.recovery_info.soid);
    auto obc_manager = pg.obc_loader.get_obc_manager(
      obc_orderer,
      pull_info.recovery_info.soid);
    co_await pg.obc_loader.load_and_lock(
      obc_manager, RWState::RWNONE
    ).handle_error_interruptible(
      crimson::ct_error::assert_all("unexpected error")
    );

    // TODO: move to ObjectContextLoader once constructing obc from attrset is supported
    obc_manager.get_obc()->obs.oi.decode_no_oid(push_op.attrset.at(OI_ATTR),
    ....

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No lock has to be taken here since we are not submitting the transaction yet and its the phase where we only create this obc?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call, but it can be a future PR

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to #61536

@Matan-B
Copy link
Contributor

Matan-B commented Jan 30, 2025

@athanatos, since this was tested I'm ok with merging as is and follow up on some of the comments later on.
Let me know what do you think.

Let ReplicatedRecoveryBackend::handle_recovery_op route pushes
between handle_push and handle_pull_response instead of
ReplicatedRecoveryBackend::handle_push.

Signed-off-by: Samuel Just <sjust@redhat.com>
@athanatos athanatos force-pushed the sjust/wip-crimson-recovery-69412 branch from 4e70fbb to 6093f91 Compare January 30, 2025 22:53
@athanatos
Copy link
Contributor Author

jenkins test make check

@athanatos athanatos merged commit 44b51db into ceph:main Jan 31, 2025
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

Status: Merged (Pre Tentacle Freeze)

Development

Successfully merging this pull request may close these issues.

2 participants