Skip to content

src/osd/scrubber: process MapsCompared event syncronously#51498

Merged
ljflores merged 1 commit intoceph:mainfrom
athanatos:sjust/wip-scrub-59049
May 22, 2023
Merged

src/osd/scrubber: process MapsCompared event syncronously#51498
ljflores merged 1 commit intoceph:mainfrom
athanatos:sjust/wip-scrub-59049

Conversation

@athanatos
Copy link
Contributor

@athanatos athanatos commented May 15, 2023

Summary:

WaitReplica submits two asyncronous events through the OSD scheduler concurrently: DigestUpdate and MapsCompared. This causes the error described in https://tracker.ceph.com/issues/59049 because MapsCompared must be processed first, but they may be scheduled with differing priorities depending on whether there are writes blocked. This patch avoids the problem by simply processing MapsCompared syncronously.

Details:

Within the WaitReplicas state handler for GotReplicas, we invoke PGScrubber::maps_compare_n_cleanup:

sc::result WaitReplicas::react(const GotReplicas&) {
  ...
      // maps_compare_n_cleanup() will arrange for MapsCompared event to be
      // sent:
      scrbr->maps_compare_n_cleanup();
      return discard_event();
void PgScrubber::maps_compare_n_cleanup()
{
  ...
  auto required_fixes =
    m_be->scrub_compare_maps(
      m_end.is_max(),
      m_listener->sl_get_snap_map_reader());
  ...
  m_osds->queue_scrub_maps_compared(m_pg, Scrub::scrub_prio_t::low_priority);

PgScrubber::maps_compare_n_cleanup arranges for the MapsCompared event to be submitted through the OSD schdeduler to be delivered to the scrub state machine later. It also invokes scrub_compare_maps

objs_fix_list_t ScrubBackend::scrub_compare_maps(
  bool max_reached,
  SnapMapReaderI& snaps_getter)
{
  ...
  scrub_snapshot_metadata(for_meta_scrub);

which in turn invokes scrub_snapshot_metadata

void ScrubBackend::scrub_snapshot_metadata(ScrubMap& map) {
  ...
  m_scrubber.submit_digest_fixes(this_chunk->missing_digest);

which in turn invokes submit_digest_fixes

void PrimaryLogScrub::submit_digest_fixes(const digests_fixes_t& fixes) {
  ...
  for (auto& [obj, dgs] : fixes) {
    ...
    PrimaryLogPG::OpContextUPtr ctx = m_pl_pg->simple_opc_create(obc);
    ...
    m_pl_pg->finish_ctx(ctx.get(), pg_log_entry_t::MODIFY);
    ...
    ctx->register_on_success([this]() {
      if ((num_digest_updates_pending >= 1) &&
	  (--num_digest_updates_pending == 0)) {
	m_osds->queue_scrub_digest_update(m_pl_pg,
					  m_pl_pg->is_scrub_blocking_ops());
      }
    });

    m_pl_pg->simple_opc_submit(std::move(ctx));
  }
}

which submits a sequence of repops updating objects' digests and schduling the DigestUpdate event once they have completed.

These two events, DigestUpdate and MapsCompared, are therefore in flight concurrently. However, only one event ordering is actually tolerated -- we must process MapsCompared first in order to be in state WaitDigestUpdate when the DigestUpdate event arrives. This would be fine except that the priority at which we submit the DigestUpdate event may end up with a higher priority if there is currently an op blocked.

There are two ways we could solve this. We could ensure that the two are always queued at the same priority and therefore will be processed in order. However, there's actually no reason to process the MapsCompared event asyncronously -- it would be far less brittle to simply process the MapsCompared syncronously and avoid this problem entirely.

Fixes: https://tracker.ceph.com/issues/59049

Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows

@athanatos athanatos requested a review from a team as a code owner May 15, 2023 20:04
@github-actions github-actions bot added the core label May 15, 2023
// maps_compare_n_cleanup() will arrange for MapsCompared event to be
// sent:
scrbr->maps_compare_n_cleanup();
post_event(MapsCompared{});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or simply transition here to WaitDigestUpdate

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, good point. Updating

Summary:

WaitReplica submits two asyncronous events through the OSD scheduler
concurrently: DigestUpdate and MapsCompared.  This causes the error
described in https://tracker.ceph.com/issues/59049 because MapsCompared
must be processed first, but they may be scheduled with differing
priorities depending on whether there are writes blocked.  This patch
avoids the problem by simply processing MapsCompared syncronously.

Details:

Within the WaitReplicas state handler for GotReplicas, we invoke
PGScrubber::maps_compare_n_cleanup:

sc::result WaitReplicas::react(const GotReplicas&)
{
  ...
      // maps_compare_n_cleanup() will arrange for MapsCompared event to be
      // sent:
      scrbr->maps_compare_n_cleanup();
      return discard_event();

void PgScrubber::maps_compare_n_cleanup()
{
  ...
  auto required_fixes =
    m_be->scrub_compare_maps(
      m_end.is_max(),
      m_listener->sl_get_snap_map_reader());
  ...
  m_osds->queue_scrub_maps_compared(m_pg, Scrub::scrub_prio_t::low_priority);

PgScrubber::maps_compare_n_cleanup arranges for the MapsCompared event
to be submitted through the OSD schdeduler to be delivered to the scrub
state machine later.  It also invokes scrub_compare_maps

objs_fix_list_t ScrubBackend::scrub_compare_maps(
  bool max_reached,
  SnapMapReaderI& snaps_getter)
{
  ...
  scrub_snapshot_metadata(for_meta_scrub);

which in turn invokes scrub_snapshot_metadata

void ScrubBackend::scrub_snapshot_metadata(ScrubMap& map)
{
  ...
  m_scrubber.submit_digest_fixes(this_chunk->missing_digest);

which in turn invokes submit_digest_fixes

void PrimaryLogScrub::submit_digest_fixes(const digests_fixes_t& fixes)
{
  ...
  for (auto& [obj, dgs] : fixes) {
    ...
    PrimaryLogPG::OpContextUPtr ctx = m_pl_pg->simple_opc_create(obc);
    ...
    m_pl_pg->finish_ctx(ctx.get(), pg_log_entry_t::MODIFY);
    ...
    ctx->register_on_success([this]() {
      if ((num_digest_updates_pending >= 1) &&
	  (--num_digest_updates_pending == 0)) {
	m_osds->queue_scrub_digest_update(m_pl_pg,
					  m_pl_pg->is_scrub_blocking_ops());
      }
    });

    m_pl_pg->simple_opc_submit(std::move(ctx));
  }
}

which submits a sequence of repops updating objects' digests and
schduling the DigestUpdate event once they have completed.

These two events, DigestUpdate and MapsCompared, are therefore in flight
concurrently.  However, only one event ordering is actually tolerated --
we must process MapsCompared first in order to be in state
WaitDigestUpdate when the DigestUpdate event arrives.  This would be
fine except that the priority at which we submit the DigestUpdate event
may end up with a higher priority if there is currently an op blocked.

There are two ways we could solve this.  We could ensure that the two
are always queued at the same priority and therefore will be processed
in order.  However, there's actually no reason to process the
MapsCompared event asyncronously -- it would be far less brittle to
simply process the MapsCompared syncronously and avoid this problem
entirely.

Fixes: https://tracker.ceph.com/issues/59049
Signed-off-by: Samuel Just <sjust@redhat.com>
@athanatos athanatos force-pushed the sjust/wip-scrub-59049 branch from da0f30d to 604860e Compare May 16, 2023 14:41
@athanatos
Copy link
Contributor Author

Ran a short suite on the second version: https://pulpito.ceph.com/sjust-2023-05-16_18:04:35-rados-wip-sjust-testing-2023-05-16-distro-default-smithi/, seems ok. Will wait for full suite run

@athanatos
Copy link
Contributor Author

jenkins test docs
jenkins test windows
jenkins test make check arm64

@athanatos
Copy link
Contributor Author

jenkins test windows

@athanatos
Copy link
Contributor Author

jenkins test make check arm64

@athanatos
Copy link
Contributor Author

jenkins render docs

@ljflores
Copy link
Member

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants