Skip to content

Reread sync marker and objv after acquiring the lease#48397

Closed
soumyakoduri wants to merge 2 commits intoceph:mainfrom
soumyakoduri:rgw-cls-version-sync-shard
Closed

Reread sync marker and objv after acquiring the lease#48397
soumyakoduri wants to merge 2 commits intoceph:mainfrom
soumyakoduri:rgw-cls-version-sync-shard

Conversation

@soumyakoduri
Copy link
Contributor

@soumyakoduri soumyakoduri commented Oct 7, 2022

In RGWDataSyncShardCR, after acquiring the lease, reread sync status shard object to fetch the latest marker & objv stored.

This fix is on top of PR##47682

Signed-off-by: Soumya Koduri skoduri@redhat.com

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows

…ject and store it in a vector

Signed-off-by: Shilpa Jagannath <smanjara@redhat.com>
@github-actions github-actions bot added the rgw label Oct 7, 2022
@soumyakoduri soumyakoduri force-pushed the rgw-cls-version-sync-shard branch 2 times, most recently from 18c0744 to e778c6a Compare October 7, 2022 19:13
@smanjara smanjara self-requested a review October 7, 2022 20:46
/* Reread data sync status to fech latest marker and objv */
yield call(new RGWSimpleRadosReadCR<rgw_data_sync_marker>(sync_env->dpp, sync_env->async_rados, sync_env->svc->sysobj,
rgw_raw_obj(pool, status_oid),
&sync_marker, true, &objv));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs to check for error with a block like this:

	if (retcode < 0) {
	  lease_cr->go_down();
	  drain_all();
	  return set_cr_error(retcode);
	}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes addressed it.. thanks!
Also observed in the tests that by reading into sync_marker directly here, we loose the in-memory sync_marker updated at the end of the do{} loop in the incremental_sync(). So the rgw server is repeatedly trying to read remote data log from this sync_marker in an endless loop. So I am planning to check version before updating sync_marker.. Does it seem right?

  tmp_objv.clear();
  /* Reread data sync status to fech the latest marker and update objv */


  yield call(new RGWSimpleRadosReadCR<rgw_data_sync_marker>(sync_env->dpp, sync_env->store,
                                                         rgw_raw_obj(pool, status_oid),
                                                         &tmp_sync_marker, true, &tmp_objv));
    if (retcode < 0) {
      lease_cr->go_down();
      drain_all();
      return set_cr_error(retcode);
    }
    if (tmp_objv.read_version.ver > objv.read_version.ver) {
      sync_marker = tmp_sync_marker;
      objv = tmp_objv;
    }

<<<<

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also observed in the tests that by reading into sync_marker directly here, we loose the in-memory sync_marker updated at the end of the do{} loop in the incremental_sync().

hmm, i think i see what you mean. RGWDataIncSyncShardCR is using sync_marker.marker as a local variable to track its current position in the listing, separately from RGWDataSyncShardMarkerTrack::sync_marker whose updates are written to the sync status object

if RGWDataIncSyncShardCR bails out with an error, we'd retry this call to RGWDataSyncShardCR with the same sync_marker.marker as before. but we really should start over at the position recorded in our sync status object, so i think this RGWSimpleRadosReadCR is doing the right thing

can you tell why RGWDataIncSyncShardCR is failing? it should run forever unless the lease times out, so i guess that's why? if that's happening consistently, we wouldn't expect to make much progress

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I found the issue here.. objv should be cleared before calling "RGWSimpleRadosReadCR" otherwise it may fail in the cls_version_check() in the read_op. This and retcode error check should fix the loop we had observed in the tests earlier.

In RGWDataSyncShardCR, after acquiring the lease, reread sync status
shard object to fetch the latest marker & objv stored.

Signed-off-by: Soumya Koduri <skoduri@redhat.com>
@soumyakoduri soumyakoduri force-pushed the rgw-cls-version-sync-shard branch from e778c6a to 6396c26 Compare October 12, 2022 05:49
@github-actions
Copy link

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@cbodley
Copy link
Contributor

cbodley commented Feb 9, 2023

@soumyakoduri @smanjara is this still needed?

@soumyakoduri
Copy link
Contributor Author

@soumyakoduri @smanjara is this still needed?

These changes are already merged as part of #48898.

@soumyakoduri soumyakoduri deleted the rgw-cls-version-sync-shard branch March 6, 2026 09:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants