Reread sync marker and objv after acquiring the lease#48397
Reread sync marker and objv after acquiring the lease#48397soumyakoduri wants to merge 2 commits intoceph:mainfrom
Conversation
…ject and store it in a vector Signed-off-by: Shilpa Jagannath <smanjara@redhat.com>
18c0744 to
e778c6a
Compare
| /* Reread data sync status to fech latest marker and objv */ | ||
| yield call(new RGWSimpleRadosReadCR<rgw_data_sync_marker>(sync_env->dpp, sync_env->async_rados, sync_env->svc->sysobj, | ||
| rgw_raw_obj(pool, status_oid), | ||
| &sync_marker, true, &objv)); |
There was a problem hiding this comment.
needs to check for error with a block like this:
if (retcode < 0) {
lease_cr->go_down();
drain_all();
return set_cr_error(retcode);
}There was a problem hiding this comment.
yes addressed it.. thanks!
Also observed in the tests that by reading into sync_marker directly here, we loose the in-memory sync_marker updated at the end of the do{} loop in the incremental_sync(). So the rgw server is repeatedly trying to read remote data log from this sync_marker in an endless loop. So I am planning to check version before updating sync_marker.. Does it seem right?
tmp_objv.clear();
/* Reread data sync status to fech the latest marker and update objv */
yield call(new RGWSimpleRadosReadCR<rgw_data_sync_marker>(sync_env->dpp, sync_env->store,
rgw_raw_obj(pool, status_oid),
&tmp_sync_marker, true, &tmp_objv));
if (retcode < 0) {
lease_cr->go_down();
drain_all();
return set_cr_error(retcode);
}
if (tmp_objv.read_version.ver > objv.read_version.ver) {
sync_marker = tmp_sync_marker;
objv = tmp_objv;
}
<<<<
There was a problem hiding this comment.
Also observed in the tests that by reading into sync_marker directly here, we loose the in-memory sync_marker updated at the end of the do{} loop in the incremental_sync().
hmm, i think i see what you mean. RGWDataIncSyncShardCR is using sync_marker.marker as a local variable to track its current position in the listing, separately from RGWDataSyncShardMarkerTrack::sync_marker whose updates are written to the sync status object
if RGWDataIncSyncShardCR bails out with an error, we'd retry this call to RGWDataSyncShardCR with the same sync_marker.marker as before. but we really should start over at the position recorded in our sync status object, so i think this RGWSimpleRadosReadCR is doing the right thing
can you tell why RGWDataIncSyncShardCR is failing? it should run forever unless the lease times out, so i guess that's why? if that's happening consistently, we wouldn't expect to make much progress
There was a problem hiding this comment.
I think I found the issue here.. objv should be cleared before calling "RGWSimpleRadosReadCR" otherwise it may fail in the cls_version_check() in the read_op. This and retcode error check should fix the loop we had observed in the tests earlier.
In RGWDataSyncShardCR, after acquiring the lease, reread sync status shard object to fetch the latest marker & objv stored. Signed-off-by: Soumya Koduri <skoduri@redhat.com>
e778c6a to
6396c26
Compare
|
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved |
|
@soumyakoduri @smanjara is this still needed? |
These changes are already merged as part of #48898. |
In RGWDataSyncShardCR, after acquiring the lease, reread sync status shard object to fetch the latest marker & objv stored.
This fix is on top of PR##47682
Signed-off-by: Soumya Koduri skoduri@redhat.com
Checklist
Show available Jenkins commands
jenkins retest this pleasejenkins test classic perfjenkins test crimson perfjenkins test signedjenkins test make checkjenkins test make check arm64jenkins test submodulesjenkins test dashboardjenkins test dashboard cephadmjenkins test apijenkins test docsjenkins render docsjenkins test ceph-volume alljenkins test ceph-volume toxjenkins test windows