rgw multisite: teach 'bucket sync checkpoint' about log generations #41455
 static void get_bucket_instance_ids(const RGWBucketInfo& bucket_info,
-                                    int shard_id,
+                                    int num_shards, int shard_id,
why do you pass "num_shards" if this could be fetched from "bucket_info"?
this function was hard-coded to use num_shards from the current_index layout, but it is called by open_bucket_index(), which takes an arbitrary rgw::bucket_index_layout_generation, so we need to use the shard count from that index layout instead of current_index
this is relevant now because we can request RGWOp_BILog_Info for any log generation; we convert that to an index layout with log_to_index_layout() and pass it to RGWRados::get_bucket_stats(), which calls into open_bucket_index()
without this change, we hit an assert in RGWRados::get_bucket_stats() because open_bucket_index() returns a different number of bucket_objs (based on the given index's num_shards) than bucket_instance_ids (based on current_index's num_shards)
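to illustrate the mismatch described above, here is a minimal standalone sketch. `IndexLayout`, `make_instance_ids()` and `make_shard_objects()` are hypothetical stand-ins, not the real RGW types or functions, and the id and object name formats are assumptions made for the example:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for rgw::bucket_index_layout_generation.
struct IndexLayout {
  uint64_t gen;
  uint32_t num_shards;
};

// Build instance ids for an explicitly given shard count.
// shard_id < 0 means "all shards"; num_shards == 0 means an unsharded index.
std::vector<std::string> make_instance_ids(const std::string& bucket_key,
                                           uint32_t num_shards, int shard_id) {
  std::vector<std::string> ids;
  if (num_shards == 0) {
    ids.push_back(bucket_key);
  } else if (shard_id < 0) {
    for (uint32_t i = 0; i < num_shards; ++i) {
      ids.push_back(bucket_key + ":" + std::to_string(i));
    }
  } else {
    ids.push_back(bucket_key + ":" + std::to_string(shard_id));
  }
  return ids;
}

// Build one index object name per shard of the given layout
// (the naming scheme here is made up for the example).
std::vector<std::string> make_shard_objects(const std::string& bucket_key,
                                            const IndexLayout& layout) {
  std::vector<std::string> objs;
  const uint32_t n = layout.num_shards == 0 ? 1 : layout.num_shards;
  for (uint32_t i = 0; i < n; ++i) {
    objs.push_back(".dir." + bucket_key + "." + std::to_string(layout.gen) +
                   "." + std::to_string(i));
  }
  return objs;
}
```

as long as both lists are built from the same layout's num_shards, their sizes agree; building the ids from a different generation's shard count is what would trip a size check like the assert mentioned above.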
thanks. I should have changed this as part of the bilog pr.
ldpp_dout(dpp, -1) << "failed to fetch remote log markers: " << cpp_strerror(r) << dendl;
return r;
}
r = markers.from_string(result.max_marker, -1);
how is this change related to the generation fix?
you call "from_string" on the result of the function in 2 places. isn't it better to keep it inside "rgw_read_remote_bilog_info()"?
the callers of rgw_read_remote_bilog_info() now need more than just info.max_marker, so i changed it to return rgw_bucket_index_marker_info directly instead of BucketIndexShardsManager. but i can change it to return both so the callers don't have to duplicate the marker parsing
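a rough sketch of the "return both" shape, so the marker parsing lives in one place. everything here is hypothetical: `MarkerInfo` is a trimmed-down stand-in for rgw_bucket_index_marker_info, `ShardMarkers` stands in for BucketIndexShardsManager, the "shard#marker,shard#marker" encoding is an assumption, and the `fetch` callback replaces the real remote request:

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <functional>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical, trimmed-down stand-in for rgw_bucket_index_marker_info.
struct MarkerInfo {
  std::string max_marker;
  uint64_t latest_gen = 0;
};

// Hypothetical stand-in for BucketIndexShardsManager: shard id -> marker.
using ShardMarkers = std::map<int, std::string>;

// Parse "shard#marker,shard#marker" (assumed encoding) into a map.
// Returns 0 on success or -EINVAL on malformed input.
int parse_markers(const std::string& s, ShardMarkers& out) {
  out.clear();
  std::istringstream in(s);
  std::string item;
  while (std::getline(in, item, ',')) {
    const auto pos = item.find('#');
    if (pos == std::string::npos || pos == 0) {
      return -EINVAL;
    }
    try {
      out[std::stoi(item.substr(0, pos))] = item.substr(pos + 1);
    } catch (const std::exception&) {
      return -EINVAL;  // shard id was not a number
    }
  }
  return 0;
}

// Fetch the info and parse its markers once, returning both to the caller.
int read_remote_bilog_info(const std::function<int(MarkerInfo&)>& fetch,
                           MarkerInfo& info, ShardMarkers& markers) {
  const int r = fetch(info);
  if (r < 0) {
    return r;
  }
  return parse_markers(info.max_marker, markers);
}
```

callers that only need the generation can ignore `markers`, and callers that compare markers don't have to repeat the parsing.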
Signed-off-by: Casey Bodley <cbodley@redhat.com>
knock out a TODO that was causing this assertion failure in RGWRados::get_bucket_stats() after a reshard:
ceph_assert(headers.size() == bucket_instance_ids.size());
Signed-off-by: Casey Bodley <cbodley@redhat.com>
e40b86a to 079c4de
thanks @yuvalif, updated
poll on rgw_read_bucket_full_sync_status() until full_status.incremental_gen catches up to the latest_gen we got from rgw_read_remote_bilog_info()
Signed-off-by: Casey Bodley <cbodley@redhat.com>
079c4de to c253bf2
if (full_status.incremental_gen > latest_gen) {
  ldpp_dout(dpp, 1) << "bucket sync caught up with source:\n"
      << " local gen: " << full_status.incremental_gen << '\n'
      << " remote gen: " << latest_gen << dendl;
  return 0;
added this last bit in case we overshoot the remote's starting generation, so we don't go on and try to compare markers from different generations
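the three-way generation check can be sketched on its own like this. `GenCheck`, `compare_generations()` and `wait_for_generation()` are hypothetical names for illustration, the `read_local_gen` callback stands in for re-reading the full sync status, and a real loop would sleep between polls and honor a timeout rather than count iterations:

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Outcome of comparing the local sync generation with the remote's latest_gen.
enum class GenCheck {
  behind,  // local gen < remote gen: keep polling
  same,    // same generation: shard markers are comparable
  ahead    // local gen > remote gen: already past it, skip the marker comparison
};

GenCheck compare_generations(uint64_t local_gen, uint64_t remote_gen) {
  if (local_gen < remote_gen) {
    return GenCheck::behind;
  }
  if (local_gen > remote_gen) {
    return GenCheck::ahead;
  }
  return GenCheck::same;
}

// Poll the local generation until it is no longer behind, or until
// max_polls attempts have been made. Returns the last comparison result.
GenCheck wait_for_generation(const std::function<uint64_t()>& read_local_gen,
                             uint64_t latest_gen, int max_polls) {
  GenCheck result = GenCheck::behind;
  for (int i = 0; i < max_polls && result == GenCheck::behind; ++i) {
    result = compare_generations(read_local_gen(), latest_gen);
  }
  return result;
}
```

only the `same` result should lead to a marker comparison; `ahead` is treated as caught up, since markers from different generations aren't comparable.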
radosgw-admin bucket sync checkpoint is used by multisite tests to wait for a bucket sync to catch up with a source zone. sync on a resharded bucket is not caught up until the sync status' generation matches the source zone's latest_gen