osd/osd_types: don't increment snap_seq on removal #54024
Conversation
Avoid discontinuity in the monitor's purged_snap entries and the OSD's PSN entries when adjacent snap ids are removed. Fixes: https://tracker.ceph.com/issues/62983 Signed-off-by: Matan Breizman <mbreizma@redhat.com>
pending_pseudo_purged_snaps was introduced in order to avoid discontinuity in the purged_snap monitor entries. After this change there shouldn't be any discontinuity in the first place. Signed-off-by: Matan Breizman <mbreizma@redhat.com>
Drafted to make sure I didn't miss anything, since this hasn't changed since snapshots were initially introduced. Tests look ok:
```cpp
removed_snaps.insert(s);
// try to add in the new seq, just to try to keep the interval_set contiguous
if (!removed_snaps.contains(get_snap_seq())) {
  removed_snaps.insert(get_snap_seq());
}
```
It looks like the newly created snap got immediately removed -- why were we seeing holes?
pg_pool_t::removed_snaps was used (pre-octopus) only to track the non-pool removed snaps and to signify a self-managed-snaps pool.
To actually perform the removal, OSDMap::Incremental::new_removed_snaps is used:
```cpp
case POOL_OP_DELETE_UNMANAGED_SNAP:
  ...
  pp.remove_unmanaged_snap(m->snapid, osdmap.require_osd_release < ceph_release_t::octopus);
  pending_inc.new_removed_snaps[m->pool].insert(m->snapid);
  changed = true;
case POOL_OP_DELETE_SNAP:
  {
    snapid_t s = pp.snap_exists(m->name.c_str());
    if (s) {
      pp.remove_snap(s);
      pending_inc.new_removed_snaps[m->pool].insert(s);
      changed = true;
    }
  }
```
We no longer maintain pg_pool_t::removed_snaps; the flag FLAG_SELFMANAGED_SNAPS is used instead:
```cpp
uint64_t pg_pool_t::add_unmanaged_snap(bool preoctopus_compat)
{
  if (snap_seq == 0) {
    if (preoctopus_compat) {
      // kludge for pre-mimic tracking of pool vs selfmanaged snaps. after
      // mimic this field is not decoded but our flag is set; pre-mimic, we
      // have a non-empty removed_snaps to signifiy a non-pool-snaps pool.
      removed_snaps.insert(snapid_t(1));
    }
    snap_seq = 1;
  }
  flags |= FLAG_SELFMANAGED_SNAPS;
  ...
```
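To make the resulting holes concrete, here is a toy model of the allocation/removal cycle (my own sketch, not Ceph code; `ToyPool` and its members are made-up names, and a plain `std::set` stands in for the purged_snaps interval set). The point: an id burned by bumping snap_seq on removal is never handed out as a real snapshot, so it is never purged, and the purged range develops a gap.

```cpp
#include <cstdint>
#include <set>

// Toy model (NOT Ceph code): self-managed snap ids come from a per-pool
// sequence; purged ids are tracked in a set standing in for purged_snaps.
struct ToyPool {
  uint64_t snap_seq = 0;
  std::set<uint64_t> purged;

  // "add_unmanaged_snap": hand out the next id.
  uint64_t create() { return ++snap_seq; }

  // "remove_unmanaged_snap": the removed id is eventually trimmed/purged.
  // The pre-PR behavior also bumped snap_seq, burning an id that no
  // snapshot will ever use -- and an id never used is never purged.
  void remove(uint64_t id, bool bump_seq_on_removal) {
    purged.insert(id);
    if (bump_seq_on_removal)
      ++snap_seq;
  }

  // Purged ids are contiguous iff they cover [min, max] with no gaps.
  bool purged_contiguous() const {
    if (purged.empty()) return true;
    return purged.size() == *purged.rbegin() - *purged.begin() + 1;
  }
};
```

With the bump, an interleaving like create, create, remove(1), create skips an id (the third create returns 4, and 3 is never purged), which is the discontinuity that pending_pseudo_purged_snaps had to paper over; without the bump the purged ids stay contiguous.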
Sorry @Matan-B, I still need to put some time into this. I'll get to it this week.
...I cannot figure out why we ever needed to increment snap_seq. The behavior was introduced with the original introduction of pool snaps in edb2f80. The behavior for unmanaged snaps was introduced with the original commit ac129d7 as well. In v0.21 (after the above two changes), I can't find any evidence that we actually depended on this behavior: OSD::advance_map compares pg_pool_t::get_snap_epoch() to OSDMap::get_epoch() to determine whether there's been an update; it doesn't use the snap seq. Sage's commit d831ab suggests that we do need to do this, but I can't work out why. @gregsfortytwo I haven't been able to work out why this behavior was ever necessary. Do you happen to remember?
For pool snaps, the removal of a snapshot does remove an entry from pg_pool_t::snaps and therefore changes the SnapContext returned from get_snap_context(). It's not clear to me why that would actually matter, but perhaps that was the original reason? The above doesn't actually explain unmanaged snapshots, though. The client is responsible for sending the snapc, and we filter the client's snapc regardless of what we see in pg_pool_t::snap_seq.
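For illustration, a minimal sketch of the filtering described above (my own toy code, not the actual OSD implementation; `filter_snapc` and the struct layout are made up): removed ids are dropped from the client's snapc, and the pool's snap_seq plays no part in the result.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <set>
#include <vector>

// Toy snap context: the seq the client believes is newest, plus the
// live snap ids, newest first.
struct SnapContext {
  uint64_t seq;
  std::vector<uint64_t> snaps;
};

// Drop any snap id the cluster has already removed. Note that the pool's
// snap_seq is not consulted anywhere here -- the client's seq is passed
// through untouched.
SnapContext filter_snapc(const SnapContext& in,
                         const std::set<uint64_t>& removed) {
  SnapContext out;
  out.seq = in.seq;
  std::copy_if(in.snaps.begin(), in.snaps.end(),
               std::back_inserter(out.snaps),
               [&](uint64_t s) { return removed.count(s) == 0; });
  return out;
}
```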
It's possible that the incremental wasn't correct in the first place and d831abe was pushed to resolve the discontinuity issue as a symptom (although not the underlying cause). |
I didn't get to go as far through this as I wanted, so let me know if you need me to do more follow-up. (Also, conversed with Sam offline). But what I've got is this:
I have validated that as of v0.21, this doesn't obviously break the snap logic in ReplicatedPG::do_op(), which is what I expected to find. I do see a lot of checks of the form ... This is also well within the range of things that Sage might have just forgotten how he implemented -- I very much thought that we had checks where snap_seq needed to be greater than previously for something to detect it had work to do. I didn't get to checking the clients, though.
rados_ioctx_snap_remove (librados.h) and IoCtx::[aio_]selfmanaged_snap_remove (librados.hpp) don't appear to communicate the incremented snap_seq to the library user -- the return is simply an error code with 0 for success. @idryomov I'd therefore assume that rbd doesn't rely on snap_seq being incremented upon self-managed snap remove -- can you think of any way in which it does?
@liewegas Do you happen to remember why we increment snap_seq on removal of a self-managed snapshot? |
Oh, and what I actually thought the comparison mattered for was snap trimming, but @athanatos assures me that this is handled by looking at osdmap epochs rather than the snap_seq.
@rzarzynski @neha-ojha, @gregsfortytwo and I can't think of a reason why we need to keep this behavior. I think we should probably merge it -- do you disagree?
I can't. librbd doesn't touch its snap context at all when removing a snapshot. |
I worded that badly. Given this sentence, one can conclude that future writes are sent with removed snapshot ID still in the snapshot context. This is not the case. When removing a snapshot, librbd doesn't touch its snap_seq (the part of snapshot context that is stored in the image header explicitly). The other part, the list of live snapshot IDs, isn't stored explicitly -- instead, it's generated by iterating through snapshot descriptors each time the image header is refreshed (i.e. re-read from RADOS). The snapshot descriptor gets removed from the image header as part of snapshot removal and then the image header is guaranteed to be refreshed before serving any I/O. So the removed snapshot ID is dropped from the list of live snapshot IDs for future writes, just in a lazy fashion.
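A toy model of that lazy behavior (hypothetical structures, not librbd's real types; `ImageHeader`, `refresh_snap_ids`, and `remove_snap` are made-up names): snap_seq is stored explicitly, while the live snap id list is regenerated from the snapshot descriptors on each refresh.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Toy image header: snap_seq is stored explicitly; the live snap id list
// is NOT -- only the per-snapshot descriptors are.
struct ImageHeader {
  uint64_t snap_seq = 0;
  std::map<uint64_t, std::string> snaps;  // id -> name descriptors
};

// "Refresh": rebuild the live snap id list from the descriptors,
// newest first.
std::vector<uint64_t> refresh_snap_ids(const ImageHeader& h) {
  std::vector<uint64_t> ids;
  for (auto it = h.snaps.rbegin(); it != h.snaps.rend(); ++it)
    ids.push_back(it->first);
  return ids;
}

// Snapshot removal drops the descriptor; snap_seq is deliberately
// left untouched.
void remove_snap(ImageHeader& h, uint64_t id) {
  h.snaps.erase(id);
}
```

After removing a snapshot, snap_seq is unchanged, and the next refresh simply no longer sees the removed id in the descriptors.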
@rzarzynski sure, I have asked @yuriw to test it individually. |
@athanatos, @rzarzynski - Tests look ok (https://trello.com/c/JdOAfyRy/1913-wip-yuri3-testing-2023-12-19-1211). CC: @yuriw |