
osd/osd_types: don't increment snap_seq on removal#54024

Merged
yuriw merged 2 commits into ceph:main from Matan-B:wip-matanb-snap_seq_inc on Jan 2, 2024

Conversation

@Matan-B (Contributor) commented Oct 15, 2023

  • Avoid discontinuities in the monitor's purged_snap and the OSD's PSN entries when adjacent snap ids are removed.

  • pending_pseudo_purged_snaps was introduced in order to avoid discontinuity in the purged_snap monitor entries.
    After this change there shouldn't be any discontinuity in the first place.

Fixes: https://tracker.ceph.com/issues/62983
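The discontinuity being avoided can be illustrated with a toy model (Python, purely illustrative; `run` and its sets are stand-ins, not Ceph's `interval_set` or monitor state): bumping snap_seq on every removal consumes ids that no snapshot ever carried, so the purged-snaps history develops holes unless those pseudo-ids are also recorded as purged.

```python
# Toy model (not Ceph code) of the purged-snaps discontinuity.
def run(bump_on_removal):
    snap_seq = 0      # models pg_pool_t::snap_seq
    purged = set()    # models the monitor's purged snap ids

    snaps = []
    for _ in range(3):        # create three self-managed snaps
        snap_seq += 1         # a new snap id is the bumped seq
        snaps.append(snap_seq)
    for s in snaps:           # now remove all of them
        purged.add(s)
        if bump_on_removal:
            snap_seq += 1     # old behavior: seq consumed, but no snap has this id

    holes = set(range(1, snap_seq + 1)) - purged
    return snap_seq, sorted(purged), sorted(holes)

# Old behavior: ids 4..6 were consumed by removals and remain holes.
print(run(bump_on_removal=True))   # (6, [1, 2, 3], [4, 5, 6])
# After this PR: nothing is skipped, the purged history stays contiguous.
print(run(bump_on_removal=False))  # (3, [1, 2, 3], [])
```

In this model, d831abe's pending_pseudo_purged_snaps papered over the holes by also recording the bumped seq as purged; removing the bump means the holes never form in the first place.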


Avoid adjacent snap ids removals discontinuity in monitor's purged_snap and OSD's PSN entries.

Fixes: https://tracker.ceph.com/issues/62983

Signed-off-by: Matan Breizman <mbreizma@redhat.com>
pending_pseudo_purged_snaps was introduced in order to avoid discontinuity in the purged_snap monitor entries.
After this change there shouldn't be any discontinuity in the first place.

Signed-off-by: Matan Breizman <mbreizma@redhat.com>
@Matan-B (Contributor, Author) commented Oct 15, 2023

Drafted to make sure I didn't miss anything, since this behavior hasn't changed since snapshots were initially introduced.

Tests look OK:
https://pulpito.ceph.com/matan-2023-10-15_12:46:59-rados:thrash-wip-matanb-snap_seq_inc-distro-default-smithi/

```cpp
removed_snaps.insert(s);
// try to add in the new seq, just to try to keep the interval_set contiguous
if (!removed_snaps.contains(get_snap_seq())) {
  removed_snaps.insert(get_snap_seq());
```
A reviewer (Contributor) commented:

It looks like the newly created snap got immediately removed -- why were we seeing holes?

@Matan-B (Contributor, Author) replied:

pg_pool_t::removed_snaps was used (pre-octopus) only to track non-pool removed snaps and to signify a self-managed-snaps pool.
To actually perform the removal OSDMap::Incremental::new_removed_snaps is used.

```cpp
  case POOL_OP_DELETE_UNMANAGED_SNAP:
  ...
    pp.remove_unmanaged_snap(m->snapid, osdmap.require_osd_release < ceph_release_t::octopus);
    pending_inc.new_removed_snaps[m->pool].insert(m->snapid);
    changed = true;
  case POOL_OP_DELETE_SNAP:
    snapid_t s = pp.snap_exists(m->name.c_str());
    if (s) {
      pp.remove_snap(s);
      pending_inc.new_removed_snaps[m->pool].insert(s);
      changed = true;
    }
```

We no longer maintain pg_pool_t::removed_snaps, the flag FLAG_SELFMANAGED_SNAPS is used instead.

```cpp
uint64_t pg_pool_t::add_unmanaged_snap(bool preoctopus_compat)
{
  if (snap_seq == 0) {
    if (preoctopus_compat) {
      // kludge for pre-mimic tracking of pool vs selfmanaged snaps.  after
      // mimic this field is not decoded but our flag is set; pre-mimic, we
      // have a non-empty removed_snaps to signifiy a non-pool-snaps pool.
      removed_snaps.insert(snapid_t(1));
    }
    snap_seq = 1;
  }
  flags |= FLAG_SELFMANAGED_SNAPS;
```
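The kludge quoted at the top of this thread -- inserting the bumped seq back into removed_snaps -- can be modeled the same way. In this toy Python sketch (`intervals` and `churn` are made-up names, not Ceph code), alternating create/remove fragments removed_snaps into several runs unless the bumped seq is inserted as well:

```python
# Toy model (not Ceph code) of the pre-change remove_unmanaged_snap kludge:
# inserting the bumped seq into removed_snaps keeps the interval_set as one run.

def intervals(ids):
    """Collapse a set of ids into maximal contiguous (lo, hi) runs."""
    runs = []
    for i in sorted(ids):
        if runs and runs[-1][1] + 1 == i:
            runs[-1][1] = i
        else:
            runs.append([i, i])
    return [tuple(r) for r in runs]

def churn(keep_contiguous):
    seq, removed = 0, set()
    for _ in range(2):                 # create a snap, then remove it, twice
        seq += 1                       # create: the new snap id is the bumped seq
        removed.add(seq)               # remove: record the snap id
        seq += 1                       # old behavior: bump seq on removal too
        if keep_contiguous and seq not in removed:
            removed.add(seq)           # the quoted kludge
    return intervals(removed)

print(churn(keep_contiguous=False))  # [(1, 1), (3, 3)] -- fragmented
print(churn(keep_contiguous=True))   # [(1, 4)]         -- one contiguous run
```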

@Matan-B Matan-B marked this pull request as ready for review October 18, 2023 15:34
@Matan-B Matan-B requested a review from a team as a code owner October 18, 2023 15:34
@athanatos (Contributor) commented:

Sorry @Matan-B, I still need to put some time into this. I'll get to it this week.

@athanatos (Contributor) commented:

...I cannot figure out why we ever needed to increment snap_seq.

The behavior was introduced with the original introduction of pool snaps in edb2f80:

```diff
commit edb2f80c25a476b4b0ad8014bbd812cc46d16239
Author: Sage Weil <sage@newdream.net>
Date:   Fri Jun 5 11:52:43 2009 -0700

    osd: add 'osd pool rmsnap poolname snapname' command
...
+  void remove_snap(snapid_t s) {
+    assert(snaps.count(s));
+    snaps.erase(s);
+    v.snap_seq = v.snap_seq + 1;
+  }
...
```

The behavior for unmanaged snaps was introduced with the original commit ac129d7 as well:

```diff
commit ac129d7895bd0f4e9e3bf606144376be46f706d2
Author: Greg Farnum <gregf@hq.newdream.net>
Date:   Thu Apr 8 14:45:57 2010 -0700

    osd: pg_pool_t gets new functions for unmanaged (ie, client-managed) snaps
...
+  void remove_unmanaged_snap(snapid_t s) {
+    assert(snaps.empty());
+    removed_snaps.insert(s);
+    v.snap_seq = v.snap_seq + 1;
+  }
...
```

In v0.21 (after the above two changes), I can't find any evidence that we actually depended on this behavior. OSD::advance_map compares pg_pool_t::get_snap_epoch() to OSDMap::get_epoch() to determine whether there's been an update, it doesn't use the snap seq.
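A minimal sketch of the update detection just described (the function name and shape are hypothetical; OSD::advance_map's real logic is more involved): the OSD decides whether a pool's snapshot state changed by comparing epochs, so snap_seq never enters the decision.

```python
# Hypothetical sketch: snapshot-change detection via map epochs, not snap_seq.
def pool_snaps_changed(pool_snap_epoch, last_processed_epoch, new_epoch):
    # The pool's snap state last changed in map epoch pool_snap_epoch;
    # it is "new" to this OSD iff that epoch lies in the range being advanced over.
    return last_processed_epoch < pool_snap_epoch <= new_epoch

print(pool_snaps_changed(10, 9, 12))   # True: change landed in epoch 10
print(pool_snaps_changed(10, 10, 12))  # False: already processed
```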

Sage's commit d831ab suggests that we do need to do this:

```diff
commit d831abeae1688a18eb446dd1a63eb6ed94f45d81
Author: Sage Weil <sweil@redhat.com>
Date:   Thu Jun 20 12:07:38 2019 -0500

    mon/OSDMonitor: record snap removal seq as purged

    When we delete a selfmanaged snap we have to bump seq.  Record this as
    purged so that we avoid discontinuities in the history and so our storage
    is a bit more efficient.

    Signed-off-by: Sage Weil <sage@redhat.com>
...
+      // also record the new seq as purged: this avoids a discontinuity
+      // after all of the snaps have been purged, since the seq assigned
+      // during removal lives in the same namespace as the actual snaps.
+      pending_pseudo_purged_snaps[m->pool].insert(pp.get_snap_seq());
...
```

but I can't work out why.

@gregsfortytwo I haven't been able to work out why this behavior was ever necessary. Do you happen to remember?

@athanatos (Contributor) commented:

For pool snaps, the removal of a snapshot does remove a snapshot from pg_pool_t::snaps and therefore changes the SnapContext returned from get_snap_context(). It's not clear to me why that would actually matter, but perhaps that was the original reason?

The above doesn't actually explain unmanaged snapshots, though. The client is responsible for sending the snapc, and we filter the client's snapc regardless of what we see in the pg_pool_t::snap_seq.
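The filtering mentioned above can be sketched as follows (illustrative Python, not the actual PrimaryLogPG code path; `filter_snapc` is a made-up name): removed snap ids are dropped from the client's snap list, while the client-supplied seq passes through untouched -- pg_pool_t::snap_seq never appears.

```python
# Illustrative sketch: filtering a client SnapContext against removed snaps.
def filter_snapc(client_seq, client_snaps, removed_snaps):
    live = [s for s in client_snaps if s not in removed_snaps]
    return client_seq, live   # seq is the client's, not the pool's snap_seq

# Client writes with snapc {seq: 5, snaps: [4, 2, 1]}; snap 2 was removed.
print(filter_snapc(5, [4, 2, 1], {2}))  # (5, [4, 1])
```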

@Matan-B (Contributor, Author) commented Nov 8, 2023

It's possible that the incremental wasn't correct in the first place and d831abe was pushed to resolve the discontinuity issue as a symptom (although not the underlying cause).

@gregsfortytwo (Member) commented:

I didn't get to go as far through this as I wanted, so let me know if you need me to do more follow-up. (Also, conversed with Sam offline).

But what I've got is this:

  • You don't need to worry about the MDS behavior here. It has always used the MRemoveSnaps message, which isn't impacted by these changes (and which I don't think anything else uses).
  • I certainly thought that the reason we increased snap_seq was because somewhere in the code, it is used in a comparison to determine which SnapSet (or similar) to trust or accept, or to see if a client or OSD needs to do an update. Sam tells me he can't find it, and I haven't either in a short search...
  • There is every chance that I cargo-culted my patch's adjustment to snap_seq from Sage's.

I have validated that as of v0.21, this doesn't obviously break the snap logic in ReplicatedPG::do_op(), which is what I expected to find. I do see a lot of checks of the form if (ssc->snapset.seq > snapc.seq) (where snapc.seq is set from snap_seq), and I wonder if perhaps somewhere between v0.18 and v0.21 the direction of those comparisons changed? Might want to check that.

This is also well within the range of things that Sage might have just forgotten how he implemented — I very much thought that we had checks where snap_seq needed to be greater than previously for something to detect it had work to do. I didn't get to checking the clients, though.

@athanatos (Contributor) commented:

rados_ioctx_snap_remove (librados.h) and IoCtx::[aio_]selfmanaged_snap_remove (librados.hpp) don't appear to communicate the incremented snap_seq to the library user -- the return is simply an error code with 0 for success. @idryomov I'd therefore assume that rbd doesn't rely on snap_seq being incremented upon self managed snap remove -- can you think of any way in which it does?

@athanatos (Contributor) commented (repeating the analysis above):

@liewegas Do you happen to remember why we increment snap_seq on removal of a self-managed snapshot?

@gregsfortytwo (Member) commented:

Oh, and what I actually thought the comparison mattered for was snap trimming, but @athanatos assures me that this is handled by looking at osdmap epochs rather than the snap_seq.

@athanatos (Contributor) commented Nov 28, 2023

@rzarzynski @neha-ojha, @gregsfortytwo and I can't think of a reason why we need to keep this behavior. I think we should probably merge it -- do you disagree?

@rzarzynski (Contributor) commented:

Let's give it a dedicated, pre-merge teuthology run.

CC: @yuriw, @ljflores.

@idryomov (Contributor) commented:

@idryomov I'd therefore assume that rbd doesn't rely on snap_seq being incremented upon self managed snap remove -- can you think of any way in which it does?

I can't. librbd doesn't touch its snap context at all when removing a snapshot.

@idryomov (Contributor) commented:

librbd doesn't touch its snap context at all when removing a snapshot.

I worded that badly. Given this sentence, one can conclude that future writes are sent with removed snapshot ID still in the snapshot context. This is not the case.

When removing a snapshot, librbd doesn't touch its snap_seq (the part of snapshot context that is stored in the image header explicitly). The other part, the list of live snapshot IDs, isn't stored explicitly -- instead, it's generated by iterating through snapshot descriptors each time the image header is refreshed (i.e. re-read from RADOS). The snapshot descriptor gets removed from the image header as part of snapshot removal and then the image header is guaranteed to be refreshed before serving any I/O. So the removed snapshot ID is dropped from the list of live snapshot IDs for future writes, just in a lazy fashion.
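The lazy behavior described above can be modeled like this (a toy Python model; `ImageHeader` and its methods are illustrative, not librbd's actual classes): removal deletes the snapshot descriptor and leaves snap_seq alone, and the live-snap list only changes on the next header refresh.

```python
# Toy model of librbd's lazy snap-list regeneration (not real librbd code).
class ImageHeader:
    def __init__(self):
        self.snap_seq = 0
        self.descriptors = {}   # snap_id -> snapshot name, stored explicitly

    def create_snap(self, snap_id, name):
        self.snap_seq = max(self.snap_seq, snap_id)
        self.descriptors[snap_id] = name

    def remove_snap(self, snap_id):
        del self.descriptors[snap_id]   # snap_seq deliberately left untouched

    def refresh_snapc(self):
        # the live-snap list is regenerated (newest first) on each refresh,
        # so removed ids drop out lazily rather than being edited in place
        return self.snap_seq, sorted(self.descriptors, reverse=True)

hdr = ImageHeader()
hdr.create_snap(1, "a")
hdr.create_snap(2, "b")
hdr.remove_snap(1)
print(hdr.refresh_snapc())   # (2, [2]) -- id 1 gone, snap_seq unchanged
```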

@ljflores (Member) commented:

Let's give it a dedicated, pre-merge teuthology run.

CC: @yuriw, @ljflores.

@rzarzynski sure, I have asked @yuriw to test it individually.

@Matan-B (Contributor, Author) commented Dec 31, 2023

@athanatos, @rzarzynski - Tests look ok (https://trello.com/c/JdOAfyRy/1913-wip-yuri3-testing-2023-12-19-1211).
Awaiting approval prior to merging. Thanks!

CC: @yuriw

@yuriw yuriw merged commit ad9b6d7 into ceph:main Jan 2, 2024