
osd/osd_types: don't increment snap_seq on removal#54024

Merged
yuriw merged 2 commits into ceph:main from Matan-B:wip-matanb-snap_seq_inc on Jan 2, 2024

Conversation

@Matan-B (Contributor) commented Oct 15, 2023

  • Avoid discontinuities in the monitor's purged_snap and the OSD's PSN entries when adjacent snap ids are removed.

  • pending_pseudo_purged_snaps was introduced in order to avoid discontinuity in the purged_snap monitor entries.
    After this change there shouldn't be any discontinuity in the first place.

Fixes: https://tracker.ceph.com/issues/62983
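The discontinuity being avoided can be illustrated with a toy model (Python, purely illustrative; `run` and its sets are stand-ins, not Ceph's `interval_set` or monitor state): bumping snap_seq on every removal consumes ids that no snapshot ever carried, so the purged-snaps history develops holes unless those pseudo-ids are also recorded as purged.

```python
# Toy model (not Ceph code) of the purged-snaps discontinuity.
def run(bump_on_removal):
    snap_seq = 0      # models pg_pool_t::snap_seq
    purged = set()    # models the monitor's purged snap ids

    snaps = []
    for _ in range(3):        # create three self-managed snaps
        snap_seq += 1         # a new snap id is the bumped seq
        snaps.append(snap_seq)
    for s in snaps:           # now remove all of them
        purged.add(s)
        if bump_on_removal:
            snap_seq += 1     # old behavior: seq consumed, but no snap has this id

    holes = set(range(1, snap_seq + 1)) - purged
    return snap_seq, sorted(purged), sorted(holes)

# Old behavior: ids 4..6 were consumed by removals and remain holes.
print(run(bump_on_removal=True))   # (6, [1, 2, 3], [4, 5, 6])
# After this PR: nothing is skipped, the purged history stays contiguous.
print(run(bump_on_removal=False))  # (3, [1, 2, 3], [])
```

In this model, d831abe's pending_pseudo_purged_snaps papered over the holes by also recording the bumped seq as purged; removing the bump means the holes never form in the first place.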


Avoid adjacent snap ids removals discontinuity in monitor's purged_snap and OSD's PSN entries.

Fixes: https://tracker.ceph.com/issues/62983

Signed-off-by: Matan Breizman <mbreizma@redhat.com>
pending_pseudo_purged_snaps was introduced in order to avoid discontinuity in the purged_snap monitor entries.
After this change there shouldn't be any discontinuity in the first place.

Signed-off-by: Matan Breizman <mbreizma@redhat.com>
@Matan-B (Contributor, Author) commented Oct 15, 2023

Drafted to make sure I didn't miss anything, since this behavior hasn't changed since snapshots were initially introduced.

Tests look OK:
https://pulpito.ceph.com/matan-2023-10-15_12:46:59-rados:thrash-wip-matanb-snap_seq_inc-distro-default-smithi/

```cpp
removed_snaps.insert(s);
// try to add in the new seq, just to try to keep the interval_set contiguous
if (!removed_snaps.contains(get_snap_seq())) {
  removed_snaps.insert(get_snap_seq());
```
A reviewer (Contributor) commented:

It looks like the newly created snap got immediately removed -- why were we seeing holes?

@Matan-B (Contributor, Author) replied:

pg_pool_t::removed_snaps was used (pre-octopus) only to track non-pool removed snaps and to signify a self-managed-snaps pool.
To actually perform the removal OSDMap::Incremental::new_removed_snaps is used.

```cpp
  case POOL_OP_DELETE_UNMANAGED_SNAP:
  ...
    pp.remove_unmanaged_snap(m->snapid, osdmap.require_osd_release < ceph_release_t::octopus);
    pending_inc.new_removed_snaps[m->pool].insert(m->snapid);
    changed = true;
  case POOL_OP_DELETE_SNAP:
    snapid_t s = pp.snap_exists(m->name.c_str());
    if (s) {
      pp.remove_snap(s);
      pending_inc.new_removed_snaps[m->pool].insert(s);
      changed = true;
    }
```

We no longer maintain pg_pool_t::removed_snaps, the flag FLAG_SELFMANAGED_SNAPS is used instead.

```cpp
uint64_t pg_pool_t::add_unmanaged_snap(bool preoctopus_compat)
{
  if (snap_seq == 0) {
    if (preoctopus_compat) {
      // kludge for pre-mimic tracking of pool vs selfmanaged snaps.  after
      // mimic this field is not decoded but our flag is set; pre-mimic, we
      // have a non-empty removed_snaps to signifiy a non-pool-snaps pool.
      removed_snaps.insert(snapid_t(1));
    }
    snap_seq = 1;
  }
  flags |= FLAG_SELFMANAGED_SNAPS;
```
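The kludge quoted at the top of this thread -- inserting the bumped seq back into removed_snaps -- can be modeled the same way. In this toy Python sketch (`intervals` and `churn` are made-up names, not Ceph code), alternating create/remove fragments removed_snaps into several runs unless the bumped seq is inserted as well:

```python
# Toy model (not Ceph code) of the pre-change remove_unmanaged_snap kludge:
# inserting the bumped seq into removed_snaps keeps the interval_set as one run.

def intervals(ids):
    """Collapse a set of ids into maximal contiguous (lo, hi) runs."""
    runs = []
    for i in sorted(ids):
        if runs and runs[-1][1] + 1 == i:
            runs[-1][1] = i
        else:
            runs.append([i, i])
    return [tuple(r) for r in runs]

def churn(keep_contiguous):
    seq, removed = 0, set()
    for _ in range(2):                 # create a snap, then remove it, twice
        seq += 1                       # create: the new snap id is the bumped seq
        removed.add(seq)               # remove: record the snap id
        seq += 1                       # old behavior: bump seq on removal too
        if keep_contiguous and seq not in removed:
            removed.add(seq)           # the quoted kludge
    return intervals(removed)

print(churn(keep_contiguous=False))  # [(1, 1), (3, 3)] -- fragmented
print(churn(keep_contiguous=True))   # [(1, 4)]         -- one contiguous run
```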

@Matan-B Matan-B marked this pull request as ready for review October 18, 2023 15:34
@Matan-B Matan-B requested a review from a team as a code owner October 18, 2023 15:34
@athanatos (Contributor) commented:

Sorry @Matan-B, I still need to put some time into this. I'll get to it this week.

@athanatos (Contributor) commented:

...I cannot figure out why we ever needed to increment snap_seq.

The behavior was introduced with the original introduction of pool snaps in edb2f80:

```diff
commit edb2f80c25a476b4b0ad8014bbd812cc46d16239
Author: Sage Weil <sage@newdream.net>
Date:   Fri Jun 5 11:52:43 2009 -0700

    osd: add 'osd pool rmsnap poolname snapname' command
...
+  void remove_snap(snapid_t s) {
+    assert(snaps.count(s));
+    snaps.erase(s);
+    v.snap_seq = v.snap_seq + 1;
+  }
...
```

The behavior for unmanaged snaps was introduced with the original commit ac129d7 as well:

```diff
commit ac129d7895bd0f4e9e3bf606144376be46f706d2
Author: Greg Farnum <gregf@hq.newdream.net>
Date:   Thu Apr 8 14:45:57 2010 -0700

    osd: pg_pool_t gets new functions for unmanaged (ie, client-managed) snaps
...
+  void remove_unmanaged_snap(snapid_t s) {
+    assert(snaps.empty());
+    removed_snaps.insert(s);
+    v.snap_seq = v.snap_seq + 1;
+  }
...
```

In v0.21 (after the above two changes), I can't find any evidence that we actually depended on this behavior. OSD::advance_map compares pg_pool_t::get_snap_epoch() to OSDMap::get_epoch() to determine whether there's been an update, it doesn't use the snap seq.
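A minimal sketch of the update detection just described (the function name and shape are hypothetical; OSD::advance_map's real logic is more involved): the OSD decides whether a pool's snapshot state changed by comparing epochs, so snap_seq never enters the decision.

```python
# Hypothetical sketch: snapshot-change detection via map epochs, not snap_seq.
def pool_snaps_changed(pool_snap_epoch, last_processed_epoch, new_epoch):
    # The pool's snap state last changed in map epoch pool_snap_epoch;
    # it is "new" to this OSD iff that epoch lies in the range being advanced over.
    return last_processed_epoch < pool_snap_epoch <= new_epoch

print(pool_snaps_changed(10, 9, 12))   # True: change landed in epoch 10
print(pool_snaps_changed(10, 10, 12))  # False: already processed
```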

Sage's commit d831ab suggests that we do need to do this:

```diff
commit d831abeae1688a18eb446dd1a63eb6ed94f45d81
Author: Sage Weil <sweil@redhat.com>
Date:   Thu Jun 20 12:07:38 2019 -0500

    mon/OSDMonitor: record snap removal seq as purged

    When we delete a selfmanaged snap we have to bump seq.  Record this as
    purged so that we avoid discontinuities in the history and so our storage
    is a bit more efficient.

    Signed-off-by: Sage Weil <sage@redhat.com>
...
+      // also record the new seq as purged: this avoids a discontinuity
+      // after all of the snaps have been purged, since the seq assigned
+      // during removal lives in the same namespace as the actual snaps.
+      pending_pseudo_purged_snaps[m->pool].insert(pp.get_snap_seq());
...
```

but I can't work out why.

@gregsfortytwo I haven't been able to work out why this behavior was ever necessary. Do you happen to remember?

@athanatos (Contributor) commented:

For pool snaps, the removal of a snapshot does remove a snapshot from pg_pool_t::snaps and therefore changes the SnapContext returned from get_snap_context(). It's not clear to me why that would actually matter, but perhaps that was the original reason?

The above doesn't actually explain unmanaged snapshots, though. The client is responsible for sending the snapc, and we filter the client's snapc regardless of what we see in the pg_pool_t::snap_seq.
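The filtering mentioned above can be sketched as follows (illustrative Python, not the actual PrimaryLogPG code path; `filter_snapc` is a made-up name): removed snap ids are dropped from the client's snap list, while the client-supplied seq passes through untouched -- pg_pool_t::snap_seq never appears.

```python
# Illustrative sketch: filtering a client SnapContext against removed snaps.
def filter_snapc(client_seq, client_snaps, removed_snaps):
    live = [s for s in client_snaps if s not in removed_snaps]
    return client_seq, live   # seq is the client's, not the pool's snap_seq

# Client writes with snapc {seq: 5, snaps: [4, 2, 1]}; snap 2 was removed.
print(filter_snapc(5, [4, 2, 1], {2}))  # (5, [4, 1])
```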

@Matan-B (Contributor, Author) commented Nov 8, 2023

It's possible that the incremental wasn't correct in the first place and d831abe was pushed to resolve the discontinuity issue as a symptom (although not the underlying cause).

@gregsfortytwo (Member) commented:

I didn't get to go as far through this as I wanted, so let me know if you need me to do more follow-up. (Also, conversed with Sam offline).

But what I've got is this:

  • You don't need to worry about the MDS behavior here. It has always used the MRemoveSnaps message, which isn't impacted by these changes (and which I don't think anything else uses).
  • I certainly thought that the reason we increased snap_seq was because somewhere in the code, it is used in a comparison to determine which SnapSet (or similar) to trust or accept, or to see if a client or OSD needs to do an update. Sam tells me he can't find it, and I haven't either in a short search...
  • There is every chance that I cargo-culted my patch's adjustment to snap_seq from Sage's.

I have validated that as of v0.21, this doesn't obviously break the snap logic in ReplicatedPG::do_op(), which is what I expected to find. I do see a lot of checks of the form if (ssc->snapset.seq > snapc.seq) (where snapc.seq is set from snap_seq), and I wonder if perhaps somewhere between v0.18 and v0.21 the direction of those comparisons changed? Might want to check that.

This is also well within the range of things that Sage might have just forgotten how he implemented — I very much thought that we had checks where snap_seq needed to be greater than previously for something to detect it had work to do. I didn't get to checking the clients, though.

@athanatos (Contributor) commented:

rados_ioctx_snap_remove (librados.h) and IoCtx::[aio_]selfmanaged_snap_remove (librados.hpp) don't appear to communicate the incremented snap_seq to the library user -- the return is simply an error code with 0 for success. @idryomov I'd therefore assume that rbd doesn't rely on snap_seq being incremented upon self managed snap remove -- can you think of any way in which it does?

@athanatos (Contributor) commented (repeating the analysis above):

@liewegas Do you happen to remember why we increment snap_seq on removal of a self-managed snapshot?

@gregsfortytwo (Member) commented:

Oh, and what I actually thought the comparison mattered for was snap trimming, but @athanatos assures me that this is handled by looking at osdmap epochs rather than the snap_seq.

@athanatos (Contributor) commented Nov 28, 2023

@rzarzynski @neha-ojha, @gregsfortytwo and I can't think of a reason why we need to keep this behavior. I think we should probably merge it -- do you disagree?

@rzarzynski (Contributor) commented:

Let's give it a dedicated, pre-merge teuthology run.

CC: @yuriw, @ljflores.

@idryomov (Contributor) commented:

@idryomov I'd therefore assume that rbd doesn't rely on snap_seq being incremented upon self managed snap remove -- can you think of any way in which it does?

I can't. librbd doesn't touch its snap context at all when removing a snapshot.

@idryomov (Contributor) commented:

librbd doesn't touch its snap context at all when removing a snapshot.

I worded that badly. Given this sentence, one can conclude that future writes are sent with removed snapshot ID still in the snapshot context. This is not the case.

When removing a snapshot, librbd doesn't touch its snap_seq (the part of snapshot context that is stored in the image header explicitly). The other part, the list of live snapshot IDs, isn't stored explicitly -- instead, it's generated by iterating through snapshot descriptors each time the image header is refreshed (i.e. re-read from RADOS). The snapshot descriptor gets removed from the image header as part of snapshot removal and then the image header is guaranteed to be refreshed before serving any I/O. So the removed snapshot ID is dropped from the list of live snapshot IDs for future writes, just in a lazy fashion.
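The lazy behavior described above can be modeled like this (a toy Python model; `ImageHeader` and its methods are illustrative, not librbd's actual classes): removal deletes the snapshot descriptor and leaves snap_seq alone, and the live-snap list only changes on the next header refresh.

```python
# Toy model of librbd's lazy snap-list regeneration (not real librbd code).
class ImageHeader:
    def __init__(self):
        self.snap_seq = 0
        self.descriptors = {}   # snap_id -> snapshot name, stored explicitly

    def create_snap(self, snap_id, name):
        self.snap_seq = max(self.snap_seq, snap_id)
        self.descriptors[snap_id] = name

    def remove_snap(self, snap_id):
        del self.descriptors[snap_id]   # snap_seq deliberately left untouched

    def refresh_snapc(self):
        # the live-snap list is regenerated (newest first) on each refresh,
        # so removed ids drop out lazily rather than being edited in place
        return self.snap_seq, sorted(self.descriptors, reverse=True)

hdr = ImageHeader()
hdr.create_snap(1, "a")
hdr.create_snap(2, "b")
hdr.remove_snap(1)
print(hdr.refresh_snapc())   # (2, [2]) -- id 1 gone, snap_seq unchanged
```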

@ljflores (Member) commented:

Let's give it a dedicated, pre-merge teuthology run.

CC: @yuriw, @ljflores.

@rzarzynski sure, I have asked @yuriw to test it individually.

@Matan-B (Contributor, Author) commented Dec 31, 2023

@athanatos, @rzarzynski - Tests look ok (https://trello.com/c/JdOAfyRy/1913-wip-yuri3-testing-2023-12-19-1211).
Awaiting approval prior to merging. Thanks!

CC: @yuriw

@yuriw yuriw merged commit ad9b6d7 into ceph:main Jan 2, 2024