Make SnapshotsInProgress Diffable#89619
Make SnapshotsInProgress Diffable#89619original-brownbear merged 33 commits intoelastic:mainfrom original-brownbear:nicer-sn-in-progress
Conversation
server/src/test/java/org/elasticsearch/snapshots/SnapshotsInProgressSerializationTests.java
Show resolved
Hide resolved
| */ | ||
| private record ByRepoDiff( | ||
| DiffableUtils.MapDiff<String, Entry, Map<String, Entry>> diffBySnapshotUUID, | ||
| DiffableUtils.MapDiff<String, Integer, Map<String, Integer>> positionDiff |
There was a problem hiding this comment.
Not the nicest solution ever to diffing a list that can see both entries change and move about in the list but the best I could come up with without refactoring things in this class.
There was a problem hiding this comment.
Likewise, could we do that prep work first and avoid having to do this? If not, could we at least sprinkle a bunch of assertions around here to validate our assumptions about the diff (no gaps, no duplicates, etc)?
There was a problem hiding this comment.
I think a big refactoring that avoids the current structure of a map of lists will be hard to pull off in the short-term. It's a massive change-set that would be required here and I'd much rather do it in increments.
The beauty of the code here is that at least for the positions we need no such assertions since List.of(...) will be unforgiving to gaps. So if we take the size of the array from the uuid->snaphot map and then use the positions for the iteration we can't really run into trouble ever can we?
There was a problem hiding this comment.
Ok - I'll suggest a couple of extra assertions in some following comments.
|
Pinging @elastic/es-distributed (Team:Distributed) |
|
Hi @original-brownbear, I've created a changelog YAML for you. |
|
|
||
| @Override | ||
| public Version getMinimalSupportedVersion() { | ||
| return Version.CURRENT.minimumCompatibilityVersion(); |
There was a problem hiding this comment.
Should it be DIFFABLE_VERSION?
There was a problem hiding this comment.
I don't think so because we were always allegedly "diffable" but used a non-diff. That's why I had to add the BwC new SimpleDiffable.CompleteDiff<>(after).writeTo(out); below. So in a sense this is just a wire format change from the perspective of the checks on this I think.
DaveCTurner
left a comment
There was a problem hiding this comment.
Tests look ok now, I left a couple of other ideas.
server/src/main/java/org/elasticsearch/cluster/SnapshotsInProgress.java
Outdated
Show resolved
Hide resolved
| */ | ||
| private record ByRepoDiff( | ||
| DiffableUtils.MapDiff<String, Entry, Map<String, Entry>> diffBySnapshotUUID, | ||
| DiffableUtils.MapDiff<String, Integer, Map<String, Integer>> positionDiff |
There was a problem hiding this comment.
Likewise, could we do that prep work first and avoid having to do this? If not, could we at least sprinkle a bunch of assertions around here to validate our assumptions about the diff (no gaps, no duplicates, etc)?
|
@DaveCTurner ping in case you have a second, this would be quite nice to have in for our snapshot benchmarking. It's a surprisingly large speedup :) |
DaveCTurner
left a comment
There was a problem hiding this comment.
A few more small comments
server/src/main/java/org/elasticsearch/cluster/SnapshotsInProgress.java
Outdated
Show resolved
Hide resolved
| } | ||
|
|
||
| @SuppressWarnings("unchecked") | ||
| EntryDiff(Entry before, Entry after) { |
There was a problem hiding this comment.
I think we should be explicit about the fields that must not change between before and after here, at least asserting that they're the same but maybe also protecting against that kind of bug in production too.
There was a problem hiding this comment.
Makes sense, I added actual exception throwing in production. If we ever run into a bug here, it's still better to stall everything and (maybe) for a full cluster restart to fix things than to corrupt the repo IMO :)
server/src/test/java/org/elasticsearch/snapshots/SnapshotsInProgressSerializationTests.java
Outdated
Show resolved
Hide resolved
| entries.set(i, mutateEntryWithLegalChange(entry)); | ||
| } | ||
| } | ||
| updatedInstance = updatedInstance.withUpdatedEntriesForRepo(perRepoEntries.get(0).repository(), entries); |
There was a problem hiding this comment.
Can we also reorder the entries for one or more repositories? Really just to give the ByRepoDiff stuff a proper workout.
There was a problem hiding this comment.
Jup added shuffling the list. Had to change the way we generate the repo name for that. It was a random string per snapshot and we'd effectively never see collisions there. Now we actually have multiple snapshots per repo here and the position reordering is covered by tests.
| */ | ||
| private record ByRepoDiff( | ||
| DiffableUtils.MapDiff<String, Entry, Map<String, Entry>> diffBySnapshotUUID, | ||
| DiffableUtils.MapDiff<String, Integer, Map<String, Integer>> positionDiff |
There was a problem hiding this comment.
Ok - I'll suggest a couple of extra assertions in some following comments.
server/src/main/java/org/elasticsearch/cluster/SnapshotsInProgress.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/cluster/SnapshotsInProgress.java
Outdated
Show resolved
Hide resolved
|
Thanks @DaveCTurner all points addressed I think :) |
|
Thanks David! |
This is very important for #77466. Profiling showed that serializing snapshots-in-progress when there's a few snapshots with high shard count running takes a significant amount of CPU and heap for sending the full data structure over an over.
This PR adds diffing in the simplest way I could think of on top of the existing data structure.
closes #88732