Skip to content

Add peer recovery planners that take into account available snapshots#75840

Merged
fcofdez merged 30 commits intoelastic:masterfrom
fcofdez:peer-recovery-from-snapshot-planners
Aug 9, 2021
Merged

Add peer recovery planners that take into account available snapshots#75840
fcofdez merged 30 commits intoelastic:masterfrom
fcofdez:peer-recovery-from-snapshot-planners

Conversation

@fcofdez
Copy link
Copy Markdown
Contributor

@fcofdez fcofdez commented Jul 29, 2021

This commit adds a new set of classes that would compute a peer
recovery plan, based on source files + target files + available
snapshots. When possible it would try to maximize the number of
files used from a snapshot.

This commit adds a new set of classes that would compute a peer
recovery plan, based on source files + target files + available
snapshots. When possible it would try to maximize the number of
files used from a snapshot. This commit doesn't wire these classes
yet to avoid adding additional noise.
@fcofdez
Copy link
Copy Markdown
Contributor Author

fcofdez commented Jul 29, 2021

@elasticmachine run elasticsearch-ci/part-2

(it was a docker-compose problem)

@fcofdez fcofdez requested a review from DaveCTurner July 29, 2021 16:21
@fcofdez fcofdez added v7.15.0 :Distributed/Recovery Anything around constructing a new shard, either from a local or a remote source. >enhancement Team:Distributed Meta label for distributed team. labels Jul 29, 2021
@fcofdez fcofdez marked this pull request as ready for review July 29, 2021 16:21
@elasticmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@fcofdez
Copy link
Copy Markdown
Contributor Author

fcofdez commented Jul 30, 2021

@DaveCTurner I added the wiring + feature flag, could you take a look when you have the chance? Thanks!

Copy link
Copy Markdown
Member

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @fcofdez for wiring everything up. I think I understand how it all fits together now, but I left a few suggestions for simplifying things in areas where we aren't using all the generality we're introducing. I haven't gone right down to the lowest levels of detail yet because I think things will still move around quite a bit.

@fcofdez
Copy link
Copy Markdown
Contributor Author

fcofdez commented Aug 4, 2021

We discussed offline the pros and cons of using a single repository specified as a node setting and we reached the conclusion that this might be limiting in terms of flexibility. We agreed on introducing a new repository setting that would enable using the latest available snapshot from that repository explicitly. We would implement this in a subsequent PR. cc @DaveCTurner @henningandersen

@fcofdez fcofdez requested a review from DaveCTurner August 4, 2021 15:53
Copy link
Copy Markdown
Member

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Getting close, I left only tiny suggestions.

}

ShardSnapshot latestSnapshot = availableSnapshots.stream()
.max(Comparator.comparingLong(ShardSnapshot::getStartedAt))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a follow-up I'd like us to push this .max calculation through TransportGetShardSnapshotAction and into IndexSnapshotsService: as it stands we might retrieve a SnapshotInfo from each repository but really if we checked them in time order we could do them in reverse time order and stop as soon as we've found the one we'll be using here.

If we can't do that before 7.15 then it's probably ok, it'll just leave a bit of BwC cruft.

final long totalSize = shardRecoveryPlan.getTotalSize();
final long existingTotalSize = shardRecoveryPlan.getExistingSize();

if (logger.isTraceEnabled()) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 nice work keeping this logging intact.

Copy link
Copy Markdown
Member

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry just a handful more small things that struck me from looking at #76114.

@fcofdez fcofdez requested a review from DaveCTurner August 9, 2021 10:57
Copy link
Copy Markdown
Member

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@fcofdez fcofdez merged commit 3c8b9a6 into elastic:master Aug 9, 2021
@fcofdez
Copy link
Copy Markdown
Contributor Author

fcofdez commented Aug 9, 2021

Thanks David!

fcofdez added a commit to fcofdez/elasticsearch that referenced this pull request Aug 9, 2021
This commit adds a new set of classes that would compute a peer
recovery plan, based on source files + target files + available
snapshots. When possible it would try to maximize the number of
files used from a snapshot. It uses repositories with `use_for_peer_recovery`
setting set to true.

It adds a new recovery setting `indices.recovery.use_snapshots`

Relates elastic#73496
Backport of elastic#75840
fcofdez added a commit that referenced this pull request Aug 9, 2021
…#76239)

This commit adds a new set of classes that would compute a peer
recovery plan, based on source files + target files + available
snapshots. When possible it would try to maximize the number of
files used from a snapshot. It uses repositories with `use_for_peer_recovery`
setting set to true.

It adds a new recovery setting `indices.recovery.use_snapshots`

Relates #73496
Backport of #75840
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Distributed/Recovery Anything around constructing a new shard, either from a local or a remote source. >enhancement Team:Distributed Meta label for distributed team. v7.15.0 v8.0.0-alpha2

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants