Add the ability to fetch the latest successful shard snapshot#75080
Add the ability to fetch the latest successful shard snapshot#75080fcofdez merged 18 commits intoelastic:masterfrom
Conversation
33fb9db to
e39267f
Compare
There was a problem hiding this comment.
We could take into account PARTIAL snapshots too, the implementation would be a little less straightforward but we could consider it.
There was a problem hiding this comment.
I agree to defer this to a follow-up after we have done other tasks, this refinement can be done while benchmarking.
This commit adds a new master transport action TransportGetShardSnapshotAction that allows getting the last successful snapshot for a particular shard in a set of repositories. It deals with the different implementation details around BwC for repositories.
e39267f to
c244436
Compare
|
Pinging @elastic/es-distributed (Team:Distributed) |
henningandersen
left a comment
There was a problem hiding this comment.
Had a quick look only, added a couple of comments.
I would like to see added wire serialization tests too for the new serializable classes. See: AbstractWireSerializingTestCase
| ); | ||
|
|
||
| for (String repository : repositories) { | ||
| indexSnapshotsService.getLatestSuccessfulSnapshotForShard( |
There was a problem hiding this comment.
I think this could run in parallel on all the repos. I am guessing we will only need one repo in reality so it might not matter. But I would prefer to go serially through the repos and early terminate when a good snapshot is found. Or to just remove the multi-repo support, we can always add that if another use comes along.
There was a problem hiding this comment.
The point of the multi-repo support is to avoid needing to implement a notion of a "default repo", we can simply look at all available repos. Moreover I don't think we can early-terminate since we're searching for the latest snapshot.
However I do agree that we shouldn't fan out to all the repos at once like this, working one-at-a-time would be better.
There was a problem hiding this comment.
I agree to defer this to a follow-up after we have done other tasks, this refinement can be done while benchmarking.
DaveCTurner
left a comment
There was a problem hiding this comment.
Looks good, I left mostly small comments.
.../java/org/elasticsearch/action/admin/cluster/snapshots/get/shard/GetShardSnapshotAction.java
Outdated
Show resolved
Hide resolved
| * Information about snapshotted file | ||
| */ | ||
| public static class FileInfo { | ||
| public static class FileInfo implements Writeable { |
There was a problem hiding this comment.
Are we really going to need the details of the files in the response? I think it would make more sense for the master to find a good snapshot and pass its ID onto the data node is, but then have the data node read these details itself.
There was a problem hiding this comment.
Addressed in d0e1ce0. I was wondering if we should parse the latest commit in the snapshot to get the max seqNo and local checkpoint and include that in the response.
There was a problem hiding this comment.
Not sure, but I think we can leave that to the data nodes.
server/src/internalClusterTest/java/org/elasticsearch/repositories/IndexSnapshotsServiceIT.java
Show resolved
Hide resolved
...java/org/elasticsearch/action/admin/cluster/snapshots/get/shard/GetShardSnapshotRequest.java
Outdated
Show resolved
Hide resolved
.../elasticsearch/action/admin/cluster/snapshots/get/shard/TransportGetShardSnapshotAction.java
Outdated
Show resolved
Hide resolved
| ); | ||
|
|
||
| for (String repository : repositories) { | ||
| indexSnapshotsService.getLatestSuccessfulSnapshotForShard( |
There was a problem hiding this comment.
The point of the multi-repo support is to avoid needing to implement a notion of a "default repo", we can simply look at all available repos. Moreover I don't think we can early-terminate since we're searching for the latest snapshot.
However I do agree that we shouldn't fan out to all the repos at once like this, working one-at-a-time would be better.
server/src/main/java/org/elasticsearch/repositories/ShardSnapshotInfo.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/repositories/IndexSnapshotsService.java
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/repositories/IndexSnapshotsService.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/repositories/IndexSnapshotsService.java
Outdated
Show resolved
Hide resolved
|
Thanks for the review @DaveCTurner ! I've tackled your comments. Let me know if you think we should change something else 👍 |
original-brownbear
left a comment
There was a problem hiding this comment.
Just a few random points, mainly around testing. Looking just fine otherwise :)
server/src/internalClusterTest/java/org/elasticsearch/repositories/IndexSnapshotsServiceIT.java
Outdated
Show resolved
Hide resolved
server/src/internalClusterTest/java/org/elasticsearch/repositories/IndexSnapshotsServiceIT.java
Outdated
Show resolved
Hide resolved
server/src/internalClusterTest/java/org/elasticsearch/repositories/IndexSnapshotsServiceIT.java
Outdated
Show resolved
Hide resolved
...java/org/elasticsearch/action/admin/cluster/snapshots/get/shard/GetShardSnapshotRequest.java
Show resolved
Hide resolved
.../elasticsearch/action/admin/cluster/snapshots/get/shard/TransportGetShardSnapshotAction.java
Outdated
Show resolved
Hide resolved
.../elasticsearch/action/admin/cluster/snapshots/get/shard/TransportGetShardSnapshotAction.java
Outdated
Show resolved
Hide resolved
...java/org/elasticsearch/action/admin/cluster/snapshots/get/shard/GetShardSnapshotRequest.java
Show resolved
Hide resolved
|
|
||
| this.indexId = indexId; | ||
| this.shardId = shardId; | ||
| this.snapshotInfo = snapshotInfo; |
There was a problem hiding this comment.
AFAICT this is the complete SnapshotInfo, including all indices and shard failures (including stack traces) and so on. That might be pretty sizeable, and mostly not relevant to the shard in question. Can we just pick out the bits we need?
...arch/action/admin/cluster/snapshots/get/shard/GetShardSnapshotRequestSerializationTests.java
Show resolved
Hide resolved
| * Information about snapshotted file | ||
| */ | ||
| public static class FileInfo { | ||
| public static class FileInfo implements Writeable { |
There was a problem hiding this comment.
Not sure, but I think we can leave that to the data nodes.
DaveCTurner
left a comment
There was a problem hiding this comment.
I left three pretty small comments; apart from that this all looks good.
original-brownbear
left a comment
There was a problem hiding this comment.
+1 to David's open points, LGTM otherwise no need for another round from me I think :)
|
@elasticmachine run elasticsearch-ci/part-1 |
|
@elasticmachine run elasticsearch-ci/part-1 |
DaveCTurner
left a comment
There was a problem hiding this comment.
LGTM (one more tiny suggestion). Great work @fcofdez 👍
| GetShardSnapshotRequest(List<String> repositories, ShardId shardId) { | ||
| assert repositories.isEmpty() == false; | ||
| assert repositories.stream().noneMatch(Objects::isNull); | ||
| assert repositories.size() == 1 && repositories.get(0).equals(ALL_REPOSITORIES) |
There was a problem hiding this comment.
nit:
| assert repositories.size() == 1 && repositories.get(0).equals(ALL_REPOSITORIES) | |
| assert repositories.size() == 1 |
(if not, I'd prefer having clarifying parentheses, I find it trappy to rely on && binding more tightly than ||)
|
Thanks all for the review! |
This commit adds a new master transport action TransportGetShardSnapshotAction that allows getting the last successful snapshot for a particular shard in a set of repositories. It deals with the different implementation details around BwC for repositories. Relates elastic#73496 Backport of elastic#75080
…c#75080) This commit adds a new master transport action TransportGetShardSnapshotAction that allows getting the last successful snapshot for a particular shard in a set of repositories. It deals with the different implementation details around BwC for repositories. Relates elastic#73496
This commit adds a new master transport action TransportGetShardSnapshotAction
that allows getting the last successful snapshot for a particular
shard in a set of repositories. It deals with the different
implementation details around BwC for repositories.
Relates #73496