Remove trappy timeouts in snapshot APIs#109828
Remove trappy timeouts in snapshot APIs#109828elasticsearchmachine merged 12 commits intoelastic:mainfrom
Conversation
Wholesale fix of every `TRAPPY_IMPLICIT_DEFAULT_MASTER_NODE_TIMEOUT` in `o.e.snapshots` and `o.e.repositories`, just pulling them up to the REST layer (where they become API params), the test suite (where they become `TEST_REQUEST_TIMEOUT`), or some other place where an explicit value is available. Relates elastic#107984
|
Pinging @elastic/es-distributed (Team:Distributed) |
DaveCTurner
left a comment
There was a problem hiding this comment.
Big change in terms of lines touched but it was all pretty mechanical apart from the few spots I've called out below.
|
|
||
| public CleanupRepositoryRequest(StreamInput in) throws IOException { | ||
| super(TRAPPY_IMPLICIT_DEFAULT_MASTER_NODE_TIMEOUT, DEFAULT_ACK_TIMEOUT); | ||
| public static CleanupRepositoryRequest readFrom(StreamInput in) throws IOException { |
There was a problem hiding this comment.
This request class was missing the AcknowledgedRequest header on the wire.
There was a problem hiding this comment.
@UpdateForV9 seems appropriate here?
There was a problem hiding this comment.
No need I think, we'll be clearing up the transport versions in v9 anyway and that'll expose cases like this.
| clusterService.state().getMinTransportVersion().onOrAfter(TransportVersions.SNAPSHOT_REQUEST_TIMEOUTS) | ||
| ? TimeValue.MINUS_ONE | ||
| : TimeValue.MAX_VALUE, |
There was a problem hiding this comment.
This is an internal request so really should have infinite timeout (as long as the cluster is new enough to understand that - using the current PR's transport version to be sure). Previously it had a 30s timeout which was probably a mistake.
There was a problem hiding this comment.
It probably makes sense to separate this out in its own PR?
| protected RestChannelConsumer prepareRequest(RestRequest request, NodeClient client) throws IOException { | ||
| final ResetFeatureStateRequest req = new ResetFeatureStateRequest(); | ||
|
|
||
| final var req = new ResetFeatureStateRequest(RestUtils.getMasterNodeTimeout(request)); |
There was a problem hiding this comment.
This API param was missing - added here and in the JSON spec.
There was a problem hiding this comment.
Ah I didn't even realise this had docs. It's not really intended for end-users:
| return "cat_snapshot_action"; | ||
| } | ||
|
|
||
| private static final String[] MATCH_ALL_PATTERNS = { ResolvedRepositories.ALL_PATTERN }; |
There was a problem hiding this comment.
Pulled this up to a static constant.
| String snapshotName = snapPolicyMeta.getLastSuccess().getSnapshotName(); | ||
| String repositoryName = snapPolicyMeta.getPolicy().getRepository(); | ||
| GetSnapshotsRequest request = new GetSnapshotsRequest().repositories(repositoryName) | ||
| GetSnapshotsRequest request = new GetSnapshotsRequest(TimeValue.MAX_VALUE).repositories(repositoryName) |
There was a problem hiding this comment.
This timeout was missing previously (i.e. defaulted to 30s) but like other ILM actions it should have been MAX_VALUE.
|
|
||
| public CleanupRepositoryRequest(StreamInput in) throws IOException { | ||
| super(TRAPPY_IMPLICIT_DEFAULT_MASTER_NODE_TIMEOUT, DEFAULT_ACK_TIMEOUT); | ||
| public static CleanupRepositoryRequest readFrom(StreamInput in) throws IOException { |
There was a problem hiding this comment.
@UpdateForV9 seems appropriate here?
| public CreateSnapshotRequest(TimeValue masterNodeTimeout) { | ||
| super(masterNodeTimeout); |
There was a problem hiding this comment.
Not related to this PR: I wonder why create snapshot request and some other snapshot requests are not AcknowledgedRequest?
There was a problem hiding this comment.
They don't (directly) involve waiting for other nodes to apply the cluster state.
| clusterService.state().getMinTransportVersion().onOrAfter(TransportVersions.SNAPSHOT_REQUEST_TIMEOUTS) | ||
| ? TimeValue.MINUS_ONE | ||
| : TimeValue.MAX_VALUE, |
There was a problem hiding this comment.
It probably makes sense to separate this out in its own PR?
| protected RestChannelConsumer prepareRequest(RestRequest request, NodeClient client) throws IOException { | ||
| final ResetFeatureStateRequest req = new ResetFeatureStateRequest(); | ||
|
|
||
| final var req = new ResetFeatureStateRequest(RestUtils.getMasterNodeTimeout(request)); |
| @Deprecated(forRemoval = true) // temporary compatibility shim | ||
| public RestoreSnapshotRequest(String repository, String snapshot) { | ||
| this(MasterNodeRequest.TRAPPY_IMPLICIT_DEFAULT_MASTER_NODE_TIMEOUT, repository, snapshot); | ||
| } |
There was a problem hiding this comment.
It does not seem to be used here? Is it for stateless?
These methods are no longer used in ES or any of its dependent code. Relates elastic#109828 Relates elastic#107984
In elastic#109828 we introduced a `DUMMY_TIMEOUT` constant for these handlers in which timeouts are ignored, but we have not been very consistent with its usage and it has an overly-generic name making it harder for readers to understand its meaning. This commit renames the constant to clarify why it's being used, and fixes up several spots where we should have been using it already.
In #109828 we introduced a `DUMMY_TIMEOUT` constant for these handlers in which timeouts are ignored, but we have not been very consistent with its usage and it has an overly-generic name making it harder for readers to understand its meaning. This commit renames the constant to clarify why it's being used, and fixes up several spots where we should have been using it already.
In elastic#109828 we deprecated several snapshot-related methods on `ClusterAdminClient` which imposed a trappy default timeout on their callers. This commit removes their usages from serverless so we can remove the trappy methods.
In elastic#109828 we deprecated several snapshot-related methods on `ClusterAdminClient` which imposed a trappy default timeout on their callers. This commit removes their usages from serverless so we can remove the trappy methods.
Wholesale fix of every
TRAPPY_IMPLICIT_DEFAULT_MASTER_NODE_TIMEOUTino.e.snapshotsando.e.repositories, just pulling them up to the RESTlayer (where they become API params), the test suite (where they become
TEST_REQUEST_TIMEOUT), or some other place where an explicit value isavailable.
Relates #107984