Remove trappy timeouts in snapshot APIs by DaveCTurner · Pull Request #109828 · elastic/elasticsearch

DaveCTurner · 2024-06-17T20:14:44Z

Wholesale fix of every TRAPPY_IMPLICIT_DEFAULT_MASTER_NODE_TIMEOUT in
o.e.snapshots and o.e.repositories, just pulling them up to the REST
layer (where they become API params), the test suite (where they become
TEST_REQUEST_TIMEOUT), or some other place where an explicit value is
available.

Relates #107984
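
For readers unfamiliar with the pattern, here is a minimal, self-contained sketch of what "pulling the timeout up" means. It is not Elasticsearch code: `SnapshotLikeRequest`, `forTest`, and the 60-second value given to `TEST_REQUEST_TIMEOUT` are hypothetical stand-ins chosen for illustration. The point is only that the request type offers no constructor that silently picks a default, so every caller has to state a timeout.

```java
import java.time.Duration;
import java.util.Objects;

// Hypothetical stand-in types; none of this is the real Elasticsearch API.
final class ExplicitTimeoutSketch {

    /** A request whose master-node timeout must be supplied by the caller. */
    static final class SnapshotLikeRequest {
        private final Duration masterNodeTimeout;

        // There is deliberately no no-arg constructor, so no hidden default can apply.
        SnapshotLikeRequest(Duration masterNodeTimeout) {
            this.masterNodeTimeout = Objects.requireNonNull(masterNodeTimeout, "masterNodeTimeout");
        }

        Duration masterNodeTimeout() {
            return masterNodeTimeout;
        }
    }

    // Assumed value, for illustration only: test code names its timeout via one shared constant.
    static final Duration TEST_REQUEST_TIMEOUT = Duration.ofSeconds(60);

    /** How a test would build the request: the timeout is visible at the call site. */
    static SnapshotLikeRequest forTest() {
        return new SnapshotLikeRequest(TEST_REQUEST_TIMEOUT);
    }

    public static void main(String[] args) {
        System.out.println(forTest().masterNodeTimeout());
    }
}
```

With this shape, a missing timeout is a compile error (or an immediate `NullPointerException`) rather than a silent 30s default.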

elasticsearchmachine · 2024-06-17T20:15:08Z

Pinging @elastic/es-distributed (Team:Distributed)

DaveCTurner

Big change in terms of lines touched but it was all pretty mechanical apart from the few spots I've called out below.

DaveCTurner · 2024-06-17T20:16:24Z

...va/org/elasticsearch/action/admin/cluster/repositories/cleanup/CleanupRepositoryRequest.java


-    public CleanupRepositoryRequest(StreamInput in) throws IOException {
-        super(TRAPPY_IMPLICIT_DEFAULT_MASTER_NODE_TIMEOUT, DEFAULT_ACK_TIMEOUT);
+    public static CleanupRepositoryRequest readFrom(StreamInput in) throws IOException {


This request class was missing the AcknowledgedRequest header on the wire.

@UpdateForV9 seems appropriate here?

No need I think, we'll be clearing up the transport versions in v9 anyway and that'll expose cases like this.

DaveCTurner · 2024-06-17T20:19:00Z

server/src/main/java/org/elasticsearch/indices/recovery/plan/ShardSnapshotsService.java

+            clusterService.state().getMinTransportVersion().onOrAfter(TransportVersions.SNAPSHOT_REQUEST_TIMEOUTS)
+                ? TimeValue.MINUS_ONE
+                : TimeValue.MAX_VALUE,


This is an internal request, so it really should have an infinite timeout (as long as the cluster is new enough to understand that; using the current PR's transport version to be sure). Previously it had a 30s timeout, which was probably a mistake.

It probably makes sense to separate this out in its own PR?
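
A rough sketch of the version gate in the hunk above, using plain integers and longs instead of the real `TransportVersion` and `TimeValue` types. The numeric version id is invented for illustration. The assumption, taken from the comment, is that `-1` stands for "no timeout" and is only sent once every node in the cluster is new enough to understand it; otherwise a very large finite value is sent instead.

```java
// Hypothetical stand-ins; the real code uses TransportVersions and TimeValue.
final class VersionGatedTimeoutSketch {

    // Invented id playing the role of TransportVersions.SNAPSHOT_REQUEST_TIMEOUTS.
    static final int SNAPSHOT_REQUEST_TIMEOUTS = 100;

    // Plays the role of TimeValue.MINUS_ONE (treated as "infinite" by new-enough nodes).
    static final long INFINITE_MILLIS = -1L;

    // Plays the role of TimeValue.MAX_VALUE (finite, but effectively unbounded).
    static final long VERY_LONG_MILLIS = Long.MAX_VALUE;

    /** Picks the timeout to send, given the oldest transport version present in the cluster. */
    static long chooseTimeoutMillis(int minTransportVersionInCluster) {
        return minTransportVersionInCluster >= SNAPSHOT_REQUEST_TIMEOUTS
            ? INFINITE_MILLIS
            : VERY_LONG_MILLIS;
    }

    public static void main(String[] args) {
        System.out.println(chooseTimeoutMillis(SNAPSHOT_REQUEST_TIMEOUTS));
        System.out.println(chooseTimeoutMillis(SNAPSHOT_REQUEST_TIMEOUTS - 1));
    }
}
```

Gating on the cluster-wide minimum version rather than the local node's version is what keeps a mixed-version cluster safe during a rolling upgrade.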

DaveCTurner · 2024-06-17T20:19:44Z

...r/src/main/java/org/elasticsearch/rest/action/admin/cluster/RestResetFeatureStateAction.java

    protected RestChannelConsumer prepareRequest(RestRequest request, NodeClient client) throws IOException {
-        final ResetFeatureStateRequest req = new ResetFeatureStateRequest();
-
+        final var req = new ResetFeatureStateRequest(RestUtils.getMasterNodeTimeout(request));


This API param was missing - added here and in the JSON spec.

May want to update doc as well.

Ah I didn't even realise this had docs. It's not really intended for end-users:

elasticsearch/docs/reference/features/apis/reset-features-api.asciidoc

Line 11 in 5e81668

WARNING: Intended for development and testing use only. Do not reset features on a production cluster.
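
To illustrate what exposing the timeout as an API param amounts to, here is a simplified sketch of reading it from the request's query parameters. It does not reproduce `RestUtils.getMasterNodeTimeout`: the class and method below are hypothetical, the parameter name `master_timeout` and the 30s fallback are assumptions, and only whole-second values such as `45s` are parsed.

```java
import java.time.Duration;
import java.util.Map;

// Hypothetical helper; the real logic lives in RestUtils and accepts more time units.
final class MasterTimeoutParamSketch {

    // Assumed parameter name and default, for illustration only.
    static final String PARAM = "master_timeout";
    static final Duration DEFAULT = Duration.ofSeconds(30);

    /** Returns the caller-supplied timeout, or the documented default if the param is absent. */
    static Duration masterNodeTimeout(Map<String, String> queryParams) {
        String raw = queryParams.get(PARAM);
        if (raw == null) {
            return DEFAULT;
        }
        // This sketch only understands "<digits>s"; anything else is rejected.
        if (!raw.matches("\\d+s")) {
            throw new IllegalArgumentException("unsupported " + PARAM + " value: " + raw);
        }
        return Duration.ofSeconds(Long.parseLong(raw.substring(0, raw.length() - 1)));
    }

    public static void main(String[] args) {
        System.out.println(masterNodeTimeout(Map.of(PARAM, "45s")));
        System.out.println(masterNodeTimeout(Map.of()));
    }
}
```

The default still exists, but it now sits at the REST boundary where it is documented and overridable, instead of inside the request's constructor.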

DaveCTurner · 2024-06-17T20:20:15Z

server/src/main/java/org/elasticsearch/rest/action/cat/RestSnapshotAction.java

        return "cat_snapshot_action";
    }

+    private static final String[] MATCH_ALL_PATTERNS = { ResolvedRepositories.ALL_PATTERN };


Pulled this up to a static constant.

DaveCTurner · 2024-06-17T20:21:39Z

x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/ilm/WaitForSnapshotStep.java

        String snapshotName = snapPolicyMeta.getLastSuccess().getSnapshotName();
        String repositoryName = snapPolicyMeta.getPolicy().getRepository();
-        GetSnapshotsRequest request = new GetSnapshotsRequest().repositories(repositoryName)
+        GetSnapshotsRequest request = new GetSnapshotsRequest(TimeValue.MAX_VALUE).repositories(repositoryName)


This timeout was missing previously (i.e. defaulted to 30s) but like other ILM actions it should have been MAX_VALUE.

ywangd

LGTM

ywangd · 2024-06-19T02:21:16Z

...va/org/elasticsearch/action/admin/cluster/repositories/cleanup/CleanupRepositoryRequest.java


-    public CleanupRepositoryRequest(StreamInput in) throws IOException {
-        super(TRAPPY_IMPLICIT_DEFAULT_MASTER_NODE_TIMEOUT, DEFAULT_ACK_TIMEOUT);
+    public static CleanupRepositoryRequest readFrom(StreamInput in) throws IOException {


@UpdateForV9 seems appropriate here?

ywangd · 2024-06-19T02:34:43Z

...main/java/org/elasticsearch/action/admin/cluster/snapshots/create/CreateSnapshotRequest.java

+    public CreateSnapshotRequest(TimeValue masterNodeTimeout) {
+        super(masterNodeTimeout);


Not related to this PR: I wonder why create snapshot request and some other snapshot requests are not AcknowledgedRequest?

They don't (directly) involve waiting for other nodes to apply the cluster state.

ywangd · 2024-06-19T02:41:52Z

server/src/main/java/org/elasticsearch/indices/recovery/plan/ShardSnapshotsService.java

+            clusterService.state().getMinTransportVersion().onOrAfter(TransportVersions.SNAPSHOT_REQUEST_TIMEOUTS)
+                ? TimeValue.MINUS_ONE
+                : TimeValue.MAX_VALUE,


It probably makes sense to separate this out in its own PR?

ywangd · 2024-06-19T02:44:01Z

...r/src/main/java/org/elasticsearch/rest/action/admin/cluster/RestResetFeatureStateAction.java

    protected RestChannelConsumer prepareRequest(RestRequest request, NodeClient client) throws IOException {
-        final ResetFeatureStateRequest req = new ResetFeatureStateRequest();
-
+        final var req = new ResetFeatureStateRequest(RestUtils.getMasterNodeTimeout(request));


May want to update doc as well.

ywangd · 2024-06-19T02:58:49Z

...in/java/org/elasticsearch/action/admin/cluster/snapshots/restore/RestoreSnapshotRequest.java

+    @Deprecated(forRemoval = true) // temporary compatibility shim
+    public RestoreSnapshotRequest(String repository, String snapshot) {
+        this(MasterNodeRequest.TRAPPY_IMPLICIT_DEFAULT_MASTER_NODE_TIMEOUT, repository, snapshot);
    }


It does not seem to be used here? Is it for stateless?

These methods are no longer used in ES or any of its dependent code. Relates elastic#109828 Relates elastic#107984

In elastic#109828 we introduced a `DUMMY_TIMEOUT` constant for these handlers in which timeouts are ignored, but we have not been very consistent with its usage and it has an overly-generic name making it harder for readers to understand its meaning. This commit renames the constant to clarify why it's being used, and fixes up several spots where we should have been using it already.

In elastic#109828 we deprecated several snapshot-related methods on `ClusterAdminClient` which imposed a trappy default timeout on their callers. This commit removes their usages from serverless so we can remove the trappy methods.

DaveCTurner added the >non-issue, :Distributed/Snapshot/Restore (anything directly related to the `_snapshot/*` APIs), and v8.15.0 labels Jun 17, 2024

DaveCTurner requested a review from ywangd June 17, 2024 20:14

elasticsearchmachine added the Team:Distributed label (meta label for the distributed team) Jun 17, 2024

DaveCTurner commented Jun 17, 2024

Merge branch 'main' into 2024/06/17/trappy-snapshot-requests

0a05f65

DaveCTurner added the test-update-serverless label Jun 18, 2024

DaveCTurner added 3 commits June 18, 2024 08:55

Merge remote-tracking branch 'upstream/main' into 2024/06/17/trappy-s…

6dd2156

…napshot-requests

Merge remote-tracking branch 'upstream/main' into 2024/06/17/trappy-s…

bd001af

…napshot-requests

Compatibility shims

c46666b

ywangd approved these changes Jun 19, 2024

DaveCTurner added 2 commits June 19, 2024 05:53

Merge branch 'main' into 2024/06/17/trappy-snapshot-requests

1c9b354

Reset features API docs

3ee3317

DaveCTurner added the auto-merge-without-approval label (automatically merge the pull request when CI checks pass; NB doesn't wait for reviews) Jun 19, 2024

DaveCTurner added 5 commits June 19, 2024 21:50

Merge branch 'main' into 2024/06/17/trappy-snapshot-requests

7b122eb

Merge branch 'main' into 2024/06/17/trappy-snapshot-requests

2f07102

CI poke

39b2d3e

Merge remote-tracking branch 'upstream/main' into 2024/06/17/trappy-s…

e544d9e

…napshot-requests

Merge branch 'main' into 2024/06/17/trappy-snapshot-requests

7dcdecf

elasticsearchmachine merged commit 5662f98 into elastic:main Jun 20, 2024

DaveCTurner deleted the 2024/06/17/trappy-snapshot-requests branch June 20, 2024 21:11

DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Jul 29, 2024

Remove deprecated ClusterAdminClient methods

d2bc941

These methods are no longer used in ES or any of its dependent code. Relates elastic#109828 Relates elastic#107984

DaveCTurner mentioned this pull request Jul 29, 2024

Remove deprecated ClusterAdminClient methods #111418

Merged

DaveCTurner added a commit that referenced this pull request Jul 30, 2024

Remove deprecated ClusterAdminClient methods (#111418)

72571df

These methods are no longer used in ES or any of its dependent code. Relates #109828 Relates #107984

DaveCTurner mentioned this pull request Sep 12, 2024

Clean up timeouts in reserved cluster state handlers #112820

Merged

DaveCTurner mentioned this pull request Oct 25, 2025

Test utility for POST _features/_reset #137133

Merged
