Skip to content

Use proper executor for failing requests when connection closes#109236

Merged
henningandersen merged 11 commits intoelastic:mainfrom
henningandersen:fix_transport_service_on_closed_executor
Jun 1, 2024
Merged

Use proper executor for failing requests when connection closes#109236
henningandersen merged 11 commits intoelastic:mainfrom
henningandersen:fix_transport_service_on_closed_executor

Conversation

@henningandersen
Copy link
Copy Markdown
Contributor

Now use the executor declared by the handler rather than generic, since using generic is trappy wrt. deadlocks.

Closes #109225

Now use the executor declared by the handler rather than generic, since using
generic is trappy wrt. deadlocks.

Closes elastic#109225
@henningandersen henningandersen added >bug :Distributed/Network Http and internode communication implementations v8.15.0 labels May 31, 2024
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Hi @henningandersen, I've created a changelog YAML for you.

@henningandersen
Copy link
Copy Markdown
Contributor Author

Opening as draft to see what CI and devs thinks before adding more tests.

@henningandersen
Copy link
Copy Markdown
Contributor Author

ci/part-2 failure is #109237

@henningandersen henningandersen marked this pull request as ready for review May 31, 2024 13:06
@henningandersen
Copy link
Copy Markdown
Contributor Author

This is now ready for reviews.

@elasticsearchmachine elasticsearchmachine added the Team:Distributed Meta label for distributed team. label May 31, 2024
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

Copy link
Copy Markdown
Member

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall approach looks good, I left a handful of smaller comments.

if (enableStackOverflowAvoidance == false) {
handler.handleException(exception);
} else {
threadPool.generic().submit(new AbstractRunnable() {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd rather we used a ForkingResponseHandlerRunnable on both branches here, even though ofc GENERIC doesn't reject anything today so it doesn't really matter.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is slightly annoying in that it asserts that the handler is not using direct executor, hence i did this route. I can make it work ofc, let me give it a shot.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah sure I meant not for the direct case, just if enableStackOverflowAvoidance

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, absolutely, the handler verification is still annoying, but made it slightly more optional when fixing this here:

f93cc1c

})), unusedReader(), executor)
);
nodeA.transportService.onConnectionClosed(connection);
expectThrows(ExecutionException.class, NodeDisconnectedException.class, () -> future.get(10, TimeUnit.SECONDS));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest using an org.elasticsearch.action.support.ActionTestUtils#assertNoSuccessListener and one of the safeAwait or safeGet methods.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that is better, see: f34ef8f

@Nullable TransportException transportException,
Executor executorUsed
) {
assert executorUsed != EsExecutors.DIRECT_EXECUTOR_SERVICE : "forking handler required, but got " + handler;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, yeah this LGTM

Copy link
Copy Markdown
Member

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@henningandersen
Copy link
Copy Markdown
Contributor Author

Thanks David!

@henningandersen henningandersen merged commit a5092e2 into elastic:main Jun 1, 2024
elasticsearchmachine pushed a commit that referenced this pull request Oct 10, 2024
When TransportService fails to send a transport action, it can complete
the listener's `onFailure` with the `generic` executor. If the listener
is a `PlainActionFuture` and also waits to be completed with a `generic`
thread, it will trip the `assertCompleteAllowed` assertion. 

https://github.com/elastic/elasticsearch/blob/fb482f863d5430702b19bd3dd23e9d8652f12ddd/server/src/main/java/org/elasticsearch/transport/TransportService.java#L1062-L1064

With this PR, we no longer fork to the generic thread pool and instead
just handle the exeption inline with the current thread. The expectation
is that the downstream handler should take care potential stack overflow
issues. This is similar to what is done in #109236
ywangd added a commit to ywangd/elasticsearch that referenced this pull request Oct 10, 2024
…#114375)

When TransportService fails to send a transport action, it can complete
the listener's `onFailure` with the `generic` executor. If the listener
is a `PlainActionFuture` and also waits to be completed with a `generic`
thread, it will trip the `assertCompleteAllowed` assertion. 

https://github.com/elastic/elasticsearch/blob/fb482f863d5430702b19bd3dd23e9d8652f12ddd/server/src/main/java/org/elasticsearch/transport/TransportService.java#L1062-L1064

With this PR, we no longer fork to the generic thread pool and instead
just handle the exeption inline with the current thread. The expectation
is that the downstream handler should take care potential stack overflow
issues. This is similar to what is done in elastic#109236
elasticsearchmachine pushed a commit that referenced this pull request Oct 10, 2024
#114493)

When TransportService fails to send a transport action, it can complete
the listener's `onFailure` with the `generic` executor. If the listener
is a `PlainActionFuture` and also waits to be completed with a `generic`
thread, it will trip the `assertCompleteAllowed` assertion. 

https://github.com/elastic/elasticsearch/blob/fb482f863d5430702b19bd3dd23e9d8652f12ddd/server/src/main/java/org/elasticsearch/transport/TransportService.java#L1062-L1064

With this PR, we no longer fork to the generic thread pool and instead
just handle the exeption inline with the current thread. The expectation
is that the downstream handler should take care potential stack overflow
issues. This is similar to what is done in #109236
matthewabbott pushed a commit to matthewabbott/elasticsearch that referenced this pull request Oct 10, 2024
…#114375)

When TransportService fails to send a transport action, it can complete
the listener's `onFailure` with the `generic` executor. If the listener
is a `PlainActionFuture` and also waits to be completed with a `generic`
thread, it will trip the `assertCompleteAllowed` assertion. 

https://github.com/elastic/elasticsearch/blob/fb482f863d5430702b19bd3dd23e9d8652f12ddd/server/src/main/java/org/elasticsearch/transport/TransportService.java#L1062-L1064

With this PR, we no longer fork to the generic thread pool and instead
just handle the exeption inline with the current thread. The expectation
is that the downstream handler should take care potential stack overflow
issues. This is similar to what is done in elastic#109236
davidkyle pushed a commit to davidkyle/elasticsearch that referenced this pull request Oct 13, 2024
…#114375)

When TransportService fails to send a transport action, it can complete
the listener's `onFailure` with the `generic` executor. If the listener
is a `PlainActionFuture` and also waits to be completed with a `generic`
thread, it will trip the `assertCompleteAllowed` assertion. 

https://github.com/elastic/elasticsearch/blob/fb482f863d5430702b19bd3dd23e9d8652f12ddd/server/src/main/java/org/elasticsearch/transport/TransportService.java#L1062-L1064

With this PR, we no longer fork to the generic thread pool and instead
just handle the exeption inline with the current thread. The expectation
is that the downstream handler should take care potential stack overflow
issues. This is similar to what is done in elastic#109236
arteam added a commit to arteam/elasticsearch that referenced this pull request Mar 13, 2025
…warning

This setting is not getting removed in ES 9.0, so its usage should not generate a critical deprecation warning.

`TransportServiceLifecycleTests` already uses `assertWarnings` instead of `assertCriticalWarnings` and we
don't distinguish between warning levels in assertions (See elastic#80304)

See elastic#109236
See ES-11224
arteam added a commit to arteam/elasticsearch that referenced this pull request Mar 13, 2025
arteam added a commit that referenced this pull request Mar 14, 2025
…warning (#124788)

This setting is not getting removed in ES 9.0, so its usage should not generate a critical deprecation warning.

`TransportServiceLifecycleTests` already uses `assertWarnings` instead of `assertCriticalWarnings` and we
don't distinguish between warning levels in assertions (See #80304)

See #109236
See ES-11224
arteam added a commit that referenced this pull request Mar 14, 2025
This setting was supposed to be removed in 9.0, but turned to be non-critical. We should remove it in ES 10.0

See #109236
See ES-11224
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>bug :Distributed/Network Http and internode communication implementations Team:Distributed Meta label for distributed team. v8.15.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

SearchableSnapshotsIntegTests.testCreateAndRestoreSearchableSnapshot failing

3 participants