Use a dedicated test executor in MockTransportService #112748
elasticsearchmachine merged 3 commits into elastic:main from
Instead of using the generic executor for delayed transport actions, this PR adds a dedicated executor to schedule them. This avoids sharing executors with the node, which can lead to unexpected CI failures due to the unsafe future assertion.
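As a rough illustration of the idea, the sketch below runs a delayed action on a small pool owned by the test fixture instead of on a pool shared with the node. It uses only `java.util.concurrent`; the class and method names (`DedicatedTestExecutorSketch`, `newTestExecutor`, `runDelayedAction`) are hypothetical, and the pool sizing is an assumption loosely based on the `4, 30, TimeUnit.SECONDS` arguments visible in the review. It is not the PR's actual implementation.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class DedicatedTestExecutorSketch {

    /** Builds a small pool owned by the test fixture, separate from any pool the node under test uses. */
    static ThreadPoolExecutor newTestExecutor() {
        // Assumption: up to 4 threads with a 30 second keep-alive.
        ThreadPoolExecutor executor = new ThreadPoolExecutor(4, 4, 30, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
        executor.allowCoreThreadTimeOut(true); // let idle threads exit so the pool can scale down to zero
        return executor;
    }

    /** Runs the action on the dedicated pool and reports whether it completed within 10 seconds. */
    static boolean runDelayedAction(ThreadPoolExecutor testExecutor, Runnable action) {
        CountDownLatch done = new CountDownLatch(1);
        testExecutor.execute(() -> {
            try {
                action.run();
            } finally {
                done.countDown();
            }
        });
        try {
            return done.await(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        ThreadPoolExecutor testExecutor = newTestExecutor();
        boolean completed = runDelayedAction(testExecutor, () -> System.out.println("delayed action ran"));
        testExecutor.shutdown();
        System.out.println("completed: " + completed);
    }
}
```

Because the fixture owns the pool, it can also shut the pool down and check for leftover work on its own schedule, independently of the node's lifecycle.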
Pinging @elastic/es-distributed (Team:Distributed)
DaveCTurner left a comment
Looks good, couple of minor nits.
    4,
    30,
    TimeUnit.SECONDS,
    false,
Could we set rejectAfterShutdown to true please?
I pondered this and decided to use false to align with how the generic executor is defined. I also think it might be better to let the "real" executors decide whether a task should be rejected, i.e. if the task has reached here, it is not rejected and we should just queue it up. That said, I am OK with true as well, since it was my initial preference for better predictability. See 63a4614
> it is not rejected and we should just queue it up
Sure but that's not what false does, it just silently drops any work submitted after shutdown.
My thinking here is twofold. Firstly, we are terminating this ExecutorService at the end of the test, so there should be no more tasks submitted by that point anyway. Secondly, IMO the silent-drop behaviour of generic after shutdown is a bug. It is unfortunately one we cannot easily address without changing lots of other things, but we can at least avoid relying on that same behaviour in specific situations like this one.
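The difference between silently dropping and rejecting work submitted after shutdown can be illustrated with plain JDK rejection handlers. This is only an analogy, assuming `DiscardPolicy` stands in for `rejectAfterShutdown = false` and `AbortPolicy` for `true`; it does not reproduce how Elasticsearch implements the flag, and the class and method names are hypothetical.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

final class RejectAfterShutdownSketch {

    static ThreadPoolExecutor newExecutor(RejectedExecutionHandler handler) {
        return new ThreadPoolExecutor(1, 1, 30, TimeUnit.SECONDS, new LinkedBlockingQueue<>(), handler);
    }

    /** Shuts the executor down, submits one task, and reports what happened to it. */
    static String submitAfterShutdown(ThreadPoolExecutor executor) {
        executor.shutdown();
        AtomicBoolean ran = new AtomicBoolean();
        try {
            // After shutdown, the executor hands the task to its rejection handler.
            executor.execute(() -> ran.set(true));
        } catch (RejectedExecutionException e) {
            return "rejected"; // the caller is told that the work was not accepted
        }
        try {
            executor.awaitTermination(1, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        // No exception was thrown; if the task never ran, it was dropped without any signal.
        return ran.get() ? "ran" : "dropped";
    }

    public static void main(String[] args) {
        System.out.println(submitAfterShutdown(newExecutor(new ThreadPoolExecutor.DiscardPolicy())));
        System.out.println(submitAfterShutdown(newExecutor(new ThreadPoolExecutor.AbortPolicy())));
    }
}
```

With the discarding handler the caller gets no indication that the task was lost, which is the behaviour the review asks the test executor not to rely on. With the aborting handler a late submission surfaces as an exception in the test.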
> but that's not what false does, it just silently drops any work submitted after shutdown.
TIL. Thanks!
    } catch (InterruptedException e) {
        throw new IllegalStateException(e);
    } finally {
        ThreadPool.terminate(testExecutor, 10, TimeUnit.SECONDS);
Could we have an assertTrue here to make sure it did actually terminate?
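A minimal sketch of what such a check could look like, written against plain `java.util.concurrent` rather than Elasticsearch's `ThreadPool.terminate`. The helper `terminate` below is a hypothetical stand-in that returns whether the executor actually terminated within the timeout, which is the value the suggested `assertTrue` would wrap.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class TerminateAssertSketch {

    /** Shuts the executor down and returns true only if it terminated within the timeout. */
    static boolean terminate(ExecutorService executor, long timeout, TimeUnit unit) {
        executor.shutdown();
        try {
            if (executor.awaitTermination(timeout, unit)) {
                return true;
            }
            // Still busy: interrupt running tasks and wait once more.
            executor.shutdownNow();
            return executor.awaitTermination(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            executor.shutdownNow();
            return false;
        }
    }

    public static void main(String[] args) {
        ExecutorService testExecutor = Executors.newFixedThreadPool(2);
        testExecutor.execute(() -> System.out.println("work"));
        boolean terminated = terminate(testExecutor, 10, TimeUnit.SECONDS);
        // In a JUnit test this would be assertTrue(terminated).
        if (terminated == false) {
            throw new AssertionError("test executor did not terminate");
        }
    }
}
```

Checking the result turns a stuck or leaked task into a visible test failure instead of a silently ignored timeout.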
@elasticmachine update branch
There are a few edge cases where closing a node can cause test failures:
1. Closing the handling node when looking up the master node name.
2. Closing the coordinating node when a search is ongoing. This can leak search contexts in MockSearchService on the data nodes.
3. Closing the data node when a search is ongoing. This can leak resources on the coordinating node.
This PR fixes 1 by avoiding the lookup, since the master node does not change and is already known. It fixes 2 by always using the master node as the coordinating node. It fixes 3 by not restarting the search node. With these changes in place (along with elastic#2790, elastic#2966, elastic#2983, elastic#112748, elastic#114375) the test is stable enough (running in a loop for 40+ hours without failure) to be unmuted.
Resolves: elastic#2327 Resolves: