Use a dedicated test executor in MockTransportService#112748

Merged
elasticsearchmachine merged 3 commits into elastic:main from ywangd:use-test-executor-in-mock-transport
Sep 12, 2024

Conversation

@ywangd
Member

@ywangd ywangd commented Sep 11, 2024

Instead of using the generic executor for delayed transport actions, this PR adds a new executor to schedule these actions. This avoids sharing executors with the node, which may lead to unexpected CI failures due to unsafe future assertions.
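
As a rough illustration of the idea in the description, the sketch below hands delayed actions to a pool that only the test harness owns. It uses plain `java.util.concurrent` rather than Elasticsearch classes. `DelayedActionRunner`, `runDelayed`, and the thread name `mock-transport-test` are hypothetical names chosen for this example and are not taken from the PR.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a test-only transport wrapper; not the real MockTransportService.
final class DelayedActionRunner implements AutoCloseable {
    // Timer thread that only waits for the delay to elapse.
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    // Dedicated pool: delayed test actions never borrow threads from the node's own pools.
    private final ExecutorService testExecutor = Executors.newFixedThreadPool(4, r -> {
        Thread t = new Thread(r, "mock-transport-test");
        t.setDaemon(true);
        return t;
    });

    // After the delay, the action is handed to the dedicated pool and runs there.
    void runDelayed(Runnable action, long delayMillis) {
        scheduler.schedule(() -> testExecutor.execute(action), delayMillis, TimeUnit.MILLISECONDS);
    }

    @Override
    public void close() {
        try {
            // Pending delayed hand-offs still fire after shutdown() by default.
            scheduler.shutdown();
            if (!scheduler.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("scheduler did not terminate");
            }
            testExecutor.shutdown();
            if (!testExecutor.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("test executor did not terminate");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

Because the pool belongs to the test fixture, any assertion about which thread completes a future is evaluated against a test thread rather than one of the node's threads.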

@ywangd ywangd added >test Issues or PRs that are addressing/adding tests :Distributed/Network Http and internode communication implementations v8.16.0 labels Sep 11, 2024
@ywangd ywangd requested a review from DaveCTurner September 11, 2024 13:22
@elasticsearchmachine elasticsearchmachine added the Team:Distributed Meta label for distributed team. label Sep 11, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

Member

@DaveCTurner DaveCTurner left a comment

Looks good, couple of minor nits.

4,
30,
TimeUnit.SECONDS,
false,

Member

Could we set rejectAfterShutdown to true please?

Member Author

I pondered this and decided to use false to align with how the generic executor is defined. I also think it might be better to let the "real" executors decide whether a task should be rejected, i.e. if the task has reached here, it is not rejected and we should just queue it up. That said, I am OK with true as well since it was my initial preference for better predictability. See 63a4614

Member

> it is not rejected and we should just queue it up

Sure, but that's not what false does: it just silently drops any work submitted after shutdown.

My thinking here is twofold. Firstly, we are terminating this ExecutorService at the end of the test, so there should be no more tasks submitted by that point anyway. Secondly, IMO the silent-drop behaviour of generic after shutdown is a bug. It is unfortunately one we cannot easily address without changing lots of other things, but we can at least avoid relying on that same behaviour in specific situations like this one.

Member Author

> but that's not what false does, it just silently drops any work submitted after shutdown.

TIL. Thanks!
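
The difference discussed in this thread can be reproduced with a plain JDK `ThreadPoolExecutor`, as a loose analogy only. The sketch assumes that `rejectAfterShutdown = false` behaves like the JDK's `DiscardPolicy` and `true` like `AbortPolicy`; it is not Elasticsearch's implementation. `RejectAfterShutdownDemo` and `submitAfterShutdown` are hypothetical names.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// JDK-only analogy for the two behaviours; names here are illustrative.
final class RejectAfterShutdownDemo {
    // Submits one task to an already shut-down pool and reports what happened to it.
    static String submitAfterShutdown(boolean rejectAfterShutdown) {
        RejectedExecutionHandler handler = rejectAfterShutdown
            ? new ThreadPoolExecutor.AbortPolicy()     // throws RejectedExecutionException
            : new ThreadPoolExecutor.DiscardPolicy();  // drops the task without any signal
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
            1, 1, 30, TimeUnit.SECONDS, new LinkedBlockingQueue<>(), handler);
        executor.shutdown();

        AtomicBoolean ran = new AtomicBoolean();
        try {
            executor.execute(() -> ran.set(true));
        } catch (RejectedExecutionException e) {
            return "rejected";
        }
        // No exception was thrown, yet the task was never queued or run.
        return ran.get() ? "ran" : "dropped";
    }
}
```

With the discarding handler the caller gets no signal that the work was lost, which is the behaviour the reviewer wants to avoid relying on. With the aborting handler a late submission surfaces immediately as an exception.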

} catch (InterruptedException e) {
throw new IllegalStateException(e);
} finally {
ThreadPool.terminate(testExecutor, 10, TimeUnit.SECONDS);

Member

Could we have an assertTrue here to make sure it did actually terminate?

Member Author

Yep see 63a4614
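
A minimal sketch of the pattern requested here: a terminate helper that reports whether the executor actually stopped, so the caller can assert on the result. `TerminateHelper.terminate` below is a hypothetical JDK-only stand-in. It only assumes that the real `ThreadPool.terminate` similarly returns a boolean, and makes no claim about its internals.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical JDK-only helper; not the Elasticsearch ThreadPool.terminate implementation.
final class TerminateHelper {
    // Returns true only if the executor reached the terminated state within the timeouts.
    static boolean terminate(ExecutorService service, long timeout, TimeUnit unit) {
        service.shutdown(); // stop accepting new work, let queued tasks finish
        try {
            if (service.awaitTermination(timeout, unit)) {
                return true;
            }
            service.shutdownNow(); // interrupt stragglers and wait once more
            return service.awaitTermination(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            service.shutdownNow();
            return false;
        }
    }
}
```

In a test teardown the result would be wrapped in an assertion such as `assertTrue(TerminateHelper.terminate(testExecutor, 10, TimeUnit.SECONDS))`, so a task that never finishes fails the test instead of leaking a thread silently.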

@ywangd ywangd requested a review from DaveCTurner September 11, 2024 14:35
Member

@DaveCTurner DaveCTurner left a comment

LGTM

@ywangd
Member Author

ywangd commented Sep 11, 2024

@elasticmachine update branch

@ywangd ywangd added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Sep 11, 2024
@elasticsearchmachine elasticsearchmachine merged commit 6a8ac53 into elastic:main Sep 12, 2024
@ywangd ywangd deleted the use-test-executor-in-mock-transport branch September 12, 2024 00:05
v1v added a commit to v1v/elasticsearch that referenced this pull request Sep 12, 2024
…tion-ironbank-ubi

* upstream/main: (302 commits)
  Deduplicate BucketOrder when deserializing (elastic#112707)
  Introduce test utils for ingest pipelines (elastic#112733)
  [Test] Account for auto-repairing for shard gen file (elastic#112778)
  Do not throw in task enqueued by CancellableRunner (elastic#112780)
  Mute org.elasticsearch.script.StatsSummaryTests testEqualsAndHashCode elastic#112439
  Mute org.elasticsearch.repositories.blobstore.testkit.integrity.RepositoryVerifyIntegrityIT testTransportException elastic#112779
  Use a dedicated test executor in MockTransportService (elastic#112748)
  Estimate segment field usages (elastic#112760)
  (Doc+) Inference Pipeline ignores Mapping Analyzers (elastic#112522)
  Fix verifyVersions task (elastic#112765)
  (Doc+) Terminating Exit Codes (elastic#112530)
  (Doc+) CAT Nodes default columns (elastic#112715)
  [DOCS] Augment installation warnings (elastic#112756)
  Mute org.elasticsearch.repositories.blobstore.testkit.integrity.RepositoryVerifyIntegrityIT testCorruption elastic#112769
  Bump Elasticsearch to a minimum of JDK 21 (elastic#112252)
  ESQL: Compute support for filtering ungrouped aggs (elastic#112717)
  Bump Elasticsearch version to 9.0.0 (elastic#112570)
  add CDR related data streams to kibana_system priviliges (elastic#112655)
  Support widening of numeric types in union-types (elastic#112610)
  Introduce data stream options and failure store configuration classes (elastic#109515)
  ...
davidkyle pushed a commit that referenced this pull request Sep 12, 2024
Instead of using the generic executor for delayed transport actions,
this PR adds a new executor to schedule these actions. It helps avoid
sharing executors with the node which may lead to unexpected CI failures
due to unsafe future assertion.
breskeby pushed a commit to breskeby/elasticsearch that referenced this pull request Feb 11, 2026
There are a few edge cases where closing a node can cause test failures:

1. Closing the handling node when looking up the master node name.
2. Closing the coordinating node when a search is ongoing. This can lead to leaking search contexts in MockSearchService on the data nodes.
3. Closing the data node when a search is ongoing. This can lead to leaking resources on the coordinating node.

This PR fixes 1 by avoiding the lookup, since the master node does not change and is already known. It fixes 2 by always using the master node as the coordinating node. It fixes 3 by not restarting the search node. With these changes in place (along with elastic#2790, elastic#2966, elastic#2983, elastic#112748, elastic#114375) the test is stable enough (running in a loop for 40+ hours without failure) to be unmuted.

Resolves: elastic#2327
Resolves:
Labels

auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) :Distributed/Network Http and internode communication implementations Team:Distributed Meta label for distributed team. >test Issues or PRs that are addressing/adding tests v9.0.0

5 participants