Skip to content

Execute blackhole response on correct transport#92312

Merged
elasticsearchmachine merged 3 commits intoelastic:mainfrom
DaveCTurner:2022-12-12-fix-92307
Dec 13, 2022
Merged

Execute blackhole response on correct transport#92312
elasticsearchmachine merged 3 commits intoelastic:mainfrom
DaveCTurner:2022-12-12-fix-92307

Conversation

@DaveCTurner
Copy link
Copy Markdown
Member

We simulate sending traffic along a blackholed connection with a disconnection exception that we withold until far into the future. Today this exception is delivered using the transport of the responding node, but it should be the requesting node. This commit fixes that.

Not sure how this ever worked, but we made things stricter in #92245 and that meant this bug led to test failures.

Closes #92307

We simulate sending traffic along a blackholed connection with a
disconnection exception that we withold until far into the future. Today
this exception is delivered using the transport of the responding node,
but it should be the requesting node. This commit fixes that.

Not sure how this _ever_ worked, but we made things stricter in elastic#92245
and that meant this bug led to test failures.

Closes elastic#92307
@DaveCTurner DaveCTurner added >test Issues or PRs that are addressing/adding tests :Distributed/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. v8.6.1 v8.7.0 labels Dec 12, 2022
@DaveCTurner DaveCTurner requested a review from pxsalehi December 12, 2022 19:41
@elasticsearchmachine elasticsearchmachine added the Team:Distributed Meta label for distributed team. label Dec 12, 2022
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

protected void onDisconnectedDuringSend(long requestId, String action, DisruptableMockTransport destinationTransport) {
destinationTransport.execute(getDisconnectException(requestId, action, destinationTransport.getLocalNode()));
private void onDisconnectedDuringSend(long requestId, String action, DisruptableMockTransport destinationTransport) {
execute(getDisconnectException(requestId, action, destinationTransport.getLocalNode()));
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is the fix, everything else is just noise. Looks simple, but wow this took hours to pin down.

deterministicTaskQueue.runAllRunnableTasks();

assertTrue(responseHandlerReleased.get());
assertTrue(rebootedNodes.contains(node1) || responseHandlerCalled.get());
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assertion shows the problem.

@DaveCTurner
Copy link
Copy Markdown
Member Author

@elasticmachine update branch

@DaveCTurner DaveCTurner added auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) auto-backport-and-merge labels Dec 13, 2022
@elasticsearchmachine elasticsearchmachine merged commit 51fe506 into elastic:main Dec 13, 2022
@DaveCTurner DaveCTurner deleted the 2022-12-12-fix-92307 branch December 13, 2022 10:13
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Dec 13, 2022
We simulate sending traffic along a blackholed connection with a
disconnection exception that we withold until far into the future. Today
this exception is delivered using the transport of the responding node,
but it should be the requesting node. This commit fixes that.

Not sure how this _ever_ worked, but we made things stricter in elastic#92245
and that meant this bug led to test failures.

Closes elastic#92307
elasticsearchmachine pushed a commit that referenced this pull request Dec 13, 2022
We simulate sending traffic along a blackholed connection with a
disconnection exception that we withold until far into the future. Today
this exception is delivered using the transport of the responding node,
but it should be the requesting node. This commit fixes that.

Not sure how this _ever_ worked, but we made things stricter in #92245
and that meant this bug led to test failures.

Closes #92307
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) :Distributed/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. Team:Distributed Meta label for distributed team. >test Issues or PRs that are addressing/adding tests v8.6.1 v8.7.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[CI] CoordinatorTests testAppliesNoMasterBlockAllIfConfigured failing

5 participants