[CI] Test failed with AssertionError/Must be on selector thread #28729

@tlrx

The test AzureMinimumMasterNodesTests.testSimpleOnlyMasterNodeElection failed today on CI:
https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.x+multijob-unix-compatibility/os=fedora/693

The test execution log is consoleText.txt. It shows that the test itself executed correctly, but the transport layer raised an AssertionError:

2> feb. 19, 2018 8:28:09 AM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
  2> WARNING: Uncaught exception in thread: Thread[elasticsearch[node_s0][management][T#2],5,TGRP-AzureMinimumMasterNodesTests]
  2> java.lang.AssertionError: Must be on selector thread
  2> 	at __randomizedtesting.SeedInfo.seed([DF4EF41A200534E4]:0)
  2> 	at org.elasticsearch.transport.nio.SocketSelector.executeFailedListener(SocketSelector.java:160)
  2> 	at org.elasticsearch.transport.nio.SocketSelector.queueWrite(SocketSelector.java:111)
  2> 	at org.elasticsearch.transport.nio.channel.TcpWriteContext.sendMessage(TcpWriteContext.java:50)
  2> 	at org.elasticsearch.transport.nio.channel.TcpNioSocketChannel.sendMessage(TcpNioSocketChannel.java:38)
  2> 	at org.elasticsearch.transport.TcpTransport.internalSendMessage(TcpTransport.java:1127)
  2> 	at org.elasticsearch.transport.TcpTransport.sendRequestToChannel(TcpTransport.java:1113)
  2> 	at org.elasticsearch.transport.TcpTransport.access$1700(TcpTransport.java:122)
  2> 	at org.elasticsearch.transport.TcpTransport$NodeChannels.sendRequest(TcpTransport.java:482)
  2> 	at org.elasticsearch.transport.TransportService.sendRequestInternal(TransportService.java:598)
  2> 	at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:518)
  2> 	at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:506)
  2> 	at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction.start(TransportNodesAction.java:197)
  2> 	at org.elasticsearch.action.support.nodes.TransportNodesAction.doExecute(TransportNodesAction.java:89)
  2> 	at org.elasticsearch.action.support.nodes.TransportNodesAction.doExecute(TransportNodesAction.java:52)
  2> 	at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:167)
  2> 	at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:139)
  2> 	at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:81)
  2> 	at org.elasticsearch.client.node.NodeClient.executeLocally(NodeClient.java:83)
  2> 	at org.elasticsearch.client.node.NodeClient.doExecute(NodeClient.java:72)
  2> 	at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:405)
  2> 	at org.elasticsearch.client.support.AbstractClient$ClusterAdmin.execute(AbstractClient.java:712)
  2> 	at org.elasticsearch.client.support.AbstractClient$ClusterAdmin.nodesStats(AbstractClient.java:808)
  2> 	at org.elasticsearch.cluster.InternalClusterInfoService.updateNodeStats(InternalClusterInfoService.java:254)
  2> 	at org.elasticsearch.cluster.InternalClusterInfoService.refresh(InternalClusterInfoService.java:290)
  2> 	at org.elasticsearch.cluster.InternalClusterInfoService.maybeRefresh(InternalClusterInfoService.java:275)
  2> 	at org.elasticsearch.cluster.InternalClusterInfoService.lambda$onMaster$0(InternalClusterInfoService.java:140)
  2> 	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:573)
  2> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  2> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  2> 	at java.lang.Thread.run(Thread.java:748)

It seems that TcpWriteContext.sendMessage() queued the write operation via SocketSelector.queueWrite(WriteOperation) because the write was not issued on a selector thread. At that point the channel was not yet closed and was still writeable.

Then queueWrite() checked whether the selector was closed. Apparently it was, so queueWrite() removed the write operation from the queue and called executeFailedListener(), which contains the failing assertion. Because queueWrite() was running on the management thread rather than the selector thread, the assertion tripped.
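
The sequence described above can be sketched in a few lines of Java. This is a simplified, hypothetical model and not the actual Elasticsearch code: the class name SelectorSketch, the use of a Runnable in place of a WriteOperation and its listener, and the reproduce() helper are all assumptions made for illustration. Only the method names queueWrite() and executeFailedListener() and the assertion message are taken from the stack trace.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical, simplified model of the behaviour described in this issue.
// It is NOT the real SocketSelector; it only mirrors the names in the stack trace.
class SelectorSketch {
    private final Queue<Runnable> queuedWrites = new ConcurrentLinkedQueue<>();
    private final Thread selectorThread;   // the thread that would run the select loop
    private volatile boolean closed;

    SelectorSketch(Thread selectorThread) {
        this.selectorThread = selectorThread;
    }

    void close() {
        closed = true;
    }

    // May be called from any thread: enqueue first, then re-check for a closed selector.
    void queueWrite(Runnable failureListener) {
        queuedWrites.offer(failureListener);
        if (closed && queuedWrites.remove(failureListener)) {
            // Still running on the caller's thread here, not on the selector thread.
            executeFailedListener(failureListener);
        }
    }

    // Asserts that the caller is the selector thread, as the failing assertion does.
    void executeFailedListener(Runnable failureListener) {
        if (Thread.currentThread() != selectorThread) {
            throw new AssertionError("Must be on selector thread");
        }
        failureListener.run();
    }

    // Closes the selector, then queues a write from a non-selector thread.
    static String reproduce() {
        Thread selector = new Thread(() -> {}, "selector"); // never started in this sketch
        SelectorSketch sketch = new SelectorSketch(selector);
        sketch.close();
        try {
            sketch.queueWrite(() -> {});
            return "no error";
        } catch (AssertionError e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(reproduce()); // prints "Must be on selector thread"
    }
}
```

In this model the assertion fires whenever a non-selector thread queues a write after the selector has closed, which matches the management thread seen in the trace above.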

This assertion was added recently and I'm not sure if/how to fix this, so I'm assigning this issue to you @tbrooks8 :)

Labels

:Core/Infra/Transport API (Transport client API)
>test (Issues or PRs that are addressing/adding tests)
