[CI] NodeTests suite timeouts on new windows workers #44256

@tvernum

Description

On Windows 2016 and Windows 2012 we are getting suite timeouts in what seems to be the very first suite run in that build.

It appears that the test takes a long time between finishing the test method and being marked as "done", possibly due to clean-up tasks in @After?

Looking at the Windows-2016 failure:

  • We start :server:test at 4:43 (04:43:11 > Task :server:test)
  • We fail a little over 20 minutes (1,200,000 ms) later.
05:05:42 org.elasticsearch.node.NodeTests > testCloseOnLeakedStoreReference FAILED
05:05:42     java.lang.Exception: Test abandoned because suite timeout was reached
...
    java.lang.Exception: Suite timeout exceeded (>= 1200000 msec).
  • The node being tested was stopped early in the 20 minutes (7:45 in the node's TZ)
1> [2019-07-12T07:45:46,034][INFO ][o.e.n.Node               ] [testCloseOnLeakedStoreReference] closed 
  • but the "after test" log doesn't come until we're killing off the suite (8:05)
1> [2019-07-12T08:05:47,854][INFO ][o.e.n.NodeTests          ] [testCloseOnLeakedStoreReference] after test
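
To put a number on that gap, here is a rough sketch (not part of the build tooling) that parses the two node log lines quoted above and compares the difference with the 1,200,000 ms suite timeout. The helper name `parse_ts` and the regex are my own, and it assumes both lines use the same clock and timezone.

```python
import re
from datetime import datetime, timedelta

# Matches the leading "[2019-07-12T07:45:46,034]" timestamp in a node log line.
LOG_TS = re.compile(r"\[(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2},\d{3})\]")

SUITE_TIMEOUT_MS = 1_200_000  # from "Suite timeout exceeded (>= 1200000 msec)"


def parse_ts(line: str) -> datetime:
    """Extract the bracketed timestamp from a log line (hypothetical helper)."""
    match = LOG_TS.search(line)
    if match is None:
        raise ValueError(f"no timestamp found in: {line!r}")
    return datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S,%f")


closed_line = (
    "1> [2019-07-12T07:45:46,034][INFO ][o.e.n.Node               ] "
    "[testCloseOnLeakedStoreReference] closed"
)
after_line = (
    "1> [2019-07-12T08:05:47,854][INFO ][o.e.n.NodeTests          ] "
    "[testCloseOnLeakedStoreReference] after test"
)

# Time between the node reporting "closed" and the "after test" log line.
gap = parse_ts(after_line) - parse_ts(closed_line)
gap_ms = gap // timedelta(milliseconds=1)

print(f"gap between 'closed' and 'after test': {gap_ms} ms")
print(f"suite timeout:                         {SUITE_TIMEOUT_MS} ms")
```

The gap comes out at 1,201,820 ms, so essentially the whole suite timeout elapsed after the node had already logged "closed".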

It seems that something on this new Ephemeral Windows CI worker is causing a problem moving this test method from "testing complete" to "test done"

Metadata

Labels

:Core/Infra/Core (Core issues without another label)
>test-failure (Triaged test failures from CI)
