This appears to fail reproducibly on our single-processor node test job, and given the nature of the error message, this may very well be a legitimate deadlock scenario.
Build scan:
https://gradle-enterprise.elastic.co/s/g5rztp2kv5rq6/tests/:x-pack:plugin:yamlRestTest/org.elasticsearch.xpack.test.rest.XPackRestIT/test%20%7Bp0=ml%2Fupgrade_job_snapshot%2FTest%20existing%20but%20corrupt%20snapshot%7D
Reproduction line:
./gradlew ':x-pack:plugin:yamlRestTest' --tests "org.elasticsearch.xpack.test.rest.XPackRestIT.test {p0=ml/upgrade_job_snapshot/Test existing but corrupt snapshot}" -Dtests.seed=BCFE2A9A03956ACA -Dtests.locale=th-TH-u-nu-thai-x-lvariant-TH -Dtests.timezone=US/Indiana-Starke -Druntime.java=17 -Dtests.configure_test_clusters_with_one_processor=true
Applicable branches:
main
Reproduces locally?:
Yes
Failure history:
https://gradle-enterprise.elastic.co/scans/tests?tests.container=org.elasticsearch.xpack.test.rest.XPackRestIT&tests.test=test%20%7Bp0%3Dml/upgrade_job_snapshot/Test%20existing%20but%20corrupt%20snapshot%7D
Failure excerpt:
org.elasticsearch.client.ResponseException: method [POST], host [http://127.0.0.1:37401], URI [/_features/_reset], status line [HTTP/1.1 500 Internal Server Error]
{"error":{"root_cause":[{"type":"timeout_exception","reason":"Timed out waiting for completion of [Task{id=4348, type='persistent', action='xpack/ml/job/snapshot/upgrade[c]', description='job-snapshot-upgrade-upgrade-model-snapshot-1234567890', parentTask=cluster:10, startTime=1661288471981, startTimeNanos=2146045551685}]"}],"type":"failed_node_exception","reason":"Failed node [r9fh43zvSCuFWI9e5ehabQ]","node_id":"r9fh43zvSCuFWI9e5ehabQ","caused_by":{"type":"timeout_exception","reason":"Timed out waiting for completion of [Task{id=4348, type='persistent', action='xpack/ml/job/snapshot/upgrade[c]', description='job-snapshot-upgrade-upgrade-model-snapshot-1234567890', parentTask=cluster:10, startTime=1661288471981, startTimeNanos=2146045551685}]"}},"status":500}
at __randomizedtesting.SeedInfo.seed([BCFE2A9A03956ACA:34AA1540AD690732]:0)
at org.elasticsearch.client.RestClient.convertResponse(RestClient.java:347)
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:313)
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:288)
at org.elasticsearch.xpack.core.ml.integration.MlRestTestStateCleaner.resetFeatures(MlRestTestStateCleaner.java:35)
at org.elasticsearch.xpack.test.rest.AbstractXPackRestTest.clearMlState(AbstractXPackRestTest.java:131)
at org.elasticsearch.xpack.test.rest.AbstractXPackRestTest.cleanup(AbstractXPackRestTest.java:115)
at jdk.internal.reflect.GeneratedMethodAccessor14.invoke(null:-1)
at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:568)
at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:1004)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:44)
at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
at java.lang.Thread.run(Thread.java:833)