The waiting for 3 watcher nodes in a partially upgraded cluster has caused numerous failures.
Multiple upgrade tests have failed (seemingly) due to timing out for watcher to be available.
For all the failures, a commonality is:
Many logging entries on the old node like:
java.lang.IllegalArgumentException: Unknown NamedWriteable [org.elasticsearch.cluster.NamedDiff][index_template]
at org.elasticsearch.common.io.stream.NamedWriteableRegistry.getReader(NamedWriteableRegistry.java:112) ~[elasticsearch-7.4.2.jar:7.4.2]
and
» [2021-11-29T15:28:50,675][WARN ][o.e.c.c.PublicationTransportHandler] [v7.6.1-2] unexpected error while deserializing an incoming cluster state
» java.lang.IllegalArgumentException: Unknown NamedWriteable [org.elasticsearch.cluster.NamedDiff][component_template]
Followed by many entries such as:
[2021-11-29T15:25:28,834][DEBUG][o.e.x.w.WatcherService ] [v7.4.2-2] missing watcher index templates, not starting watcher service
This has failed upgrading from <7.6
Build scans:
https://gradle-enterprise.elastic.co/s/ut4mgi4xhjgiy/tests/:x-pack:qa:rolling-upgrade:v7.4.2%23twoThirdsUpgradedTest/org.elasticsearch.upgrades.UpgradeClusterClientYamlTestSuiteIT/test%20%7Bp0=mixed_cluster%2F90_ml_data_frame_analytics_crud%2FStart%20and%20stop%20old%20outlier_detection%20job%7D
Reproduction line:
./gradlew ':x-pack:qa:rolling-upgrade:v7.4.2#twoThirdsUpgradedTest' -Dtests.class="org.elasticsearch.upgrades.UpgradeClusterClientYamlTestSuiteIT" -Dtests.method="test {p0=mixed_cluster/90_ml_data_frame_analytics_crud/Start and stop old outlier_detection job}" -Dtests.seed=F85312FD66A3AAFE -Dtests.bwc=true -Dtests.locale=el-CY -Dtests.timezone=CTT -Druntime.java=8
Applicable branches:
7.16
Reproduces locally?:
No
Failure history:
https://gradle-enterprise.elastic.co/scans/tests?tests.container=org.elasticsearch.upgrades.UpgradeClusterClientYamlTestSuiteIT&tests.test=test%20%7Bp0%3Dmixed_cluster/90_ml_data_frame_analytics_crud/Start%20and%20stop%20old%20outlier_detection%20job%7D
Failure excerpt:
java.lang.AssertionError: Failure in test setup: Failed to initialize at least 3 watcher nodes
at __randomizedtesting.SeedInfo.seed([F85312FD66A3AAFE:70072D27C85FC706]:0)
at org.elasticsearch.upgrades.UpgradeClusterClientYamlTestSuiteIT.waitForWatcher(UpgradeClusterClientYamlTestSuiteIT.java:74)
at sun.reflect.NativeMethodAccessorImpl.invoke0(NativeMethodAccessorImpl.java:-2)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750)
at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:972)
at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:988)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:947)
at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:832)
at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:883)
at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:894)
at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.AssertionError:
Expected: "started"
but: was "stopped"
at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
at org.junit.Assert.assertThat(Assert.java:956)
at org.junit.Assert.assertThat(Assert.java:923)
at org.elasticsearch.upgrades.UpgradeClusterClientYamlTestSuiteIT.lambda$waitForWatcher$0(UpgradeClusterClientYamlTestSuiteIT.java:70)
at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:1123)
at org.elasticsearch.upgrades.UpgradeClusterClientYamlTestSuiteIT.waitForWatcher(UpgradeClusterClientYamlTestSuiteIT.java:63)
at sun.reflect.NativeMethodAccessorImpl.invoke0(NativeMethodAccessorImpl.java:-2)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750)
at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:972)
at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:988)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:947)
at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:832)
at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:883)
at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:894)
at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
at java.lang.Thread.run(Thread.java:748)
The waiting for 3 watcher nodes in a partially upgraded cluster has caused numerous failures.
Multiple upgrade tests have failed (seemingly) due to timing out for watcher to be available.
For all the failures, a commonality is:
Many logging entries on the old node like:
and
Followed by many entries such as:
This has failed upgrading from <7.6
Build scans:
https://gradle-enterprise.elastic.co/s/ut4mgi4xhjgiy/tests/:x-pack:qa:rolling-upgrade:v7.4.2%23twoThirdsUpgradedTest/org.elasticsearch.upgrades.UpgradeClusterClientYamlTestSuiteIT/test%20%7Bp0=mixed_cluster%2F90_ml_data_frame_analytics_crud%2FStart%20and%20stop%20old%20outlier_detection%20job%7D
Reproduction line:
./gradlew ':x-pack:qa:rolling-upgrade:v7.4.2#twoThirdsUpgradedTest' -Dtests.class="org.elasticsearch.upgrades.UpgradeClusterClientYamlTestSuiteIT" -Dtests.method="test {p0=mixed_cluster/90_ml_data_frame_analytics_crud/Start and stop old outlier_detection job}" -Dtests.seed=F85312FD66A3AAFE -Dtests.bwc=true -Dtests.locale=el-CY -Dtests.timezone=CTT -Druntime.java=8Applicable branches:
7.16
Reproduces locally?:
No
Failure history:
https://gradle-enterprise.elastic.co/scans/tests?tests.container=org.elasticsearch.upgrades.UpgradeClusterClientYamlTestSuiteIT&tests.test=test%20%7Bp0%3Dmixed_cluster/90_ml_data_frame_analytics_crud/Start%20and%20stop%20old%20outlier_detection%20job%7D
Failure excerpt: