
[CI] SLMSnapshotBlockingIntegTests.testRetentionWhileSnapshotInProgress failure on master #46508

@dakrone

Description


From https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+periodic/583/console & https://gradle-enterprise.elastic.co/s/6x67ha6426acy/console-log

  2> REPRODUCE WITH: ./gradlew ':x-pack:plugin:ilm:test' --tests "org.elasticsearch.xpack.slm.SLMSnapshotBlockingIntegTests.testRetentionWhileSnapshotInProgress" -Dtests.seed=7BA427BA999CD99D -Dtests.security.manager=true -Dtests.locale=fr-GP -Dtests.timezone=America/Edmonton -Dcompiler.java=12 -Druntime.java=11
  2> java.lang.AssertionError
        at __randomizedtesting.SeedInfo.seed([7BA427BA999CD99D:67E39A043E8F736]:0)
        at org.junit.Assert.fail(Assert.java:86)
        at org.junit.Assert.assertTrue(Assert.java:41)
        at org.junit.Assert.assertNotNull(Assert.java:712)
        at org.junit.Assert.assertNotNull(Assert.java:722)
        at org.elasticsearch.xpack.slm.SLMSnapshotBlockingIntegTests.lambda$testRetentionWhileSnapshotInProgress$2(SLMSnapshotBlockingIntegTests.java:153)
        at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:866)
        at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:840)
        at org.elasticsearch.xpack.slm.SLMSnapshotBlockingIntegTests.testRetentionWhileSnapshotInProgress(SLMSnapshotBlockingIntegTests.java:146)

This is likely caused by the following exception, thrown when the test tries to kick off the second snapshot:

  1> [2019-09-09T13:42:03,563][INFO ][o.e.x.s.SLMSnapshotBlockingIntegTests] [testRetentionWhileSnapshotInProgress] --> waiting for snapshot snap-qdwsdayhtfuymbsj7vi2yw to be completed, got: STARTED
  1> [2019-09-09T13:42:03,821][INFO ][o.e.x.s.SLMSnapshotBlockingIntegTests] [testRetentionWhileSnapshotInProgress] --> waiting for snapshot snap-qdwsdayhtfuymbsj7vi2yw to be completed, got: SUCCESS
  1> [2019-09-09T13:42:03,821][INFO ][o.e.x.s.SLMSnapshotBlockingIntegTests] [testRetentionWhileSnapshotInProgress] --> blocking nodes from completing snapshot
  1> [2019-09-09T13:42:03,822][INFO ][o.e.x.s.SnapshotLifecycleTask] [node_s0] snapshot lifecycle policy [slm-policy] issuing create snapshot [snap-frash4insd-kptw8sm1rew]
  1> [2019-09-09T13:42:03,824][INFO ][o.e.x.s.SLMSnapshotBlockingIntegTests] [testRetentionWhileSnapshotInProgress] --> checking for in progress snapshot...
  1> [2019-09-09T13:42:03,826][INFO ][o.e.x.s.SLMSnapshotBlockingIntegTests] [testRetentionWhileSnapshotInProgress] --> checking for in progress snapshot...
  1> [2019-09-09T13:42:03,828][WARN ][o.e.s.SnapshotsService   ] [node_s0] [slm-repo][snap-frash4insd-kptw8sm1rew] failed to create snapshot
  1> org.elasticsearch.snapshots.ConcurrentSnapshotExecutionException: [slm-repo:snap-frash4insd-kptw8sm1rew]  a snapshot is already running
  1> 	at org.elasticsearch.snapshots.SnapshotsService$1.execute(SnapshotsService.java:301) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
  1> 	at org.elasticsearch.cluster.ClusterStateUpdateTask.execute(ClusterStateUpdateTask.java:47) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
  1> 	at org.elasticsearch.cluster.service.MasterService.executeTasks(MasterService.java:697) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
  1> 	at org.elasticsearch.cluster.service.MasterService.calculateTaskOutputs(MasterService.java:319) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
  1> 	at org.elasticsearch.cluster.service.MasterService.runTasks(MasterService.java:214) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
  1> 	at org.elasticsearch.cluster.service.MasterService$Batcher.run(MasterService.java:151) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
  1> 	at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:150) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
  1> 	at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:188) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
  1> 	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:699) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
  1> 	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:252) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
  1> 	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:215) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
  1> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
  1> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
  1> 	at java.lang.Thread.run(Thread.java:834) [?:?]

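My hunch is that the first snapshot already reports a "SUCCESS" status while its entry is still present in the cluster state, so the second snapshot is rejected with "a snapshot is already running". We should wait until the first snapshot is no longer present in the cluster state before issuing the second execute policy request.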
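A minimal, self-contained Java sketch of the suspected race and the proposed fix. `FakeClusterState`, `waitUntil`, and `tryScenario` are hypothetical toy stand-ins, not Elasticsearch APIs. The toy model simply assumes, as the hunch above does, that a snapshot reports SUCCESS one step before its entry leaves the cluster state. In the real test the extra wait would presumably check the cluster state's in-progress snapshots inside `assertBusy`, which is not shown here.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.BooleanSupplier;

// Hypothetical toy model of the suspected race; none of these names are Elasticsearch APIs.
final class SnapshotRaceSketch {

    /** Toy stand-in for the cluster state: per-snapshot status plus in-progress entries. */
    static final class FakeClusterState {
        final Map<String, String> status = new HashMap<>();
        final Set<String> inProgress = new HashSet<>();
        private String current;
        private int step;

        void startSnapshot(String name) {
            // Mirrors the check behind "a snapshot is already running" in the log above.
            if (!inProgress.isEmpty()) {
                throw new IllegalStateException("[" + name + "] a snapshot is already running");
            }
            inProgress.add(name);
            status.put(name, "STARTED");
            current = name;
            step = 0;
        }

        /** Assumed ordering: SUCCESS is reported one step before the entry is removed. */
        void advance() {
            if (current == null) {
                return;
            }
            step++;
            if (step == 1) {
                status.put(current, "SUCCESS");
            } else if (step == 2) {
                inProgress.remove(current);
                current = null;
            }
        }
    }

    /** Polls like a simplified assertBusy, advancing the toy state between checks. */
    static boolean waitUntil(BooleanSupplier condition, Runnable poll, int maxPolls) {
        for (int i = 0; i < maxPolls; i++) {
            if (condition.getAsBoolean()) {
                return true;
            }
            poll.run();
        }
        return condition.getAsBoolean();
    }

    /** Returns true if the second snapshot starts without a concurrency failure. */
    static boolean tryScenario(boolean waitForRemoval) {
        FakeClusterState state = new FakeClusterState();
        state.startSnapshot("snap-1");
        // What the test waits for today: the first snapshot's status reads SUCCESS.
        waitUntil(() -> "SUCCESS".equals(state.status.get("snap-1")), state::advance, 10);
        if (waitForRemoval) {
            // Proposed extra wait: the entry must also be gone from the in-progress set.
            waitUntil(state.inProgress::isEmpty, state::advance, 10);
        }
        try {
            state.startSnapshot("snap-2");
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("wait for SUCCESS only: " + tryScenario(false));
        System.out.println("also wait for removal: " + tryScenario(true));
    }
}
```

Running `main` prints `wait for SUCCESS only: false` followed by `also wait for removal: true`: in this toy model, waiting only for the SUCCESS status reproduces the rejection, while also waiting for the in-progress entry to disappear lets the second snapshot start.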

Metadata

Labels

:Data Management/ILM+SLM (DO NOT USE. Use ":StorageEngine/ILM" or ":Distributed Coordination/SLM" instead.)
>test-failure (Triaged test failures from CI)
