ILM integration test with full policy #33402
dakrone merged 11 commits into elastic:index-lifecycle
- this adds an integration test that runs through a policy with all the actions defined.
- adds a test specific to a policy having just a rollover action
- bumps the node count to 4
Pinging @elastic/es-core-infra
colings86
left a comment
I left a few minor comments but LGTM once those are fixed.
        new RolloverAction(null, null, 1L))));
phases.put("warm", new Phase("warm", TimeValue.ZERO, warmActions));
phases.put("cold", new Phase("cold", TimeValue.ZERO, singletonMap(AllocateAction.NAME,
    new AllocateAction(1, singletonMap("_name", "node-3"), null, null))));
The number of replicas needs to be set to 0 here; otherwise the shrunken index can never progress past the cold phase, meaning it will never be deleted. Setting this to 0 locally causes the test to pass for me.
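To illustrate why one replica gets stuck here: the allocate action restricts the index to a single node (`node-3`), and Elasticsearch does not place a replica on the same node as its primary, so the replica can never be assigned. The sketch below is only a simplified model of that rule, not the real allocator; `ReplicaPlacementSketch` and `canFullyAllocate` are hypothetical names made up for this example.

```java
/** Simplified model: every shard copy (primary plus replicas) needs its own node. */
final class ReplicaPlacementSketch {

    /**
     * @param replicas      number of replicas requested by the allocate action
     * @param matchingNodes number of nodes that satisfy the allocation filter
     * @return true if the primary and all replicas can each get a distinct node
     */
    static boolean canFullyAllocate(int replicas, int matchingNodes) {
        if (replicas < 0 || matchingNodes < 0) {
            throw new IllegalArgumentException("counts must be non-negative");
        }
        // One node for the primary, plus one per replica.
        return replicas + 1 <= matchingNodes;
    }

    public static void main(String[] args) {
        // Filter matches only node-3, as in the cold phase of the test policy.
        System.out.println(canFullyAllocate(1, 1)); // false: the replica has nowhere to go
        System.out.println(canFullyAllocate(0, 1)); // true: the primary alone fits on node-3
    }
}
```

Under this model, either dropping the replica count to 0 or widening the filter to more nodes would let the phase complete; the review comment suggests the former.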
oh, good catch! this is why I wanted another pair of eyes!
The error scenario is misleading because the step was attempted again while the transaction was stuck halfway.
hmm. this does not pass the test locally for me. I will continue investigating. I am also seeing exceptions with rollover
rollover failed stacktrace
[2018-09-05T11:20:18,002][ERROR][o.e.c.s.MasterService ] [node-1] exception thrown by listener notifying of failure from [ILM]
org.elasticsearch.ElasticsearchException: policy [nzxPV] for index [hwdnjldqgi-000001] failed trying to move from step [{"phase":"hot","action":"rollover","name":"attempt_rollover"}] to step [{"phase":"hot","action":"rollover","name":"update-rollover-lifecycle-date"}].
at org.elasticsearch.xpack.indexlifecycle.MoveToNextStepUpdateTask.onFailure(MoveToNextStepUpdateTask.java:78) ~[?:?]
at org.elasticsearch.cluster.service.MasterService$SafeClusterStateTaskListener.onFailure(MasterService.java:453) [elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
at org.elasticsearch.cluster.service.MasterService$TaskOutputs.notifyFailedTasks(MasterService.java:386) [elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
at org.elasticsearch.cluster.service.MasterService.runTasks(MasterService.java:199) [elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
at org.elasticsearch.cluster.service.MasterService$Batcher.run(MasterService.java:133) [elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:150) [elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:188) [elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:624) [elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:244) [elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:207) [elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1135) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
at java.lang.Thread.run(Thread.java:844) [?:?]
Suppressed: java.lang.NullPointerException
at org.elasticsearch.xpack.indexlifecycle.MoveToNextStepUpdateTask.execute(MoveToNextStepUpdateTask.java:54) ~[?:?]
at org.elasticsearch.cluster.ClusterStateUpdateTask.execute(ClusterStateUpdateTask.java:45) ~[elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
at org.elasticsearch.cluster.service.MasterService.executeTasks(MasterService.java:639) ~[elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
at org.elasticsearch.cluster.service.MasterService.calculateTaskOutputs(MasterService.java:268) ~[elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
at org.elasticsearch.cluster.service.MasterService.runTasks(MasterService.java:198) [elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
at org.elasticsearch.cluster.service.MasterService$Batcher.run(MasterService.java:133) [elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:150) [elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:188) [elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:624) [elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:244) [elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:207) [elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1135) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
at java.lang.Thread.run(Thread.java:844) [?:?]
Caused by: java.lang.NullPointerException
at org.elasticsearch.xpack.indexlifecycle.MoveToNextStepUpdateTask.execute(MoveToNextStepUpdateTask.java:54) ~[?:?]
at org.elasticsearch.cluster.ClusterStateUpdateTask.execute(ClusterStateUpdateTask.java:45) ~[elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
at org.elasticsearch.cluster.service.MasterService.executeTasks(MasterService.java:639) ~[elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
at org.elasticsearch.cluster.service.MasterService.calculateTaskOutputs(MasterService.java:268) ~[elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
at org.elasticsearch.cluster.service.MasterService.runTasks(MasterService.java:198) ~[elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
... 9 more
=========================================
That stack trace is something we should fix, but I don't think it is the cause of any failure you are still seeing. It seems to come from MoveToNextStepUpdateTask trying to access the index metadata for an index that has been deleted, since we are not checking that the index metadata is not null before we use it. However, that should only produce an ugly and annoying NPE in the logs, so I think there will be another stack trace from the test itself showing any remaining error that causes the test to fail.
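A minimal sketch of the kind of null guard described above, assuming the task looks up the index by name and may find nothing if the index was deleted after the task was submitted. This is not the real `MoveToNextStepUpdateTask`; a plain `Map<String, String>` (index name to current step name) stands in for the cluster state, and `MoveToNextStepGuardSketch` is a hypothetical name.

```java
import java.util.HashMap;
import java.util.Map;

final class MoveToNextStepGuardSketch {

    /**
     * Moves an index from expectedStep to nextStep.
     * Returns the same state instance unchanged when there is nothing to do.
     */
    static Map<String, String> execute(Map<String, String> state, String index,
                                       String expectedStep, String nextStep) {
        String currentStep = state.get(index);
        if (currentStep == null) {
            // Index no longer exists (e.g. deleted by a later phase): no-op instead of an NPE.
            return state;
        }
        if (!currentStep.equals(expectedStep)) {
            // The index has already moved on; leave the state alone.
            return state;
        }
        Map<String, String> updated = new HashMap<>(state);
        updated.put(index, nextStep);
        return updated;
    }

    public static void main(String[] args) {
        Map<String, String> state = new HashMap<>();
        // No entry for the index: the guard returns the state untouched.
        System.out.println(execute(state, "deleted-index", "attempt_rollover",
                "update-rollover-lifecycle-date") == state); // true
    }
}
```

Returning the unchanged state mirrors the usual convention for cluster state update tasks, where handing back the same instance signals that nothing changed.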
If I pull down this PR and rebase the branch on the latest from the feature branch, I cannot reproduce the error you get above. However, I can intermittently reproduce a failure with the following stack trace:
stacktrace
[2018-09-06T09:34:36,024][ERROR][o.e.ExceptionsHelper ] [node-3] fatal error
at org.elasticsearch.ExceptionsHelper.lambda$maybeDieOnAnotherThread$2(ExceptionsHelper.java:264)
at java.base/java.util.Optional.ifPresent(Optional.java:172)
at org.elasticsearch.ExceptionsHelper.maybeDieOnAnotherThread(ExceptionsHelper.java:254)
at org.elasticsearch.xpack.core.scheduler.SchedulerEngine$ActiveSchedule.run(SchedulerEngine.java:201)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:514)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1135)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:844)
[2018-09-06T09:34:36,032][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [node-3] fatal error in thread [Thread-4], exiting
java.lang.AssertionError: expected all steps for [[pbifllksqt-000001/CO6HMKYZQG-ZcIIKXSaHAw]] to be in phase [new] but they were not, steps: [{"phase":"hot","action":"rollover","name":"attempt_rollover"} => {"phase":"hot","action":"rollover","name":"update-rollover-lifecycle-date"}, {"phase":"hot","action":"rollover","name":"update-rollover-lifecycle-date"} => {"phase":"hot","action":"complete","name":"complete"}, {"phase":"hot","action":"complete","name":"complete"} => {"phase":"warm","action":"readonly","name":"readonly"}]
at org.elasticsearch.xpack.indexlifecycle.PolicyStepsRegistry.getStep(PolicyStepsRegistry.java:240) ~[?:?]
at org.elasticsearch.xpack.indexlifecycle.IndexLifecycleRunner.getCurrentStep(IndexLifecycleRunner.java:200) ~[?:?]
at org.elasticsearch.xpack.indexlifecycle.IndexLifecycleRunner.runPolicy(IndexLifecycleRunner.java:89) ~[?:?]
at org.elasticsearch.xpack.indexlifecycle.IndexLifecycleService.triggerPolicies(IndexLifecycleService.java:207) ~[?:?]
at org.elasticsearch.xpack.indexlifecycle.IndexLifecycleService.triggered(IndexLifecycleService.java:165) ~[?:?]
at org.elasticsearch.xpack.core.scheduler.SchedulerEngine.notifyListeners(SchedulerEngine.java:164) ~[?:?]
at org.elasticsearch.xpack.core.scheduler.SchedulerEngine$ActiveSchedule.run(SchedulerEngine.java:192) ~[?:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:514) ~[?:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1135) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
at java.lang.Thread.run(Thread.java:844) [?:?]
This error seems to kill the node (it is logged as a fatal error and the node exits).
I'm looking into this to see if I can reproduce it
    "{ \"policy\":" + Strings.toString(builder) + "}", ContentType.APPLICATION_JSON);
Request request = new Request("PUT", "_ilm/" + policy);
request.setEntity(entity);
client().performRequest(request);
We should check the response here with something like assertOK()
public static void updatePolicy(String indexName, String policy) throws IOException {
    Request request = new Request("PUT", "/" + indexName + "/_ilm/" + policy);
    client().performRequest(request);
We should check the response here, probably with assertOK()
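For reference, a standalone sketch of what such a check amounts to. In the REST test base class, `assertOK` is given the `Response` returned by `performRequest`; the version below is a hypothetical stand-in (`ResponseCheckSketch.assertOk`, taking a bare status code) so that it runs without the test framework, and the choice of 200 and 201 as the accepted codes is an assumption.

```java
final class ResponseCheckSketch {

    /** Hypothetical helper: fail unless the HTTP status is 200 or 201. */
    static void assertOk(int statusCode) {
        if (statusCode != 200 && statusCode != 201) {
            throw new AssertionError("unexpected status code [" + statusCode + "]");
        }
    }

    public static void main(String[] args) {
        assertOk(200); // passes silently
        try {
            assertOk(500);
        } catch (AssertionError e) {
            System.out.println(e.getMessage()); // unexpected status code [500]
        }
    }
}
```

In the test helpers above this would mean capturing the return value of `client().performRequest(request)` and passing it to the check, rather than discarding it.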
In a video call, we discussed some of the step pile-up problems that are causing these tests to be flaky. I have added the
NOTE: test fails, and I think it is due to timing of async actions being executed in parallel