stop throwing an IllegalStateException for unrecognized steps#32212
stop throwing an IllegalStateException for unrecognized steps#32212talevy merged 11 commits intoelastic:index-lifecyclefrom
Conversation
… runPolicy Since the reason for a step not being found in a registry may be due to staleness of the registry between it and the cluster state, we do not want to throw an IllegalStateException. Staleness is something that will be self-healing after follow-up applications of the cluster state updates, so this is a recoverable issue that should log a warning instead of throwing an exception Closes elastic#32181.
|
Pinging @elastic/es-core-infra |
| "current step for index [" + indexMetaData.getIndex().getName() + "] with policy [" + policy + "] is not recognized"); | ||
| // This may happen in the case that there is invalid ilm-step index settings or the stepRegistry is out of | ||
| // sync with the current cluster state | ||
| logger.warn("current step [" + getCurrentStepKey(indexSettings) + "] for index [" + indexMetaData.getIndex().getName() |
There was a problem hiding this comment.
Although I love that we want to provide the step info here, I have mixed feelings about the assertions that exist inside of getCurrentStepKey. I think this should be changed to throwing IllegalStateException. what do you think @colings86?
There was a problem hiding this comment.
I think the assertions are fine because ILM is in full control of the phase, action and step settings (they are INTERNAL settings so can't be touched by a user and will soon be moved to a custom index metadata object further locking down access to them. Therefore, I don't think its necessary to check that if one is set all three are set in production code but its useful to have the check in testing to make sure we don't do something silly so assertions feel right to me.
| @@ -64,8 +64,11 @@ public void runPolicy(String policy, IndexMetaData indexMetaData, ClusterState c | |||
| } | |||
| Step currentStep = getCurrentStep(stepRegistry, policy, indexSettings); | |||
There was a problem hiding this comment.
I looked closer at this. I missed something here.
This method calls PolicyStepsRegistry#getStep, which chooses to throw IllegalStateExceptions as well. Further investigation to see what the repercussions of changing that is necessary.
and tests need to be added to walk these paths as well. Current tests do not catch this
There was a problem hiding this comment.
In other words, should https://github.com/talevy/elasticsearch/blob/80ab7739ca8877e2036edfac724f6014fefb7e52/x-pack/plugin/index-lifecycle/src/main/java/org/elasticsearch/xpack/indexlifecycle/PolicyStepsRegistry.java#L101-L114 continue to throw an IllegalStateException?
There was a problem hiding this comment.
I think we probably need to make PolicyStepsRegistry.getStep() return null if the step is missing (in much the same way that a map returns null for a missing key. Then we'll have to check for null:
- here in
IndexLifecycleRunner.runPolicy()and if itsnulllog a warning and return - In
IndexLifecycleRunner.moveClusterStateToStep()where I think we should do the same as above - In
ExecuteStepsUpdateTask.execute()where we should log the warning if itsnullbut return the cluster state up to changing the current step so we don't lose all the progress made on the previous cluster state steps
wdyt?
There was a problem hiding this comment.
that is what I was thinking as well. Just wanted to double check with you before I take that route
| "current step for index [" + indexMetaData.getIndex().getName() + "] with policy [" + policy + "] is not recognized"); | ||
| // This may happen in the case that there is invalid ilm-step index settings or the stepRegistry is out of | ||
| // sync with the current cluster state | ||
| logger.warn("current step [" + getCurrentStepKey(indexSettings) + "] for index [" + indexMetaData.getIndex().getName() |
There was a problem hiding this comment.
I think the assertions are fine because ILM is in full control of the phase, action and step settings (they are INTERNAL settings so can't be touched by a user and will soon be moved to a custom index metadata object further locking down access to them. Therefore, I don't think its necessary to check that if one is set all three are set in production code but its useful to have the check in testing to make sure we don't do something silly so assertions feel right to me.
| throw new IllegalArgumentException(e.getMessage()); | ||
| Step nextStep = stepRegistry.getStep(indexPolicySetting, nextStepKey); | ||
| if (nextStep == null) { | ||
| throw new IllegalArgumentException("step [" + nextStepKey + "] with policy [" + indexPolicySetting + "] does not exist"); |
There was a problem hiding this comment.
Could you explain why this still needs to throw an exception rather than just returning an unmodified cluster state? Also I know we threw an IllegalArgumentException before but it feels wrong to me since the user won't have supplied anything invalid
There was a problem hiding this comment.
I am cool with that.
I still feel like IllegalArgumentException should be thrown to users of TransportMoveToStepAction. So I will push up a basic exception for that, and it will know that nothing "right" happened since the clusterStates will be the same instance
| if (nextStep == null) { | ||
| throw new IllegalArgumentException("step [" + nextStepKey + "] with policy [" + indexPolicySetting + "] does not exist"); | ||
| // stepRegistry may not be up-to-date with latest policies/steps in cluster-state | ||
| return currentState; |
There was a problem hiding this comment.
Sorry, I had not looked closely enough at what this method is used for. It seems that its only used for the "move to step" and retry APIs so actually its probably ok for this method to throw an exception directly. I had thought before that we use this method for moving between steps normally.
I think we should throw directly here as the alternative of returning the same cluster state and then using that as an indication that the step is not recognised is a bit trappy. To ensure this method is kept only for API calls and not for normal execution maybe we can add a JavaDoc comment?
There was a problem hiding this comment.
OK. I will revert the latest commit and add comment
…ner, but throw in action" This reverts commit c009641.
|
thanks! |
Since the reason for a step not being found in a registry may be due to staleness of the registry between it and the cluster state, we do not want to throw an IllegalStateException. Staleness is something that will be self-healing after follow-up applications of the cluster state updates, so this is a recoverable issue that should log a warning instead of throwing an exception Closes #32181.
Since the reason for a step not being found in a registry may be due to staleness of the
registry between it and the cluster state, we do not want to throw an IllegalStateException.
Staleness is something that will be self-healing after follow-up applications of the cluster state
updates, so this is a recoverable issue that should log a warning instead of throwing an exception
Closes #32181.