Skip to content

Don't halt policy execution on policy trigger exception#49128

Merged
dakrone merged 1 commit intomasterfrom
ilm-dont-halt-on-policy-error
Nov 15, 2019
Merged

Don't halt policy execution on policy trigger exception#49128
dakrone merged 1 commit intomasterfrom
ilm-dont-halt-on-policy-error

Conversation

@dakrone
Copy link
Copy Markdown
Member

@dakrone dakrone commented Nov 15, 2019

When triggered either by becoming master, a new cluster state, or a
periodic schedule, an ILM policy execution through
maybeRunAsyncAction, runPolicyAfterStateChange, or
runPeriodicStep throwing an exception will cause the loop the
terminate. This means that any indices that would have been processed
after the index where the exception was thrown will not be processed by
ILM.

For most execution this is not a problem because the actual running of
steps is protected by a try/catch that moves the index to the ERROR step
in the event of a problem. If an exception occurs prior to step
execution (for example, in fetching and parsing the current
policy/step) however, it causes the loop termination previously
mentioned.

This commit wraps the invocation of the methods specified above in a
try/catch block that provides better logging and does not bubble the
exception up.

When triggered either by becoming master, a new cluster state, or a
periodic schedule, an ILM policy execution through
`maybeRunAsyncAction`, `runPolicyAfterStateChange`, or
`runPeriodicStep` throwing an exception will cause the loop the
terminate. This means that any indices that would have been processed
after the index where the exception was thrown will not be processed by
ILM.

For most execution this is not a problem because the actual running of
steps is protected by a try/catch that moves the index to the ERROR step
in the event of a problem. If an exception occurs prior to step
execution (for example, in fetching and parsing the current
policy/step) however, it causes the loop termination previously
mentioned.

This commit wraps the invocation of the methods specified above in a
try/catch block that provides better logging and does not bubble the
exception up.
@dakrone dakrone added >bug :Data Management/ILM+SLM DO NOT USE. Use ":StorageEngine/ILM" or ":Distributed Coordination/SLM" instead. v8.0.0 v7.5.0 v7.6.0 v7.4.3 labels Nov 15, 2019
@dakrone dakrone requested a review from andreidan November 15, 2019 00:17
@elasticmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-core-features (:Core/Features/ILM+SLM)

@dakrone
Copy link
Copy Markdown
Member Author

dakrone commented Nov 15, 2019

I re-opened #37581 (comment) for the failure (it is unrelated to this PR)

@elasticmachine run elasticsearch-ci/1

Copy link
Copy Markdown
Contributor

@andreidan andreidan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - great catch Lee!

@dakrone dakrone merged commit 74a9407 into master Nov 15, 2019
@dakrone dakrone deleted the ilm-dont-halt-on-policy-error branch November 15, 2019 14:57
dakrone added a commit to dakrone/elasticsearch that referenced this pull request Nov 15, 2019
When triggered either by becoming master, a new cluster state, or a
periodic schedule, an ILM policy execution through
`maybeRunAsyncAction`, `runPolicyAfterStateChange`, or
`runPeriodicStep` throwing an exception will cause the loop the
terminate. This means that any indices that would have been processed
after the index where the exception was thrown will not be processed by
ILM.

For most execution this is not a problem because the actual running of
steps is protected by a try/catch that moves the index to the ERROR step
in the event of a problem. If an exception occurs prior to step
execution (for example, in fetching and parsing the current
policy/step) however, it causes the loop termination previously
mentioned.

This commit wraps the invocation of the methods specified above in a
try/catch block that provides better logging and does not bubble the
exception up.
dakrone added a commit to dakrone/elasticsearch that referenced this pull request Nov 15, 2019
When triggered either by becoming master, a new cluster state, or a
periodic schedule, an ILM policy execution through
`maybeRunAsyncAction`, `runPolicyAfterStateChange`, or
`runPeriodicStep` throwing an exception will cause the loop the
terminate. This means that any indices that would have been processed
after the index where the exception was thrown will not be processed by
ILM.

For most execution this is not a problem because the actual running of
steps is protected by a try/catch that moves the index to the ERROR step
in the event of a problem. If an exception occurs prior to step
execution (for example, in fetching and parsing the current
policy/step) however, it causes the loop termination previously
mentioned.

This commit wraps the invocation of the methods specified above in a
try/catch block that provides better logging and does not bubble the
exception up.
dakrone added a commit to dakrone/elasticsearch that referenced this pull request Nov 15, 2019
When triggered either by becoming master, a new cluster state, or a
periodic schedule, an ILM policy execution through
`maybeRunAsyncAction`, `runPolicyAfterStateChange`, or
`runPeriodicStep` throwing an exception will cause the loop the
terminate. This means that any indices that would have been processed
after the index where the exception was thrown will not be processed by
ILM.

For most execution this is not a problem because the actual running of
steps is protected by a try/catch that moves the index to the ERROR step
in the event of a problem. If an exception occurs prior to step
execution (for example, in fetching and parsing the current
policy/step) however, it causes the loop termination previously
mentioned.

This commit wraps the invocation of the methods specified above in a
try/catch block that provides better logging and does not bubble the
exception up.
dakrone added a commit that referenced this pull request Nov 15, 2019
This commit wraps the calls to retrieve the current step in a try/catch
so that the exception does not bubble up. Instead, step info is added
containing the exception to the existing step.

Semi-related to #49128
andreidan pushed a commit that referenced this pull request Nov 19, 2019
This commit wraps the calls to retrieve the current step in a try/catch
so that the exception does not bubble up. Instead, step info is added
containing the exception to the existing step.

Semi-related to #49128
andreidan pushed a commit to andreidan/elasticsearch that referenced this pull request Nov 19, 2019
This commit wraps the calls to retrieve the current step in a try/catch
so that the exception does not bubble up. Instead, step info is added
containing the exception to the existing step.

Semi-related to elastic#49128

(cherry picked from commit 72530f8)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
andreidan pushed a commit to andreidan/elasticsearch that referenced this pull request Nov 19, 2019
This commit wraps the calls to retrieve the current step in a try/catch
so that the exception does not bubble up. Instead, step info is added
containing the exception to the existing step.

Semi-related to elastic#49128

(cherry picked from commit 72530f8)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
andreidan added a commit that referenced this pull request Nov 19, 2019
This commit wraps the calls to retrieve the current step in a try/catch
so that the exception does not bubble up. Instead, step info is added
containing the exception to the existing step.

Semi-related to #49128

(cherry picked from commit 72530f8)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
andreidan added a commit that referenced this pull request Nov 19, 2019
This commit wraps the calls to retrieve the current step in a try/catch
so that the exception does not bubble up. Instead, step info is added
containing the exception to the existing step.

Semi-related to #49128

(cherry picked from commit 72530f8)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>bug :Data Management/ILM+SLM DO NOT USE. Use ":StorageEngine/ILM" or ":Distributed Coordination/SLM" instead. v7.4.3 v7.5.0 v7.6.0 v8.0.0-alpha1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants