Skip to content

Implement wanted error handling for alerting and actions#41917

Merged
mikecote merged 13 commits intoelastic:masterfrom
mikecote:alerting/error-handling
Jul 30, 2019
Merged

Implement wanted error handling for alerting and actions#41917
mikecote merged 13 commits intoelastic:masterfrom
mikecote:alerting/error-handling

Conversation

@mikecote
Copy link
Copy Markdown
Contributor

@mikecote mikecote commented Jul 24, 2019

This PR implements what is wanted for error handling in alerting and actions as outlined in #39349.

The following changes have been made:

  • Action types have a new maxAttempts attribute that is optional
  • Executions returning status of error will be able to recommend retry logic via retry attribute
  • Executors are no longer passed scheduledRunAt and previousScheduledRunAt. Those are replaced with startedAt and previousStartedAt.

@mikecote mikecote self-assigned this Jul 24, 2019
@elasticmachine
Copy link
Copy Markdown
Contributor

Pinging @elastic/kibana-stack-services

@elasticmachine

This comment has been minimized.

@mikecote mikecote force-pushed the alerting/error-handling branch 3 times, most recently from 88012ae to d145f25 Compare July 26, 2019 16:47
@elasticmachine

This comment has been minimized.

@elasticmachine

This comment has been minimized.

@mikecote mikecote force-pushed the alerting/error-handling branch from cad9add to 13ae86a Compare July 29, 2019 12:46
@mikecote mikecote marked this pull request as ready for review July 29, 2019 13:28
@mikecote mikecote requested a review from a team July 29, 2019 13:28
@elasticmachine

This comment has been minimized.

return error.retry == null ? true : error.retry;
}
// Retry other kinds of errors
return true;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we want to have this return false by default. Idea being that an action will know when something is retry-able - setting retry to true or a Date - and any other case, should not be retryable.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I guess by looking here https://github.com/elastic/kibana/blob/master/x-pack/legacy/plugins/actions/server/lib/execute.ts#L29-L74 any code that could throw here is already being handled and wrapped into a proper ExecutionError. Some may need to return retry: false ex validation?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went ahead and made it false for other kinds of errors. I can't think of a reason to retry totally unhandled errors.

Copy link
Copy Markdown
Contributor

@pmuellr pmuellr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made a few comments - just one in the review, a few "single" comments.

My only real concern is defaulting getRetry() to true in x-pack/legacy/plugins/actions/server/action_type_registry.ts; perhaps I'm not thinking about that right. And we could always tweak it later ...

expect(runnerResult).toMatchInlineSnapshot(`
Object {
"runAt": 2019-06-03T18:55:30.982Z,
"runAt": 1970-01-01T00:00:10.000Z,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting the date is set to such an old epoch-y date. My only thought is that at some point in the future, if we ended up needing to calculate a previous date from this, somehow, the epoch millis might go negative. Which node should handle, but I wonder if some other part of the system might choke on negative numbers.

$ node -p 'new Date(-1)'
1969-12-31T23:59:59.999Z

OTOH, I guess would actually be a good test, so ... leave it in!?!?!?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this seems to be what happens when you use fake timers, everything becomes new Date(0) by default 🤔

@elasticmachine

This comment has been minimized.

Copy link
Copy Markdown
Contributor

@bmcconaghy bmcconaghy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code LGTM, just wondering about defaulting to retrying.

@elasticmachine
Copy link
Copy Markdown
Contributor

💚 Build Succeeded

@mikecote mikecote merged commit 9a321ae into elastic:master Jul 30, 2019
mikecote added a commit to mikecote/kibana that referenced this pull request Jul 30, 2019
)

* Implement wanted error handling

* Cleanup

* Add retry logic to actions

* Leverage startedAt from task manager

* Fix broken jest tests

* Add missing unit test

* Add unit tests for getRetry

* Add test for rate limit

* Remove fake timers

* Don't retry errors by default for actions unless the proper result format is returned with status: error

* Don't retry unless attribute specified

* Fix tests
mikecote added a commit that referenced this pull request Jul 30, 2019
… (#42255)

*  Implement wanted error handling for alerting and actions (#41917)

* Implement wanted error handling

* Cleanup

* Add retry logic to actions

* Leverage startedAt from task manager

* Fix broken jest tests

* Add missing unit test

* Add unit tests for getRetry

* Add test for rate limit

* Remove fake timers

* Don't retry errors by default for actions unless the proper result format is returned with status: error

* Don't retry unless attribute specified

* Fix tests

* Increase retry timeout to prevent flaky tests (#42291)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Feature:Alerting release_note:skip Skip the PR/issue when compiling release notes review v7.4.0 v8.0.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants