Implement wanted error handling for alerting and actions#41917
Implement wanted error handling for alerting and actions#41917mikecote merged 13 commits intoelastic:masterfrom
Conversation
|
Pinging @elastic/kibana-stack-services |
This comment has been minimized.
This comment has been minimized.
88012ae to
d145f25
Compare
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
cad9add to
13ae86a
Compare
This comment has been minimized.
This comment has been minimized.
x-pack/legacy/plugins/actions/server/action_type_registry.test.ts
Outdated
Show resolved
Hide resolved
| return error.retry == null ? true : error.retry; | ||
| } | ||
| // Retry other kinds of errors | ||
| return true; |
There was a problem hiding this comment.
I think we want to have this return false by default. Idea being that an action will know when something is retry-able - setting retry to true or a Date - and any other case, should not be retryable.
There was a problem hiding this comment.
Yeah I guess by looking here https://github.com/elastic/kibana/blob/master/x-pack/legacy/plugins/actions/server/lib/execute.ts#L29-L74 any code that could throw here is already being handled and wrapped into a proper ExecutionError. Some may need to return retry: false ex validation?
There was a problem hiding this comment.
I went ahead and made it false for other kinds of errors. I can't think of a reason to retry totally unhandled errors.
pmuellr
left a comment
There was a problem hiding this comment.
Made a few comments - just one in the review, a few "single" comments.
My only real concern is defaulting getRetry() to true in x-pack/legacy/plugins/actions/server/action_type_registry.ts; perhaps I'm not thinking about that right. And we could always tweak it later ...
| expect(runnerResult).toMatchInlineSnapshot(` | ||
| Object { | ||
| "runAt": 2019-06-03T18:55:30.982Z, | ||
| "runAt": 1970-01-01T00:00:10.000Z, |
There was a problem hiding this comment.
Interesting the date is set to such an old epoch-y date. My only thought is that at some point in the future, if we ended up needing to calculate a previous date from this, somehow, the epoch millis might go negative. Which node should handle, but I wonder if some other part of the system might choke on negative numbers.
$ node -p 'new Date(-1)'
1969-12-31T23:59:59.999ZOTOH, I guess would actually be a good test, so ... leave it in!?!?!?
There was a problem hiding this comment.
Yeah this seems to be what happens when you use fake timers, everything becomes new Date(0) by default 🤔
…rmat is returned with status: error
This comment has been minimized.
This comment has been minimized.
bmcconaghy
left a comment
There was a problem hiding this comment.
Code LGTM, just wondering about defaulting to retrying.
💚 Build Succeeded |
) * Implement wanted error handling * Cleanup * Add retry logic to actions * Leverage startedAt from task manager * Fix broken jest tests * Add missing unit test * Add unit tests for getRetry * Add test for rate limit * Remove fake timers * Don't retry errors by default for actions unless the proper result format is returned with status: error * Don't retry unless attribute specified * Fix tests
… (#42255) * Implement wanted error handling for alerting and actions (#41917) * Implement wanted error handling * Cleanup * Add retry logic to actions * Leverage startedAt from task manager * Fix broken jest tests * Add missing unit test * Add unit tests for getRetry * Add test for rate limit * Remove fake timers * Don't retry errors by default for actions unless the proper result format is returned with status: error * Don't retry unless attribute specified * Fix tests * Increase retry timeout to prevent flaky tests (#42291)
This PR implements what is wanted for error handling in alerting and actions as outlined in #39349.
The following changes have been made:
maxAttemptsattribute that is optionalretryattributescheduledRunAtandpreviousScheduledRunAt. Those are replaced withstartedAtandpreviousStartedAt.