Make o365audit input cancellable #21647
PR elastic#21258 introduced a restart mechanism for o365input so that it didn't stop working once a fatal error was found. This updates the restart delay to use a cancellation-context-aware method so that the input doesn't block Filebeat termination.
Pinging @elastic/siem (Team:SIEM)
- ctx.Logger.Infof("Restarting in %v", failureRetryInterval)
- time.Sleep(failureRetryInterval)
+ ctx.Logger.Infof("Restarting in %v", inp.config.API.ErrorRetryInterval)
+ timed.Wait(ctx.Cancelation, inp.config.API.ErrorRetryInterval)
Note: inputs might have recoverable and non-recoverable errors. An input should return only non-recoverable errors. The latter case is quite unlikely for most inputs, I guess. If we see this pattern more often, we might want to unify it by providing some helpers, an input manager wrapper, or simply adding a 'setting' to the cursor.InputManager to always rerun on error.
Change LGTM. +1 for making the interval configurable. Does the poller error on every internal intermediate error? If not, how about naming it
No, it only errors for authentication errors. Originally this was a fatal error so that the Beat wouldn't start with a bad configuration. Now it retries, because the auth server can occasionally be down and there's no way to tell a transient error apart from a permanent authentication error.
PR elastic#21258 introduced a restart mechanism for o365input so that it didn't stop working once a fatal error was found. This updates the restart delay to use a cancellation-context-aware method so that the input doesn't block Filebeat termination. (cherry picked from commit 1abe97b)
PR elastic#21258 introduced a restart mechanism for o365input so that it didn't stop working once a fatal error was found. This updates the restart delay to use a cancellation-context-aware method so that the input doesn't block Filebeat termination. (cherry picked from commit f2ab428)
What does this PR do?
- Updates o365input to perform a cancellable wait when an error causes it to restart.
- Uses error_retry_interval for the delay between restarts instead of a hardcoded 5m.
Why is it important?
Using time.Sleep prevents Filebeat from terminating until the timeout has elapsed.
Checklist
[ ] I have made corresponding changes to the documentation
[ ] I have made corresponding changes to the default configuration files
[ ] I have added tests that prove my fix is effective or that my feature works
[ ] I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.
Related issues
Relates #21258