Cherry-pick #21258 to 7.9: o365input: Restart after fatal error#21386
Cherry-pick #21258 to 7.9: o365input: Restart after fatal error#21386adriansr merged 2 commits intoelastic:7.9from
Conversation
Update the o365input to restart the input after a fatal error is encountered, for example an authentication token refresh error or a parsing error. This enables the input to be more resilient against transient errors. Before this patch, the input would index an error document and terminate. Now it will index an error and restart after a fixed timeout of 5 minutes. (cherry picked from commit 8716d98)
|
Pinging @elastic/siem (Team:SIEM) |
💔 Tests FailedExpand to view the summary
Build stats
Test stats 🧪
Test errorsExpand to view the tests failures
Steps errorsExpand to view the steps failures
Log outputExpand to view the last 100 lines of log output
|
|
CI failures unrelated |
elastic#21386) Update the o365input to restart the input after a fatal error is encountered, for example an authentication token refresh error or a parsing error. This enables the input to be more resilient against transient errors. Before this patch, the input would index an error document and terminate. Now it will index an error and restart after a fixed timeout of 5 minutes. (cherry picked from commit c723c1e)
Cherry-pick of PR #21258 to 7.9 branch. Original message:
What does this PR do?
Updates
o365inputto restart the input after a fatal error is encountered, for example an authentication token refresh error or a parsing error.This enables the input to be more resilient against errors.
Before this patch, the input would index an error document and terminate. Now it will index an error and restart after a fixed timeout of 5 minutes.
Why is it important?
Some users are reporting that the
o365module stops ingesting events after some days. In all cases it's been observed that the input terminated at some point due to errors contacting the Azure authentication server to refresh a token.Checklist
[ ] I have made corresponding changes to the documentation[ ] I have made corresponding change to the default configuration files[ ] I have added tests that prove my fix is effective or that my feature worksCHANGELOG.next.asciidocorCHANGELOG-developer.next.asciidoc.Author's Checklist
How to test this PR locally
Testing the case of token refresh errors is difficult as they are refreshed once every ~12h. But the behavior can be tested by starting Filebeat without an internet connection.