Conversation
Each processor might end up in multiple processor groups. Every group has its own `Close` that calls `Close` on each processor of that group which leads to multiple `Close` calls on the same processor. If the `SafeProcessor` wrapper was not used, the processor would have to handle multiple `Close` calls with adding `sync.Once` in its `Close` function. We make it easer for processor developers and take care of it in the processor registry instead.
|
This pull request does not have a backport label.
To fixup this pull request, you need to add the backport labels for the needed
|
|
Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane) |
Co-authored-by: Craig MacKenzie <craig.mackenzie@elastic.co>
|
@cmacknz there will be no 8.6 releases anymore, I don't think we need that backport label. |
leehinman
left a comment
There was a problem hiding this comment.
code LGTM.
One optional nit on the name SafeProcessor. The name doesn't tell me what it is safe from, so if you just encountered this in registry.go you wouldn't have many clues as to why the constructor is being wrapped. Names like CloseOnce or AtomicCloser might be better.
|
@leehinman the name comes from the fact that it's not just addressing the I've put a lot of thought into the naming and I'd rather keep it this way, so we could perhaps extend the functionality of this wrapper in the future, e.g. blocking certain events from being processed by any processor, etc. |
Each processor might end up in multiple processor groups. Every group has its own `Close` that calls `Close` on each processor of that group which leads to multiple `Close` calls on the same processor. If the `SafeProcessor` wrapper was not used, the processor would have to handle multiple `Close` calls with adding `sync.Once` in its `Close` function. We make it easer for processor developers and take care of it in the processor registry instead. (cherry picked from commit 5007b58)
Each processor might end up in multiple processor groups. Every group has its own `Close` that calls `Close` on each processor of that group which leads to multiple `Close` calls on the same processor. If the `SafeProcessor` wrapper was not used, the processor would have to handle multiple `Close` calls with adding `sync.Once` in its `Close` function. We make it easer for processor developers and take care of it in the processor registry instead. (cherry picked from commit 5007b58)
Each processor might end up in multiple processor groups. Every group has its own `Close` that calls `Close` on each processor of that group which leads to multiple `Close` calls on the same processor. If the `SafeProcessor` wrapper was not used, the processor would have to handle multiple `Close` calls with adding `sync.Once` in its `Close` function. We make it easer for processor developers and take care of it in the processor registry instead. (cherry picked from commit 5007b58)
* Add `SafeProcessor` wrapper (#34647) Each processor might end up in multiple processor groups. Every group has its own `Close` that calls `Close` on each processor of that group which leads to multiple `Close` calls on the same processor. If the `SafeProcessor` wrapper was not used, the processor would have to handle multiple `Close` calls with adding `sync.Once` in its `Close` function. We make it easer for processor developers and take care of it in the processor registry instead. (cherry picked from commit 5007b58) * Remove extra changelog entries --------- Co-authored-by: Denis <denis.rechkunov@elastic.co>
* Add `SafeProcessor` wrapper (#34647) Each processor might end up in multiple processor groups. Every group has its own `Close` that calls `Close` on each processor of that group which leads to multiple `Close` calls on the same processor. If the `SafeProcessor` wrapper was not used, the processor would have to handle multiple `Close` calls with adding `sync.Once` in its `Close` function. We make it easer for processor developers and take care of it in the processor registry instead. (cherry picked from commit 5007b58) * Remove extra changelog entries * Fix different constructor signature --------- Co-authored-by: Denis <denis.rechkunov@elastic.co>
Each processor might end up in multiple processor groups. Every group has its own `Close` that calls `Close` on each processor of that group which leads to multiple `Close` calls on the same processor. If the `SafeProcessor` wrapper was not used, the processor would have to handle multiple `Close` calls with adding `sync.Once` in its `Close` function. We make it easer for processor developers and take care of it in the processor registry instead. (cherry picked from commit 5007b58) Co-authored-by: Denis <denis.rechkunov@elastic.co>
After introducing the `SafeProcessor` wrapper in elastic#34647 we started returning errors when a processor is being used after its `Close` function has been called. This led to dropped events and error spam in logs but also confirmed that the root cause of the problem was not just a race condition on `Close` but re-used processors somewhere. After a long investigation such code that's re-using processors was finally found. This is the change that removes re-using the processors and instantiates them on each input restart.
* Stop re-using processors defined in the config After introducing the `SafeProcessor` wrapper in #34647 we started returning errors when a processor is being used after its `Close` function has been called. This led to dropped events and error spam in logs but also confirmed that the root cause of the problem was not just a race condition on `Close` but re-used processors somewhere. After a long investigation such code that's re-using processors was finally found. This is the change that removes re-using the processors and instantiates them on each input restart. * Fix linter issues * Add changelog entry
* Stop re-using processors defined in the config After introducing the `SafeProcessor` wrapper in #34647 we started returning errors when a processor is being used after its `Close` function has been called. This led to dropped events and error spam in logs but also confirmed that the root cause of the problem was not just a race condition on `Close` but re-used processors somewhere. After a long investigation such code that's re-using processors was finally found. This is the change that removes re-using the processors and instantiates them on each input restart. * Fix linter issues * Add changelog entry (cherry picked from commit 5cfe62c)
* Stop re-using processors defined in the config After introducing the `SafeProcessor` wrapper in #34647 we started returning errors when a processor is being used after its `Close` function has been called. This led to dropped events and error spam in logs but also confirmed that the root cause of the problem was not just a race condition on `Close` but re-used processors somewhere. After a long investigation such code that's re-using processors was finally found. This is the change that removes re-using the processors and instantiates them on each input restart. * Fix linter issues * Add changelog entry (cherry picked from commit 5cfe62c)
* Stop re-using processors defined in the config After introducing the `SafeProcessor` wrapper in #34647 we started returning errors when a processor is being used after its `Close` function has been called. This led to dropped events and error spam in logs but also confirmed that the root cause of the problem was not just a race condition on `Close` but re-used processors somewhere. After a long investigation such code that's re-using processors was finally found. This is the change that removes re-using the processors and instantiates them on each input restart. * Fix linter issues * Add changelog entry (cherry picked from commit 5cfe62c) # Conflicts: # filebeat/channel/runner.go # libbeat/processors/safe_processor.go
* Stop re-using processors defined in the config After introducing the `SafeProcessor` wrapper in #34647 we started returning errors when a processor is being used after its `Close` function has been called. This led to dropped events and error spam in logs but also confirmed that the root cause of the problem was not just a race condition on `Close` but re-used processors somewhere. After a long investigation such code that's re-using processors was finally found. This is the change that removes re-using the processors and instantiates them on each input restart. * Fix linter issues * Add changelog entry (cherry picked from commit 5cfe62c) Co-authored-by: Denis <denis.rechkunov@elastic.co>
* Stop re-using processors defined in the config After introducing the `SafeProcessor` wrapper in #34647 we started returning errors when a processor is being used after its `Close` function has been called. This led to dropped events and error spam in logs but also confirmed that the root cause of the problem was not just a race condition on `Close` but re-used processors somewhere. After a long investigation such code that's re-using processors was finally found. This is the change that removes re-using the processors and instantiates them on each input restart. * Fix linter issues * Add changelog entry (cherry picked from commit 5cfe62c) Co-authored-by: Denis <denis.rechkunov@elastic.co>
#34764) * Stop re-using processors defined in the config (#34761) * Stop re-using processors defined in the config After introducing the `SafeProcessor` wrapper in #34647 we started returning errors when a processor is being used after its `Close` function has been called. This led to dropped events and error spam in logs but also confirmed that the root cause of the problem was not just a race condition on `Close` but re-used processors somewhere. After a long investigation such code that's re-using processors was finally found. This is the change that removes re-using the processors and instantiates them on each input restart. * Fix linter issues * Add changelog entry (cherry picked from commit 5cfe62c) # Conflicts: # filebeat/channel/runner.go # libbeat/processors/safe_processor.go * Resolve conflicts, fix changelog * Add new line to changelog * Revert comment auto-formatting --------- Co-authored-by: Denis <denis.rechkunov@elastic.co>
Each processor might end up in multiple processor groups. Every group has its own `Close` that calls `Close` on each processor of that group which leads to multiple `Close` calls on the same processor. If the `SafeProcessor` wrapper was not used, the processor would have to handle multiple `Close` calls with adding `sync.Once` in its `Close` function. We make it easer for processor developers and take care of it in the processor registry instead.
* Stop re-using processors defined in the config After introducing the `SafeProcessor` wrapper in #34647 we started returning errors when a processor is being used after its `Close` function has been called. This led to dropped events and error spam in logs but also confirmed that the root cause of the problem was not just a race condition on `Close` but re-used processors somewhere. After a long investigation such code that's re-using processors was finally found. This is the change that removes re-using the processors and instantiates them on each input restart. * Fix linter issues * Add changelog entry
What does this PR do?
Each processor might end up in multiple processor groups. Every group has its own
Closethat callsCloseon each processor of that group which leads to multipleClosecalls on the same processor.If the
SafeProcessorwrapper was not used,the processor would have to handle multiple
Closecalls with addingsync.Oncein itsClosefunction.We make it easer for processor developers and take care of it in the processor registry instead.
Why is it important?
Our customers could see panics in logs which makes it noisy.
Also, there are reports of higher impact:
Checklist
- [ ] I have made corresponding changes to the documentation- [ ] I have made corresponding change to the default configuration filesCHANGELOG.next.asciidocorCHANGELOG-developer.next.asciidoc.How to test this PR locally
Unfortunately, it's very difficult to reproduce the panic described in the issue, it seems to be a result of some kind of race condition on Beats shutting down.
The change is covered with unit tests and I made sure that a simple processor with the following config still works and there is no regression:
and I got this event published:
{ "@timestamp": "2023-02-23T10:33:22.673Z", "@metadata": { "beat": "filebeat", "type": "_doc", "version": "8.8.0" }, "log": { "offset": 32170800, "file": { "path": "metadata-panic/logs/log1.ndjson" } }, "message": "{\"data\": \"sample\"}", "input": { "type": "filestream" }, "project": { "name": "myproject", "id": "UNIQUE_VALUE" }, "agent": { "ephemeral_id": "926e973d-f35b-4f1e-95aa-6b9cf239e21c", "id": "2f6fa2ce-1a37-4259-91e8-1df581639d07", "name": "MacBook-Pro.localdomain", "type": "filebeat", "version": "8.8.0" }, "ecs": { "version": "8.0.0" }, "host": { "name": "MacBook-Pro.localdomain" } }Note
Related issues