Do not close filestream harvester if an unexpected error is returned when close.on_state_change.* is enabled#26411
Conversation
|
Pinging @elastic/agent (Team:Agent) |
|
This pull request doesn't have a |
|
This pull request is now in conflicts. Could you fix it? 🙏 |
…am input with close_* options
7973b7b to
8ab0707
Compare
💚 Build Succeeded
Expand to view the summary
Build stats
Test stats 🧪
Trends 🧪💚 Flaky test reportTests succeeded. Expand to view the summary
Test stats 🧪
|
|
This pull request is now in conflicts. Could you fix it? 🙏 |
| } | ||
|
|
||
| // If an unexpected error happens we keep the reader open hoping once everything will go back to normal. | ||
| f.log.Errorf("Unexpected error reading from %s; error: %s", f.file.Name(), statErr) |
There was a problem hiding this comment.
This one still might produce a false-positive log message. The shouldBeClosed method is called by a separate go-routine independent from the reader go-routine. When the reader go-routine shuts down (cancelled event) while shouldBeClosed is run we will see an error telling us that f is invalid because it was already closed.
In order to prevent the false-positive message, we either need a mutex or some ref-counting on f + a wrapper on the reader that allows us to unblock a pending read call if the file gets closed async.
There was a problem hiding this comment.
Let's try to solve the race in a separate PR.
…when close.on_state_change.* is enabled (#26411) ## What does this PR do? This PR returns early if `close.on_state_change.removed` is enabled and the opened file no longer exists. Otherwise, it logs an error message and keeps the reader running. ## Why is it important? Previously, a message has been logged on error level and the reader has been stopped if the `Stat` call returned an error. However, it was not correct because if `close.on_state_change.renamed` was enabled the reader would have been closed if the file had been removed. Now the reader is not stopped. (cherry picked from commit 1106449)
…when close.on_state_change.* is enabled (#26411) ## What does this PR do? This PR returns early if `close.on_state_change.removed` is enabled and the opened file no longer exists. Otherwise, it logs an error message and keeps the reader running. ## Why is it important? Previously, a message has been logged on error level and the reader has been stopped if the `Stat` call returned an error. However, it was not correct because if `close.on_state_change.renamed` was enabled the reader would have been closed if the file had been removed. Now the reader is not stopped. (cherry picked from commit 1106449)
…when close.on_state_change.* is enabled (#26411) (#26477) ## What does this PR do? This PR returns early if `close.on_state_change.removed` is enabled and the opened file no longer exists. Otherwise, it logs an error message and keeps the reader running. ## Why is it important? Previously, a message has been logged on error level and the reader has been stopped if the `Stat` call returned an error. However, it was not correct because if `close.on_state_change.renamed` was enabled the reader would have been closed if the file had been removed. Now the reader is not stopped. (cherry picked from commit 1106449) Co-authored-by: Noémi Ványi <kvch@users.noreply.github.com>
…when close.on_state_change.* is enabled (#26411) (#26476) ## What does this PR do? This PR returns early if `close.on_state_change.removed` is enabled and the opened file no longer exists. Otherwise, it logs an error message and keeps the reader running. ## Why is it important? Previously, a message has been logged on error level and the reader has been stopped if the `Stat` call returned an error. However, it was not correct because if `close.on_state_change.renamed` was enabled the reader would have been closed if the file had been removed. Now the reader is not stopped. (cherry picked from commit 1106449) Co-authored-by: Noémi Ványi <kvch@users.noreply.github.com>
* master: (32 commits) [Metricbeat] Change Account ID to Project ID in `gcp.billing` module (elastic#26412) update libbeat fields.ecs.yml file and ecsVersion to 1.10.0 (elastic#26121) [Filebeat] Update AWS ELB ingest pipeline (elastic#26441) [FIlebeat] add strict_date_optional_time_nanos date format to PanOS module (elastic#26158) Fix the irregular and typo on prometheus module. (elastic#25726) [Filebeat] Parse additonal debug data fields for Okta module (elastic#25818) fix: update MSSQL Server linux image's Docker registry (elastic#26440) Update indexing.go godocs (elastic#26408) Do not close filestream harvester if an unexpected error is returned when close.on_state_change.* is enabled (elastic#26411) Add support for copytruncate method when rotating input logs with an external tool in `filestream` input (elastic#23457) Allow fields with ip_range datatype (elastic#26444) Add Anomali ThreatStream support to threatintel module (elastic#26350) fix: use the right param type (elastic#26469) [Automation] Update elastic stack version to 8.0.0-7640093f for testing (elastic#26460) Set SM Filebeat modules as GA (elastic#26226) Fix rfc5464 date parsing in the syslog input (elastic#26419) Add linked account information into billing metricset (elastic#26285) [Filebeat] Update HA Proxy log grok patterns (elastic#25835) disable metricbeat logstash test_node_stats (elastic#26436) chore: pass BEAT_VERSION when running E2E tests (elastic#26291) ...
What does this PR do?
This PR returns early if
close.on_state_change.removedis enabled and the opened file no longer exists. Otherwise, it logs an error message and keeps the reader running.Why is it important?
Previously, a message has been logged on error level and the reader has been stopped if the
Statcall returned an error. However, it was not correct because ifclose.on_state_change.renamedwas enabled the reader would have been closed if the file had been removed. Now the reader is not stopped.Checklist
- [ ] I have made corresponding changes to the documentation- [ ] I have made corresponding change to the default configuration files- [ ] I have added tests that prove my fix is effective or that my feature worksCHANGELOG.next.asciidocorCHANGELOG-developer.next.asciidoc.