handle EOF on single line content#33568
handle EOF on single line content#33568francescayeye merged 24 commits intoelastic:mainfrom francescayeye:issue-30436
Conversation
|
This pull request does not have a backport label.
To fixup this pull request, you need to add the backport labels for the needed
|
leehinman
left a comment
There was a problem hiding this comment.
I'd like some more unit tests. I'm specifically worried about the case where a partial event is written to the file, the reader reads that, then the rest of event is written to the file. The current code will give us one event, because it waits for the separator. I'd like verification that the proposed code does that as well.
hello @leehinman, totally agree with that! I thought that the bug could be handled only in the aws s3 input part (see 26eb38f) then I saw the test failing and realised that it's more complicated, I've moved the PR to draft basically we have to find how to handle this and this the problem is that in the second err check, when we have so we receive an empty message in the input anyway I'm thinking about changing the approach of the bugfix and do a check in the input if the content contains newline separator at all, if not, let's not attach the linereader, what do you think about it? it's safer but going to be more expensive in terms of performance. a trade off could be to fallback to a non linereader only if zero events were emitted |
I think the root of the problem is the current linereader is expecting data to be appended, so EOF doesn't mean the last data you could ever receive, just last for now. But in the case of s3 input, we actually have all the data, so EOF means the last data you could ever receive. Could we maybe just have a flag where the line reader will treat EOF like a line separator? That way it would work for data that has no line separators and ones where the first n-1 events have line separators but the nth one doesn't. I'm having a hard time coming up with a good heuristic for determining append vs non-append case, which is why I'm leaning towards a flag. |
|
This pull request is now in conflicts. Could you fix it? 🙏 |
|
@leehinman please, look a the current solution if you are not satisfied I will add the flag for EOF my only concern in the last commit I've pushed is that I have to use a TeeReader, and that can be very expensive |
|
hi @leehinman , did you have the chance to review my changes? thanks |
* handle EOF on single line content * changelog * fallback to encode_eof if no events in aws-s3 input * lint * lint * collect on EOF in line reader * remove encode eof * remove iterN * fix test * increase test coverage * linting * more linting * increase coverage (cherry picked from commit 7b45320)
* handle EOF on single line content * changelog * fallback to encode_eof if no events in aws-s3 input * lint * lint * collect on EOF in line reader * remove encode eof * remove iterN * fix test * increase test coverage * linting * more linting * increase coverage (cherry picked from commit 7b45320)
* handle EOF on single line content * changelog * fallback to encode_eof if no events in aws-s3 input * lint * lint * collect on EOF in line reader * remove encode eof * remove iterN * fix test * increase test coverage * linting * more linting * increase coverage (cherry picked from commit 7b45320) Co-authored-by: Andrea Spacca <andrea.spacca@elastic.co>
* handle EOF on single line content (#33568) * handle EOF on single line content * changelog * fallback to encode_eof if no events in aws-s3 input * lint * lint * collect on EOF in line reader * remove encode eof * remove iterN * fix test * increase test coverage * linting * more linting * increase coverage (cherry picked from commit 7b45320) * changelog Co-authored-by: Andrea Spacca <andrea.spacca@elastic.co>
* handle EOF on single line content * changelog * fallback to encode_eof if no events in aws-s3 input * lint * lint * collect on EOF in line reader * remove encode eof * remove iterN * fix test * increase test coverage * linting * more linting * increase coverage (cherry picked from commit 7b45320) # Conflicts: # libbeat/reader/readfile/line.go # libbeat/reader/readfile/line_test.go # x-pack/filebeat/input/awss3/s3_objects.go
* handle EOF on single line content * changelog * fallback to encode_eof if no events in aws-s3 input * lint * lint * collect on EOF in line reader * remove encode eof * remove iterN * fix test * increase test coverage * linting * more linting * increase coverage
* handle EOF on single line content (#33568) * handle EOF on single line content * changelog * fallback to encode_eof if no events in aws-s3 input * lint * lint * collect on EOF in line reader * remove encode eof * remove iterN * fix test * increase test coverage * linting * more linting * increase coverage (cherry picked from commit 7b45320) # Conflicts: # libbeat/reader/readfile/line.go # libbeat/reader/readfile/line_test.go # x-pack/filebeat/input/awss3/s3_objects.go * Fix conflicts * Fix failing test - TestMaxBytesLimit * Fix #2 failing test - TestMaxBytesLimit * Fix failing test checks * Fix linter errors * Fix typo * Fix linter errors #2 * Fix linter errors #3 * Fix linter errors #4 * Fix linter errors #5 * Changelog clean up * Change order of publish event --------- Co-authored-by: Andrea Spacca <andrea.spacca@elastic.co> Co-authored-by: Tamara Dancheva <tamara.dancheva@elastic.co>
Bug
What does this PR do?
Closes #30436
Why is it important?
Checklist
- [ ] I have commented my code, particularly in hard-to-understand areas- [ ] I have made corresponding changes to the documentation- [ ] I have made corresponding change to the default configuration filesCHANGELOG.next.asciidocorCHANGELOG-developer.next.asciidoc.Author's Checklist
How to test this PR locally
Related issues
Use cases
Screenshots
Logs