Skip to content

handle EOF on single line content#33568

Merged
francescayeye merged 24 commits intoelastic:mainfrom
francescayeye:issue-30436
Nov 30, 2022
Merged

handle EOF on single line content#33568
francescayeye merged 24 commits intoelastic:mainfrom
francescayeye:issue-30436

Conversation

@francescayeye
Copy link
Copy Markdown

Bug

What does this PR do?

Closes #30436

Why is it important?

Checklist

  • My code follows the style guidelines of this project
    - [ ] I have commented my code, particularly in hard-to-understand areas
    - [ ] I have made corresponding changes to the documentation
    - [ ] I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Author's Checklist

  • [ ]

How to test this PR locally

Related issues

Use cases

Screenshots

Logs

@francescayeye francescayeye requested a review from a team as a code owner November 3, 2022 07:45
@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Nov 3, 2022
@francescayeye francescayeye self-assigned this Nov 3, 2022
@francescayeye francescayeye added Team:Cloud-Monitoring Label for the Cloud Monitoring team and removed needs_team Indicates that the issue/PR needs a Team:* label labels Nov 3, 2022
@mergify
Copy link
Copy Markdown
Contributor

mergify bot commented Nov 3, 2022

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @aspacca? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v8./d.0 is the label to automatically backport to the 8./d branch. /d is the digit

@elasticmachine
Copy link
Copy Markdown
Contributor

elasticmachine commented Nov 3, 2022

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2022-11-30T03:12:03.372+0000

  • Duration: 138 min 22 sec

Test stats 🧪

Test Results
Failed 0
Passed 27570
Skipped 2127
Total 29697

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@francescayeye francescayeye requested a review from a team as a code owner November 3, 2022 13:48
@francescayeye francescayeye requested review from fearful-symmetry and leehinman and removed request for a team November 3, 2022 13:48
Copy link
Copy Markdown
Contributor

@leehinman leehinman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like some more unit tests. I'm specifically worried about the case where a partial event is written to the file, the reader reads that, then the rest of event is written to the file. The current code will give us one event, because it waits for the separator. I'd like verification that the proposed code does that as well.

@francescayeye francescayeye marked this pull request as draft November 4, 2022 03:19
@francescayeye
Copy link
Copy Markdown
Author

francescayeye commented Nov 4, 2022

I'd like some more unit tests. I'm specifically worried about the case where a partial event is written to the file, the reader reads that, then the rest of event is written to the file. The current code will give us one event, because it waits for the separator. I'd like verification that the proposed code does that as well.

hello @leehinman, totally agree with that!

I thought that the bug could be handled only in the aws s3 input part (see 26eb38f)

then I saw the test failing and realised that it's more complicated, I've moved the PR to draft

basically we have to find how to handle this and this

the problem is that in the second err check, when we have n = 0, err = io.EOF from the next iteration, we return from the loop, and we fail to collect the buffer here

so we receive an empty message in the input anyway

I'm thinking about changing the approach of the bugfix and do a check in the input if the content contains newline separator at all, if not, let's not attach the linereader, what do you think about it? it's safer but going to be more expensive in terms of performance. a trade off could be to fallback to a non linereader only if zero events were emitted

@leehinman
Copy link
Copy Markdown
Contributor

I'm thinking about changing the approach of the bugfix and do a check in the input if the content contains newline separator at all, if not, let's not attach the linereader, what do you think about it? it's safer but going to be more expensive in terms of performance. a trade off could be to fallback to a non linereader only if zero events were emitted

I think the root of the problem is the current linereader is expecting data to be appended, so EOF doesn't mean the last data you could ever receive, just last for now. But in the case of s3 input, we actually have all the data, so EOF means the last data you could ever receive.

Could we maybe just have a flag where the line reader will treat EOF like a line separator? That way it would work for data that has no line separators and ones where the first n-1 events have line separators but the nth one doesn't. I'm having a hard time coming up with a good heuristic for determining append vs non-append case, which is why I'm leaning towards a flag.

@mergify
Copy link
Copy Markdown
Contributor

mergify bot commented Nov 8, 2022

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b issue-30436 upstream/issue-30436
git merge upstream/main
git push upstream issue-30436

@francescayeye
Copy link
Copy Markdown
Author

@leehinman please, look a the current solution

if you are not satisfied I will add the flag for EOF

my only concern in the last commit I've pushed is that I have to use a TeeReader, and that can be very expensive

@francescayeye
Copy link
Copy Markdown
Author

hi @leehinman , did you have the chance to review my changes? thanks

Copy link
Copy Markdown
Contributor

@leehinman leehinman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@francescayeye francescayeye added backport-v8.5.0 Automated backport with mergify backport-v8.6.0 Automated backport with mergify labels Nov 30, 2022
@francescayeye francescayeye merged commit 7b45320 into elastic:main Nov 30, 2022
mergify bot pushed a commit that referenced this pull request Nov 30, 2022
* handle EOF on single line content

* changelog

* fallback to encode_eof if no events in aws-s3 input

* lint

* lint

* collect on EOF in line reader

* remove encode eof

* remove iterN

* fix test

* increase test coverage

* linting

* more linting

* increase coverage

(cherry picked from commit 7b45320)
mergify bot pushed a commit that referenced this pull request Nov 30, 2022
* handle EOF on single line content

* changelog

* fallback to encode_eof if no events in aws-s3 input

* lint

* lint

* collect on EOF in line reader

* remove encode eof

* remove iterN

* fix test

* increase test coverage

* linting

* more linting

* increase coverage

(cherry picked from commit 7b45320)
francescayeye pushed a commit that referenced this pull request Nov 30, 2022
* handle EOF on single line content

* changelog

* fallback to encode_eof if no events in aws-s3 input

* lint

* lint

* collect on EOF in line reader

* remove encode eof

* remove iterN

* fix test

* increase test coverage

* linting

* more linting

* increase coverage

(cherry picked from commit 7b45320)

Co-authored-by: Andrea Spacca <andrea.spacca@elastic.co>
francescayeye pushed a commit that referenced this pull request Dec 1, 2022
* handle EOF on single line content (#33568)

* handle EOF on single line content

* changelog

* fallback to encode_eof if no events in aws-s3 input

* lint

* lint

* collect on EOF in line reader

* remove encode eof

* remove iterN

* fix test

* increase test coverage

* linting

* more linting

* increase coverage

(cherry picked from commit 7b45320)

* changelog

Co-authored-by: Andrea Spacca <andrea.spacca@elastic.co>
@zmoog zmoog added the backport-7.17 Automated backport to the 7.17 branch with mergify label Apr 1, 2023
mergify bot pushed a commit that referenced this pull request Apr 1, 2023
* handle EOF on single line content

* changelog

* fallback to encode_eof if no events in aws-s3 input

* lint

* lint

* collect on EOF in line reader

* remove encode eof

* remove iterN

* fix test

* increase test coverage

* linting

* more linting

* increase coverage

(cherry picked from commit 7b45320)

# Conflicts:
#	libbeat/reader/readfile/line.go
#	libbeat/reader/readfile/line_test.go
#	x-pack/filebeat/input/awss3/s3_objects.go
chrisberkhout pushed a commit that referenced this pull request Jun 1, 2023
* handle EOF on single line content

* changelog

* fallback to encode_eof if no events in aws-s3 input

* lint

* lint

* collect on EOF in line reader

* remove encode eof

* remove iterN

* fix test

* increase test coverage

* linting

* more linting

* increase coverage
tdancheva added a commit that referenced this pull request Jun 5, 2023
* handle EOF on single line content (#33568)

* handle EOF on single line content

* changelog

* fallback to encode_eof if no events in aws-s3 input

* lint

* lint

* collect on EOF in line reader

* remove encode eof

* remove iterN

* fix test

* increase test coverage

* linting

* more linting

* increase coverage

(cherry picked from commit 7b45320)

# Conflicts:
#	libbeat/reader/readfile/line.go
#	libbeat/reader/readfile/line_test.go
#	x-pack/filebeat/input/awss3/s3_objects.go

* Fix conflicts

* Fix failing test - TestMaxBytesLimit

* Fix #2 failing test - TestMaxBytesLimit

* Fix failing test checks

* Fix linter errors

* Fix typo

* Fix linter errors #2

* Fix linter errors #3

* Fix linter errors #4

* Fix linter errors #5

* Changelog clean up

* Change order of publish event

---------

Co-authored-by: Andrea Spacca <andrea.spacca@elastic.co>
Co-authored-by: Tamara Dancheva <tamara.dancheva@elastic.co>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport-7.17 Automated backport to the 7.17 branch with mergify backport-v8.5.0 Automated backport with mergify backport-v8.6.0 Automated backport with mergify Team:Cloud-Monitoring Label for the Cloud Monitoring team

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Filebeat] aws-s3 drops data when files do not end with EOL

5 participants