[Filebeat] aws-s3 drops data when files do not end with EOL #30436

@andrewkroh

Description

The Filebeat aws-s3 input should return the final line of a file even if that line does not end in an EOL terminator. When the input reaches EOF it should flush any remaining buffered bytes. Currently, if the final line in a file does not end in an EOL, that data is dropped. This does not affect the aws-s3 input when reading JSON, because that path uses its own streaming JSON reader.

To read log files, the input uses readfile.LineReader. It was designed for log files that can be appended to, so it waits for the EOL before flushing the log line. With S3, however, the data should be considered immutable, and the reader should flush any buffered data once io.EOF is returned.

Failing Test Case

(Apply this with git apply test-case.patch.)

diff --git a/x-pack/filebeat/input/awss3/s3_objects_test.go b/x-pack/filebeat/input/awss3/s3_objects_test.go
index 4ab3edfaa4..375ed35c84 100644
--- a/x-pack/filebeat/input/awss3/s3_objects_test.go
+++ b/x-pack/filebeat/input/awss3/s3_objects_test.go
@@ -216,6 +216,10 @@ func TestS3ObjectProcessor(t *testing.T) {
                err := s3ObjProc.Create(ctx, logp.NewLogger(inputName), ack, s3Event).ProcessS3Object()
                require.NoError(t, err)
        })
+
+       t.Run("text file without end of line marker", func(t *testing.T) {
+               testProcessS3Object(t, "testdata/no-eol.txt", "text/plain", 1)
+       })
 }
 
 func testProcessS3Object(t testing.TB, file, contentType string, numEvents int, selectors ...fileSelectorConfig) []beat.Event {
diff --git a/x-pack/filebeat/input/awss3/testdata/no-eol.txt b/x-pack/filebeat/input/awss3/testdata/no-eol.txt
new file mode 100644
index 0000000000..0b7757db86
--- /dev/null
+++ b/x-pack/filebeat/input/awss3/testdata/no-eol.txt
@@ -0,0 +1 @@
+This file does contain a final EOL.
\ No newline at end of file
