The Filebeat aws-s3 input should return the final line of a file even if it does not end in an EOL terminator. When the reader reaches EOF, it should flush any remaining buffered bytes. Currently, if the final line of a file does not end in an EOL, that data is dropped. This does not affect the aws-s3 input when reading JSON, because that path uses its own streaming JSON reader.
To read log files, the input uses readfile.LineReader. It was designed for log files that can be appended to, so it waits for the EOL before flushing a log line. With S3, however, the object data should be considered immutable, and the reader should flush any buffered data once io.EOF is returned.
Failing Test Case
(Apply this with git apply test-case.patch.)
diff --git a/x-pack/filebeat/input/awss3/s3_objects_test.go b/x-pack/filebeat/input/awss3/s3_objects_test.go
index 4ab3edfaa4..375ed35c84 100644
--- a/x-pack/filebeat/input/awss3/s3_objects_test.go
+++ b/x-pack/filebeat/input/awss3/s3_objects_test.go
@@ -216,6 +216,10 @@ func TestS3ObjectProcessor(t *testing.T) {
 		err := s3ObjProc.Create(ctx, logp.NewLogger(inputName), ack, s3Event).ProcessS3Object()
 		require.NoError(t, err)
 	})
+
+	t.Run("text file without end of line marker", func(t *testing.T) {
+		testProcessS3Object(t, "testdata/no-eol.txt", "text/plain", 1)
+	})
 }
 
 func testProcessS3Object(t testing.TB, file, contentType string, numEvents int, selectors ...fileSelectorConfig) []beat.Event {
diff --git a/x-pack/filebeat/input/awss3/testdata/no-eol.txt b/x-pack/filebeat/input/awss3/testdata/no-eol.txt
new file mode 100644
index 0000000000..0b7757db86
--- /dev/null
+++ b/x-pack/filebeat/input/awss3/testdata/no-eol.txt
@@ -0,0 +1 @@
+This file does contain a final EOL.
\ No newline at end of file