[7.x](backport #27199) [Filebeat] Refactor AWS S3 input with workers #27338
Merged: andrewkroh merged 1 commit into 7.x on Aug 13, 2021
Conversation
Pinging @elastic/integrations (Team:Integrations)
Pinging @elastic/security-external-integrations (Team:Security-External Integrations)
💚 Build succeeded; all tests passed.
* Refactor AWS S3 input with workers
This changes the AWS S3 input to allow it to process more SQS messages in parallel
by keeping workers fully utilized while there are SQS messages to process.
The previous design processed SQS messages in batches of 1 to 10 and waited until
every message in a batch was processed before requesting more. This left some
workers idle toward the end of each batch, and it capped the number of messages
processed in parallel at 10 because that is the largest request size allowed by SQS.
The refactored input uses ephemeral goroutines as workers to process SQS messages. It
receives as many SQS messages as there are free workers. The total number of workers
is controlled by `max_number_of_messages` (same as before, but without an upper limit).
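The worker model described above can be sketched with a semaphore-style channel whose capacity plays the role of `max_number_of_messages`: each message is handled by an ephemeral goroutine, and the loop only takes on new work while a worker slot is free. This is an illustrative sketch, not the input's actual code; `processMessages` and its signature are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// processMessages caps in-flight work at maxWorkers. Each message is handled
// by an ephemeral goroutine; a buffered channel acts as the worker-slot
// semaphore, so new messages are accepted as soon as any worker frees up
// (no waiting for a whole batch to drain).
func processMessages(messages []string, maxWorkers int) int {
	sem := make(chan struct{}, maxWorkers) // one slot per free worker
	var wg sync.WaitGroup
	var processed int64
	for _, m := range messages {
		sem <- struct{}{} // block until a worker slot frees up
		wg.Add(1)
		go func(m string) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot
			// The real input would fetch and process the S3 object(s)
			// referenced by the SQS message here.
			_ = m
			atomic.AddInt64(&processed, 1)
		}(m)
	}
	wg.Wait()
	return int(processed)
}

func main() {
	n := processMessages([]string{"msg-1", "msg-2", "msg-3", "msg-4"}, 2)
	fmt.Println(n) // 4
}
```

Because slots are released per message rather than per batch, throughput is no longer gated by the slowest message in a group of 10.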
Other changes
Prevent poison pill messages
When an S3 object processing error occurs, the SQS message is returned to the queue
after the visibility timeout expires. This allows it to be reprocessed or moved to
the SQS dead letter queue (if configured). But if no dead letter queue policy is
configured and the error is permanent (reprocessing won't fix it), then the message
would be reprocessed continuously. On error, the input now checks the
`ApproximateReceiveCount` attribute of the SQS message and deletes the message if it
exceeds the configured maximum number of retries.
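The retry guard can be sketched as a check on the `ApproximateReceiveCount` attribute, which SQS reports as a string. The helper name and map-based attribute shape below are illustrative assumptions, not the input's real API.

```go
package main

import (
	"fmt"
	"strconv"
)

// shouldDelete reports whether a failed message has already been received
// more times than the configured retry limit and should be deleted instead
// of being left to reappear after the visibility timeout.
// SQS exposes ApproximateReceiveCount as a string attribute.
func shouldDelete(attrs map[string]string, maxRetries int) bool {
	n, err := strconv.Atoi(attrs["ApproximateReceiveCount"])
	if err != nil {
		// Attribute missing or malformed: keep the message so a
		// transient error still gets retried.
		return false
	}
	return n > maxRetries
}

func main() {
	fmt.Println(shouldDelete(map[string]string{"ApproximateReceiveCount": "6"}, 5)) // true
	fmt.Println(shouldDelete(map[string]string{"ApproximateReceiveCount": "2"}, 5)) // false
}
```

Deleting past-limit messages only matters when no dead letter queue policy is configured; with one, SQS itself moves the message after `maxReceiveCount` deliveries.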
Removal of api_timeout from S3 GetObject calls
The `api_timeout` has been removed from the S3 `GetObject` call. Because the response
body is processed as a stream while the request is open, the timeout effectively
limited the maximum amount of time available to process the object. Requests can
still time out on the server side due to inactivity.
Improved debug logs
The log messages have been enriched with more data about the related SQS message and
S3 object. For example, the SQS `message_id`, `s3_bucket`, and `s3_object` are
included in some messages.
`DEBUG [aws-s3.sqs_s3_event] awss3/s3.go:127 End S3 object processing. {"id": "test_id", "queue_url": "https://sqs.us-east-1.amazonaws.com/144492464627/filebeat-s3-integtest-lxlmx6", "message_id": "a11de9f9-0a68-4c4e-a09d-979b87602958", "s3_bucket": "filebeat-s3-integtest-lxlmx6", "s3_object": "events-array.json", "elapsed_time_ns": 23262327}`
Increased test coverage
The refactored input has about 88% test coverage.
The specific AWS API methods used by the input were turned into interfaces to allow
for easier testing; the unit tests mock the AWS interfaces.
The input was separated into the three components listed below, each with a defined
interface to allow for mock testing there too. To test the interactions between
these components, GoMock is used to generate mocks and assert the expectations.
1. The SQS receiver. (sqs.go)
2. The S3 Notification Event handler. (sqs_s3_event.go)
3. The S3 Object reader. (s3.go)
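The shape of this interface-based seam can be sketched as below. The interface and type names are hypothetical stand-ins for the real definitions in sqs.go, sqs_s3_event.go, and s3.go; the point is that a fake satisfying the interface replaces the AWS SDK in unit tests.

```go
package main

import "fmt"

// sqsReceiver abstracts the one SQS call the receiver loop needs, so tests
// can substitute a fake for the real AWS client.
type sqsReceiver interface {
	ReceiveMessages(max int) ([]string, error)
}

// s3ObjectHandler abstracts per-object processing (component 3 above).
type s3ObjectHandler interface {
	ProcessObject(bucket, key string) error
}

// fakeSQS is a hand-rolled test double; GoMock generates equivalents
// automatically and adds expectation assertions.
type fakeSQS struct{ msgs []string }

func (f *fakeSQS) ReceiveMessages(max int) ([]string, error) {
	if max > len(f.msgs) {
		max = len(f.msgs)
	}
	out := f.msgs[:max]
	f.msgs = f.msgs[max:]
	return out, nil
}

func main() {
	var r sqsReceiver = &fakeSQS{msgs: []string{"m1", "m2", "m3"}}
	got, _ := r.ReceiveMessages(2) // ask for as many as there are free workers
	fmt.Println(len(got))          // 2
}
```

Because each component depends only on an interface, the interaction tests can wire mocks for all three and assert call counts without touching AWS.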
Terraform setup for integration test
Setup for executing the integration tests is now handled by Terraform.
See _meta/terraform/README.md for instructions.
Benchmark test
I added a benchmark that tests the input in isolation with mocked SQS and S3 responses.
It uses a 7 KB CloudTrail json.gz file containing about 60 messages as its input.
This removes any variability related to the network, but it also means the numbers do
not reflect real-world rates. They can be used to measure the effect of future changes.
+-------------------+--------------------+------------------+--------------------+------+
| MAX MSGS INFLIGHT | EVENTS PER SEC | S3 BYTES PER SEC | TIME (SEC) | CPUS |
+-------------------+--------------------+------------------+--------------------+------+
| 1 | 23019.782175720782 | 3.0 MB | 1.257266458 | 12 |
| 2 | 36237.53174269319 | 4.8 MB | 1.158798571 | 12 |
| 4 | 56456.84532752983 | 7.5 MB | 1.138285351 | 12 |
| 8 | 90485.15755430676 | 12 MB | 1.117244007 | 12 |
| 16 | 103853.8984324643 | 14 MB | 1.165541225 | 12 |
| 32 | 110380.28141417276 | 15 MB | 1.110814345 | 12 |
| 64 | 116074.13735061679 | 15 MB | 1.408100062 | 12 |
| 128 | 114854.80273666105 | 15 MB | 1.5331357140000001 | 12 |
| 256 | 118767.73924992209 | 16 MB | 2.041783413 | 12 |
| 512 | 122933.1033660647 | 16 MB | 1.255463303 | 12 |
| 1024 | 124222.51861746894 | 17 MB | 1.505765638 | 12 |
+-------------------+--------------------+------------------+--------------------+------+
Relates #25750
* Use InitializeAWSConfig
* Add s3Lister interface for mocking pagination of S3 ListObjects calls
* Add new config parameters to reference.yml
* Optimize uploading b/c it was slow in aws v2 sdk
Force-pushed 6b41f01 to 39b0e75
francescayeye approved these changes on Aug 13, 2021
This pull request is now in conflicts. Could you fix it? 🙏
This is an automatic backport of pull request #27199 done by Mergify.
Cherry-pick of 7c76158 has failed:
To fix up this pull request, you can check it out locally. See documentation: https://docs.github.com/en/github/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally
Mergify commands and options
More conditions and actions can be found in the documentation.
You can also trigger Mergify actions by commenting on this pull request:
- `@Mergifyio refresh` will re-evaluate the rules
- `@Mergifyio rebase` will rebase this PR on its base branch
- `@Mergifyio update` will merge the base branch into this PR
- `@Mergifyio backport <destination>` will backport this PR on the `<destination>` branch
Additionally, more actions are available on the Mergify dashboard.
Finally, you can contact us on https://mergify.io/