[aws] feat: aws-s3 input registry cleanup for untracked s3 objects#41694
Conversation
|
Pinging @elastic/obs-ds-hosted-services (Team:obs-ds-hosted-services) |
c129de6 to
f394447
Compare
kaiyan-sheng
left a comment
There was a problem hiding this comment.
Nice work @Kavindu-Dodan ! One quick question first, should we also introduce clean_removed parameter just to be matching the filestream input?
For s3 input, if |
42ae2ec to
fc03402
Compare
fc03402 to
9b29500
Compare
|
Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane) |
|
Looking at the change, if I've correctly understood, we're removing the IDs from the registry once the files have been read. |
The cleanup happens for objects we no longer see in the configured bucket and they won't be processed so there won't be any elastic acknowledgment. Any known bucket entry will be kept in the registry and will be tracked for their success or failure. See this logic [1] which is unchanged. [1] - beats/x-pack/filebeat/input/awss3/s3_input.go Lines 185 to 193 in 9b29500 |
Signed-off-by: Kavindu Dodanduwa <kavindu.dodanduwa@elastic.co> # Conflicts: # x-pack/filebeat/input/awss3/states.go # x-pack/filebeat/input/awss3/states_test.go
b17324a to
3c84e63
Compare
Signed-off-by: Kavindu Dodanduwa <kavindu.dodanduwa@elastic.co> # Conflicts: # x-pack/filebeat/input/awss3/states.go # x-pack/filebeat/input/awss3/states_test.go (cherry picked from commit 583d345)
Signed-off-by: Kavindu Dodanduwa <kavindu.dodanduwa@elastic.co> # Conflicts: # x-pack/filebeat/input/awss3/states.go # x-pack/filebeat/input/awss3/states_test.go (cherry picked from commit 583d345)
Signed-off-by: Kavindu Dodanduwa <kavindu.dodanduwa@elastic.co> # Conflicts: # x-pack/filebeat/input/awss3/states.go # x-pack/filebeat/input/awss3/states_test.go (cherry picked from commit 583d345)
Signed-off-by: Kavindu Dodanduwa <kavindu.dodanduwa@elastic.co> # Conflicts: # x-pack/filebeat/input/awss3/states.go # x-pack/filebeat/input/awss3/states_test.go (cherry picked from commit 583d345) Co-authored-by: Kavindu Dodanduwa <Kavindu-Dodan@users.noreply.github.com>
Signed-off-by: Kavindu Dodanduwa <kavindu.dodanduwa@elastic.co> # Conflicts: # x-pack/filebeat/input/awss3/states.go # x-pack/filebeat/input/awss3/states_test.go (cherry picked from commit 583d345) Co-authored-by: Kavindu Dodanduwa <Kavindu-Dodan@users.noreply.github.com>
Signed-off-by: Kavindu Dodanduwa <kavindu.dodanduwa@elastic.co> # Conflicts: # x-pack/filebeat/input/awss3/states.go # x-pack/filebeat/input/awss3/states_test.go (cherry picked from commit 583d345) Co-authored-by: Kavindu Dodanduwa <Kavindu-Dodan@users.noreply.github.com>
|
Hello @Kavindu-Dodan - Update: I see the concept of |
|
@lucabelluccini correct, there's no setting and clean-up occur by default when the object is no longer visible in the listing. |
Proposed commit message
This PR partially addresses #39116 by introducing a registry cleanup strategy for aws-s3 input.
The cleanup implemented here removes registry entries if the s3 object is no longer available (aka tracked) when listing inside the polling lookup. The cleanup removes objects that are not tracked from both the local state and internal store (backed by the registry) to reduce the memory usage.
Note that, this only benefits when s3 objects get removed (ex:- using lifecycle policy) and are no longer available. There should be a follow-up for instances where such removal is not done at the bucket. For example, this could be done by,
Checklist
CHANGELOG.next.asciidocorCHANGELOG-developer.next.asciidoc.How to test this PR locally
Screenshots
Given below are pprof analyss comparisons for ~4K objects in registry and once they were clean up by removing S3 objects (emptying the bucket)
Related issues
Footnotes
https://docs.aws.amazon.com/AmazonS3/latest/userguide/lifecycle-transition-general-considerations.html ↩