When scanning an S3 bucket, metadata from each object is saved to the registry (including whether it has been successfully downloaded). Each object's metadata consumes approximately 1KB of space in the registry.
The code was intended to delete this metadata after each bucket scan, but the deletion was implemented incorrectly (see also #39065), so most S3 object metadata is persisted indefinitely and never cleaned up. It keeps accumulating even after objects have been removed from the original bucket or the target bucket has been changed, so the input adds ~1GB to the registry for every million objects it has ever seen, across all time and all buckets. These metadata entries are also held in memory while Filebeat runs and can significantly increase memory requirements on large buckets.
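To illustrate the intended cleanup, here is a minimal Go sketch. It uses a hypothetical in-memory map as a stand-in for the registry; `objectState`, `registry`, `recordScan`, `cleanup`, and `approxRegistryBytes` are illustrative names and are not Filebeat's actual registry or S3 input API. After each scan, any entry that was not listed in that scan is deleted. This covers both objects removed from the bucket and entries left over from a bucket that is no longer scanned. The size helper reproduces the ~1KB-per-object arithmetic behind the ~1GB-per-million estimate.

```go
package main

import "fmt"

// objectState is a hypothetical stand-in for the per-object metadata the
// S3 input persists (roughly 1KB per object in the real registry).
type objectState struct {
	Bucket     string
	Key        string
	Downloaded bool
}

// registry is a hypothetical in-memory model of the registry, keyed by
// bucket and object key. It is not Filebeat's real registry API.
type registry map[string]objectState

// stateID builds the hypothetical registry key for an object.
func stateID(bucket, key string) string { return bucket + "/" + key }

// recordScan stores metadata for every object listed in the current scan
// and returns the set of registry IDs that were seen.
func (r registry) recordScan(bucket string, keys []string) map[string]bool {
	seen := make(map[string]bool, len(keys))
	for _, k := range keys {
		id := stateID(bucket, k)
		st := r[id]
		st.Bucket, st.Key = bucket, k
		st.Downloaded = true // assume every listed object was downloaded
		r[id] = st
		seen[id] = true
	}
	return seen
}

// cleanup deletes every entry not seen in the latest scan and returns how
// many entries were removed. Deleting from a map while ranging over it is
// safe in Go.
func (r registry) cleanup(seen map[string]bool) int {
	removed := 0
	for id := range r {
		if !seen[id] {
			delete(r, id)
			removed++
		}
	}
	return removed
}

// approxRegistryBytes estimates registry size at ~1KB per tracked object.
func approxRegistryBytes(objects int) int { return objects * 1024 }

func main() {
	r := registry{}
	r.recordScan("bucket-a", []string{"a.log", "b.log", "c.log"})
	// Second scan: b.log has since been deleted from the bucket.
	seen := r.recordScan("bucket-a", []string{"a.log", "c.log"})
	fmt.Println("removed:", r.cleanup(seen), "remaining:", len(r))
	fmt.Println("approx bytes for 1M objects:", approxRegistryBytes(1_000_000))
}
```

Without the `cleanup` step, `b.log` would remain in the map indefinitely, which is the unbounded growth described above. The actual fix in Filebeat may track state differently. This sketch only shows the "delete what the latest scan did not see" idea.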