When polling an S3 bucket, the `aws-s3` input keeps a sync timestamp marking the point up to which objects have been fully ingested. Objects with creation timestamps before that point are skipped on subsequent polls of the bucket.
The sync timestamp is updated in `(*s3Poller).Purge`, which handles cleanup after a full scan of the bucket. `Purge` advances the timestamp based on the creation time of the objects ingested during the scan. However, if the scan encounters too many ephemeral errors (rate-limit warnings, network instability), it is restarted and `Purge` is called early. In this case, `Purge` still advances the sync timestamp based on the objects that were processed, even though older objects may remain later in the scan that were never reached. Those objects are skipped on every subsequent pass. This can result in a dramatic slowdown and eventual stop of ingestion, as an increasing number of objects are skipped on each pass.
A few things mask the severity:
- The way the sync timestamp is stored means it often doesn't persist between Filebeat restarts.
- The object timestamp collected on each page during the scan isn't the most recent timestamp; it's just some timestamp ahead of the current time, so the rate of slowdown is less severe than it could have been.
- It is relatively uncommon for the error state to be triggered on small buckets with modest ingestion speed, so when slowdown is observed it's attributed to bucket size.