aws-s3 input workers shouldn't wait for objects to be fully ingested before starting the next object #39414

@faec

A major performance bottleneck in ingesting SQS/S3 data is that each input worker fetches an S3 object, then waits for it to be fully acknowledged upstream before fetching the next object. When individual objects are small this can block the pipeline, especially if queue.mem.flush.timeout is high: the output is waiting for more input data at the same time as the input is waiting for the output to fully process the current queued data.

Instead, workers should fetch and publish objects as fast as they're able to process them, and acknowledgments and cleanup should be handled asynchronously without blocking ingestion.
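A minimal sketch of the proposed behavior, written in Python for brevity and not taken from the Beats codebase. Every name here (`FakePipeline`, `AckTracker`, `worker`, `ingest`) is hypothetical. It assumes the pipeline invokes a per-event callback once the event has been acknowledged. Each worker publishes all events for an object and immediately moves on to the next one; per-object cleanup (for example, deleting the SQS message) runs from the acknowledgment callback once the last event for that object is acknowledged.

```python
import queue
import threading


class FakePipeline:
    """Hypothetical stand-in for the publishing pipeline.

    Events are acknowledged on a background thread, so publish() never
    blocks the caller waiting for the output.
    """

    def __init__(self):
        self._events = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def publish(self, event, on_ack):
        self._events.put((event, on_ack))

    def _run(self):
        while True:
            item = self._events.get()
            if item is None:  # sentinel: stop after draining earlier events
                return
            _event, on_ack = item
            on_ack()

    def close(self):
        self._events.put(None)
        self._thread.join()


class AckTracker:
    """Counts unacknowledged events for one object (hypothetical helper).

    on_done fires exactly once, after publishing has finished and every
    published event has been acknowledged.
    """

    def __init__(self, on_done):
        self._lock = threading.Lock()
        self._pending = 0
        self._sealed = False
        self._on_done = on_done

    def add(self):
        with self._lock:
            self._pending += 1

    def ack(self):
        with self._lock:
            self._pending -= 1
            done = self._sealed and self._pending == 0
        if done:
            self._on_done()

    def seal(self):
        # Called by the worker once it has published the whole object.
        with self._lock:
            self._sealed = True
            done = self._pending == 0
        if done:
            self._on_done()


def worker(work, pipeline, cleaned):
    while True:
        try:
            key, events = work.get_nowait()
        except queue.Empty:
            return
        # Cleanup is deferred to the ack path; here it just records the key.
        tracker = AckTracker(lambda key=key: cleaned.append(key))
        for event in events:
            tracker.add()
            pipeline.publish(event, tracker.ack)
        tracker.seal()
        # No wait here: the worker proceeds to the next object immediately.


def ingest(objects_by_key, num_workers=2):
    work = queue.Queue()
    for item in objects_by_key.items():
        work.put(item)
    pipeline = FakePipeline()
    cleaned = []
    threads = [
        threading.Thread(target=worker, args=(work, pipeline, cleaned))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    pipeline.close()  # drains outstanding acknowledgments, then stops
    return cleaned


if __name__ == "__main__":
    print(sorted(ingest({"a": [1, 2], "b": [], "c": [3]})))
```

The counter plus the `seal()` step is one way to avoid running cleanup early when acknowledgments arrive before the worker has finished publishing an object. A real implementation would also need to bound the number of in-flight objects and handle publish failures, which this sketch leaves out.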
