A major performance bottleneck in ingesting SQS/S3 data is that each input worker fetches an S3 object, then waits for it to be fully acknowledged upstream before fetching the next object. When individual objects are small this can block the pipeline, especially if queue.mem.flush.timeout is high: the output is waiting for more input data at the same time as the input is waiting for the output to fully process the current queued data.
Instead, workers should fetch and publish objects as fast as they're able to process them, and acknowledgments and cleanup should be handled asynchronously without blocking ingestion.
A major performance bottleneck in ingesting SQS/S3 data is that each input worker fetches an S3 object, then waits for it to be fully acknowledged upstream before fetching the next object. When individual objects are small this can block the pipeline, especially if
queue.mem.flush.timeoutis high: the output is waiting for more input data at the same time as the input is waiting for the output to fully process the current queued data.Instead, workers should fetch and publish objects as fast as they're able to process them, and acknowledgments and cleanup should be handled asynchronously without blocking ingestion.