Request
The current log-ingestor implements buffer submission using a soft timeout mechanism. Specifically, whenever a new file is ingested, the timeout timer is reset. As a result, timeout-based buffer submission is only triggered when no new files are ingested within the configured interval.
However, user feedback indicates that this behavior is suboptimal in scenarios involving continuous ingestion of small files. In such cases:
- Size-based buffer submission is not triggered because the small files add data too slowly for the buffer to reach the size threshold.
- Timeout-based buffer submission is also not triggered, since each newly ingested file resets the timer.
This combination can lead to buffers remaining unsubmitted indefinitely under sustained low-volume ingestion.
After internal discussion, we propose switching to a hard timeout mechanism for buffer submission:
- The timeout timer is not reset upon new file ingestion.
- The timer is only reset after a buffer submission is triggered.
This approach ensures that buffers are flushed within a bounded time window, regardless of ingestion patterns, and better aligns with real-world workloads involving continuous streams of small files.
Possible implementation
As suggested above.
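
A minimal sketch of the proposed hard timeout, written in Python for illustration only. It is not the log-ingestor's actual code: the class name `HardTimeoutBuffer`, its parameters, and the injected `clock` and `submit` callables are all hypothetical. It also assumes the timer starts when the buffer is created and that an empty buffer at timeout simply restarts the timer without submitting anything.

```python
from typing import Callable, List


class HardTimeoutBuffer:
    """Hypothetical buffer illustrating a hard submission timeout."""

    def __init__(
        self,
        size_threshold: int,
        timeout_sec: float,
        clock: Callable[[], float],
        submit: Callable[[List[str]], None],
    ) -> None:
        self.size_threshold = size_threshold
        self.timeout_sec = timeout_sec
        self.clock = clock      # injected so the timer can be driven in tests
        self.submit = submit    # called with the buffered file paths
        self._files: List[str] = []
        self._bytes = 0
        self._timer_start = clock()

    def ingest(self, path: str, size: int) -> None:
        """Buffer a file. Unlike a soft timeout, the timer is NOT reset here."""
        self._files.append(path)
        self._bytes += size
        if self._bytes >= self.size_threshold:
            self._flush()  # size-based submission

    def poll(self) -> None:
        """Call periodically; submits once the hard timeout has elapsed."""
        if self.clock() - self._timer_start >= self.timeout_sec:
            self._flush()  # timeout-based submission

    def _flush(self) -> None:
        if self._files:  # assumption: nothing is submitted for an empty buffer
            self.submit(self._files)
        self._files = []
        self._bytes = 0
        # The timer is reset only here, after a submission is triggered.
        self._timer_start = self.clock()
```

With a 5-second timeout and one tiny file ingested per second, `poll()` still triggers a submission roughly every 5 seconds, because `ingest()` never touches the timer. Under the current soft timeout, the same workload would keep postponing the timeout-based submission.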