Describe the enhancement:
Currently, after seeing a 413 response from Elasticsearch, the whole batch is dropped and the error is logged (#29368). Some of our customers would like to preserve at least some of the data from the batch instead of discarding it entirely.
The proposal is:
- When a 413 response is seen from Elasticsearch, try to split the current batch (e.g. in two, or based on the size of each document in the batch).
- If the 413 response is seen again, repeat the process until:
  - either all of the smaller (split) batches are successfully sent to Elasticsearch, or
  - the initial batch is reduced to a single document that still cannot pass the `http.max_content_length` threshold in Elasticsearch.
- If a batch contains only a single document that cannot be uploaded, drop the batch.
Something similar was done in this PR logstash-plugins/logstash-output-elasticsearch#497
Please ensure that each of these actions is logged. In particular:

When the batch is dropped, log at info level:
- that the smaller batch was dropped
- how many split iterations it took to reduce the batch to that size (if this is possible)
- any information we have about the dropped batch (ideally which application or integration it came from)

When the batch is being cut to size, log:
- the current batch size and what it is being split into
- the currently configured `bulk_max_size`
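To make the logging requirements concrete, the two messages could carry fields along these lines. The field names and wording here are illustrative assumptions, not the actual Beats `logp` schema:

```go
package main

import "fmt"

// splitLogLine formats the entry emitted when a batch is being cut to size:
// current size, what it is split into, and the configured bulk_max_size.
func splitLogLine(batchEvents, splitA, splitB, bulkMaxSize int) string {
	return fmt.Sprintf(
		"batch of %d events exceeds http.max_content_length; splitting into %d + %d (bulk_max_size=%d)",
		batchEvents, splitA, splitB, bulkMaxSize)
}

// dropLogLine formats the entry for a dropped single-event batch, including
// the split-iteration count and any origin metadata known for the event.
func dropLogLine(iterations int, dataset string) string {
	return fmt.Sprintf(
		"dropping oversized single-event batch after %d split iterations (event.dataset=%s)",
		iterations, dataset)
}

func main() {
	fmt.Println(splitLogLine(128, 64, 64, 256))
	fmt.Println(dropLogLine(7, "nginx.access"))
}
```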
Describe a specific use case for the enhancement or feature:
Some of our clients are more sensitive to data loss than others, and this enhancement would allow us to preserve more data in case of a misconfiguration of `http.max_content_length` in Elasticsearch or `bulk_max_size` in Beats. This would improve the situation in most cases, but it would not completely solve the data loss problem.