Split large batches of documents if received 413 from Elasticsearch #29778

@rdner

Describe the enhancement:

Currently, after receiving a 413 response from Elasticsearch, the whole batch is dropped and the error is logged (#29368). Some of our customers would like to preserve at least some of the data instead of discarding the whole batch.

The proposal is:

  • When seeing a 413 response from Elasticsearch, try to split the current batch (maybe in two, or based on the size of each document in the batch)
  • If the 413 response is seen again, repeat the process until:
    • either all the smaller (split) batches are successfully sent to Elasticsearch or
    • the initial batch is reduced to a single document that exceeds the http.max_content_length threshold in Elasticsearch
  • If a batch contains only a single document that cannot be uploaded, drop the batch
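
The steps above could be sketched roughly as follows. This is a minimal illustration, not the Beats implementation: `ErrTooLarge`, `SendFunc`, `Result`, and `SendWithSplit` are hypothetical names, the batch is modelled as a slice of raw documents, and the split is a plain halving rather than a size-based one.

```go
package main

import (
	"errors"
	"log"
)

// ErrTooLarge is a hypothetical sentinel standing in for an HTTP 413 response.
var ErrTooLarge = errors.New("413 request entity too large")

// SendFunc is a hypothetical hook that sends one batch in a single bulk request.
type SendFunc func(batch [][]byte) error

// Result counts what happened to the documents of the original batch.
type Result struct {
	Sent    int // documents accepted by the server
	Dropped int // single documents that were still rejected with 413
	Splits  int // number of times a batch was cut in two
}

// SendWithSplit sends the batch and, on a 413, halves it and retries each half.
// depth is the number of splits that led to the current batch.
func SendWithSplit(batch [][]byte, send SendFunc, depth int, res *Result) error {
	if len(batch) == 0 {
		return nil
	}
	err := send(batch)
	if err == nil {
		res.Sent += len(batch)
		return nil
	}
	if !errors.Is(err, ErrTooLarge) {
		return err // other errors are left to the caller's retry logic
	}
	if len(batch) == 1 {
		// A single document cannot be split any further: drop it and say so.
		log.Printf("dropping document of %d bytes after %d splits: still rejected with 413",
			len(batch[0]), depth)
		res.Dropped++
		return nil
	}
	mid := len(batch) / 2
	log.Printf("413 for batch of %d documents, splitting into %d and %d",
		len(batch), mid, len(batch)-mid)
	res.Splits++
	if err := SendWithSplit(batch[:mid], send, depth+1, res); err != nil {
		return err
	}
	return SendWithSplit(batch[mid:], send, depth+1, res)
}
```

Halving keeps the number of extra requests logarithmic in the batch size when only one document is oversized. A size-based split would need the serialized size of each document up front.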

Something similar was done in this PR logstash-plugins/logstash-output-elasticsearch#497

Please ensure that each of these actions is logged, in particular:

When the batch is dropped, please state in the log, for information:

  • that the smaller batch was dropped
  • how many iterations it took to reduce the size (if this is possible)
  • any info from the batch that was dropped (ideally, which application or integration it came from)

When the batch is being cut to size:

  • what the current size is, and what it is being split into
  • what the currently configured bulk_max_size is

Describe a specific use case for the enhancement or feature:

Some of our clients are more sensitive to data loss than others, and this enhancement would allow them to preserve more data when http.max_content_length in Elasticsearch or bulk_max_size in Beats is misconfigured. This would improve the situation in most cases, but it would not completely solve the data loss problem.
