
Request circuit breaker value keeps increasing for percolate queries #144748

@NikolajLeischner

Description

Elasticsearch Version

9.3.2

Installed Plugins

No response

Java Version

bundled

OS Version

6.17.0-19-generic #19-Ubuntu

Problem Description

After updating an Elastic Cloud cluster from 9.3.1 to 9.3.2, the request circuit breaker blocks all incoming requests. The request circuit breaker's estimated memory usage, as reported by GET _cat/circuit_breaker/, climbs until it reaches whatever limit we configure, after which every search is rejected.

The affected cluster is used exclusively for percolate searches. We do not see the same issue on other clusters that run other kinds of searches, and it appears to be present only in 9.3.2.

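In our production setup, the issue appears within minutes of restarting Elasticsearch. For now, we seem to be able to work around it by effectively disabling the request circuit breaker with this setting: indices.breaker.request.overhead: 0. Because every request estimate is multiplied by this overhead before it is compared with the limit, a value of 0 means the breaker can no longer trip.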
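To make the effect of this workaround concrete, here is a toy Python model of a breaker. It is not Elasticsearch's implementation: it assumes only the documented meaning of indices.breaker.request.overhead (each estimate is multiplied by it before the comparison with the limit) plus a hypothetical leak in which every request releases less memory than it reserved. The class, function, and parameter names are illustrative.

```python
class CircuitBreakingError(Exception):
    """Raised when the toy breaker trips (named after Elasticsearch's CircuitBreakingException)."""


class ToyRequestBreaker:
    """Hypothetical, simplified memory circuit breaker.

    Assumes the documented meaning of indices.breaker.request.overhead: the
    running estimate is multiplied by the overhead before it is compared with the limit.
    """

    def __init__(self, limit_bytes: int, overhead: float = 1.0) -> None:
        self.limit_bytes = limit_bytes
        self.overhead = overhead
        self.used_bytes = 0  # raw reserved bytes, not scaled by the overhead

    def add_estimate(self, nbytes: int) -> None:
        """Reserve nbytes, or raise if the scaled total would exceed the limit."""
        projected = self.used_bytes + nbytes
        if projected * self.overhead > self.limit_bytes:
            raise CircuitBreakingError(f"would use {projected} bytes, limit {self.limit_bytes}")
        self.used_bytes = projected

    def release(self, nbytes: int) -> None:
        """Hand reserved bytes back to the breaker."""
        self.used_bytes -= nbytes


def simulate(overhead: float, requests: int, reserved: int = 100,
             leaked: int = 50, limit: int = 1_000) -> tuple[int, int]:
    """Run requests that each reserve `reserved` bytes but release only part of them.

    Returns (bytes still accounted for, number of rejected requests).
    """
    breaker = ToyRequestBreaker(limit, overhead)
    rejected = 0
    for _ in range(requests):
        try:
            breaker.add_estimate(reserved)
        except CircuitBreakingError:
            rejected += 1
            continue
        # Hypothetical bug: `leaked` bytes of each reservation are never released.
        breaker.release(reserved - leaked)
    return breaker.used_bytes, rejected


print(simulate(overhead=1.0, requests=100))  # (950, 81): trips, then rejects every request
print(simulate(overhead=0.0, requests=100))  # (5000, 0): never trips, accounting keeps growing
```

In this model an overhead of 0 stops the rejections but does nothing about the accounting itself, so if the real cause is a leak, the workaround would hide it rather than fix it.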

Based on the release notes, I suspect this change might be related: #142150

This is the current output of GET /_cat/circuit_breaker/:

492BWWYCSJC4UU-eOBz6VA request 21.6gb 38.6tb 0
492BWWYCSJC4UU-eOBz6VA inflight_requests 31gb 21.9mb 0
492BWWYCSJC4UU-eOBz6VA model_inference 15.5gb 0b 0
492BWWYCSJC4UU-eOBz6VA eql_sequence 15.5gb 0b 0
492BWWYCSJC4UU-eOBz6VA fielddata 18.5gb 248b 0
492BWWYCSJC4UU-eOBz6VA parent 30gb 20.6gb 0
o-PptF6tSf-5cSzRV9jkiQ inflight_requests 296mb 47b 0
o-PptF6tSf-5cSzRV9jkiQ request 207.1mb 0b 0
o-PptF6tSf-5cSzRV9jkiQ fielddata 177.5mb 0b 0
o-PptF6tSf-5cSzRV9jkiQ eql_sequence 148mb 0b 0
o-PptF6tSf-5cSzRV9jkiQ model_inference 148mb 0b 0
o-PptF6tSf-5cSzRV9jkiQ parent 287.1mb 150.5mb 0
NMpZPwVJRPK56d9i0IgR3Q eql_sequence 15.5gb 0b 0
NMpZPwVJRPK56d9i0IgR3Q model_inference 15.5gb 0b 0
NMpZPwVJRPK56d9i0IgR3Q inflight_requests 31gb 13.8mb 0
NMpZPwVJRPK56d9i0IgR3Q request 21.6gb 42.5tb 0
NMpZPwVJRPK56d9i0IgR3Q fielddata 18.5gb 720b 0
NMpZPwVJRPK56d9i0IgR3Q parent 30gb 6.3gb 0

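Note that the estimated size of the request breaker keeps increasing: it is already at 38.6tb and 42.5tb on the two larger nodes, far beyond both the 21.6gb limit and the nodes' physical memory.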
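For comparing snapshots like the one above, here is a Python sketch that parses the rows and flags breakers whose estimate exceeds their limit. It assumes the columns are node ID, breaker name, limit, estimated size, and trip count (inferred from the values shown, not from the API documentation), and that sizes use Elasticsearch's binary units (1kb = 1024 bytes). The function names are hypothetical.

```python
import re
from dataclasses import dataclass

# Elasticsearch formats byte sizes with binary multipliers (1kb = 1024 bytes).
_UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4, "pb": 1024**5}
_SIZE_RE = re.compile(r"^(\d+(?:\.\d+)?)(b|kb|mb|gb|tb|pb)$")


def parse_size(text: str) -> int:
    """Convert a size such as '21.6gb' to an approximate number of bytes."""
    match = _SIZE_RE.match(text.strip().lower())
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    value, unit = match.groups()
    return int(float(value) * _UNITS[unit])


@dataclass
class BreakerRow:
    node_id: str
    breaker: str
    limit: int
    estimated: int
    tripped: int


def parse_cat_output(text: str) -> list[BreakerRow]:
    """Parse rows of 'node_id breaker limit estimated tripped' (assumed column order)."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 5 or not fields[4].isdigit():
            continue  # skip blank lines and any header line
        node_id, breaker, limit, estimated, tripped = fields
        rows.append(BreakerRow(node_id, breaker, parse_size(limit),
                               parse_size(estimated), int(tripped)))
    return rows


def over_limit(rows: list[BreakerRow]) -> list[BreakerRow]:
    """Breakers whose estimated usage is above their configured limit."""
    return [row for row in rows if row.estimated > row.limit]


# A few rows copied from the output above.
sample = """\
492BWWYCSJC4UU-eOBz6VA request 21.6gb 38.6tb 0
492BWWYCSJC4UU-eOBz6VA parent 30gb 20.6gb 0
NMpZPwVJRPK56d9i0IgR3Q request 21.6gb 42.5tb 0
"""

for row in over_limit(parse_cat_output(sample)):
    print(row.node_id, row.breaker)
```

With these rows it prints the request breaker of both large nodes, while the parent breaker (20.6gb of 30gb) stays below its limit.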

Actual memory usage on the cluster is low (around 10% of 64 GiB on each node) and did not change with the update from 9.3.1 to 9.3.2.
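A single snapshot cannot show growth, so one option is to sample the node stats API (GET _nodes/stats/breaker, which reports each breaker's estimate as estimated_size_in_bytes) while searches run. This Python sketch assumes an unsecured cluster at http://localhost:9200, such as a local test setup; the helper names and the "never drops and ends higher" heuristic are illustrative assumptions, and a busy cluster will fluctuate naturally.

```python
import json
import time
import urllib.request


def fetch_breaker_stats(base_url: str = "http://localhost:9200") -> dict:
    """GET _nodes/stats/breaker from an (assumed) unsecured cluster."""
    with urllib.request.urlopen(f"{base_url}/_nodes/stats/breaker") as resp:
        return json.load(resp)


def request_breaker_bytes(node_stats: dict) -> dict[str, int]:
    """Map node ID -> estimated bytes currently held by the request breaker."""
    return {
        node_id: node["breakers"]["request"]["estimated_size_in_bytes"]
        for node_id, node in node_stats["nodes"].items()
    }


def keeps_growing(values: list[int]) -> bool:
    """Rough heuristic: the estimate never drops and ends higher than it started."""
    if len(values) < 2:
        return False
    return all(b >= a for a, b in zip(values, values[1:])) and values[-1] > values[0]


def watch(base_url: str = "http://localhost:9200", rounds: int = 6,
          interval_s: float = 10.0) -> dict[str, bool]:
    """Sample the request breaker `rounds` times and report which nodes only ever grew."""
    samples = []
    for i in range(rounds):
        samples.append(request_breaker_bytes(fetch_breaker_stats(base_url)))
        if i < rounds - 1:
            time.sleep(interval_s)
    return {node: keeps_growing([s.get(node, 0) for s in samples]) for node in samples[0]}


# Example, while percolate searches are running against the test cluster:
# print(watch())
```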

Steps to Reproduce

I created a reproduction case in this git repository: https://github.com/NikolajLeischner/elasticsearch-9.3.2-circuit-breaker-bug

In the test case the "leak" is only ~50 KiB per request, but with more documents in the index and in the percolate query it grows much faster.
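For a sense of scale, a back-of-the-envelope calculation: if every request leaked about 50 KiB and nothing were ever released, a 21.6gb request limit (as on the large nodes above) would be reached after roughly 450,000 requests. The helper below is a hypothetical illustration that ignores the memory a request legitimately reserves while it runs.

```python
import math

KIB = 1024
GIB = 1024**3


def requests_until_limit(limit_bytes: float, leak_per_request: float) -> int:
    """Smallest number of leaking requests whose combined leak reaches the limit."""
    return math.ceil(limit_bytes / leak_per_request)


print(requests_until_limit(21.6 * GIB, 50 * KIB))  # 452985
```

A larger leak per request shrinks this number proportionally.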

Logs (if relevant)

No response
