Elasticsearch Version
9.3.2
Installed Plugins
No response
Java Version
bundled
OS Version
6.17.0-19-generic #19-Ubuntu
Problem Description
After updating an Elastic Cloud cluster from 9.3.1 to 9.3.2, we are seeing the circuit breaker block all incoming requests. The request circuit breaker's memory usage, as reported by GET _cat/circuit_breaker/, climbs until it reaches whatever limit we configure, and from then on every search is blocked.
The affected cluster is used exclusively for percolate searches. We do not see the same issue on other clusters that run other kinds of searches, and it appears to be present only in 9.3.2.
In our production setup, the issue appears within minutes of restarting Elasticsearch. For now we seem to be able to work around it by disabling the request circuit breaker with this configuration: indices.breaker.request.overhead: 0
Based on the release notes, I suspect this change might be related: #142150
This is the current output from GET /_cat/circuit_breaker/
492BWWYCSJC4UU-eOBz6VA request 21.6gb 38.6tb 0
492BWWYCSJC4UU-eOBz6VA inflight_requests 31gb 21.9mb 0
492BWWYCSJC4UU-eOBz6VA model_inference 15.5gb 0b 0
492BWWYCSJC4UU-eOBz6VA eql_sequence 15.5gb 0b 0
492BWWYCSJC4UU-eOBz6VA fielddata 18.5gb 248b 0
492BWWYCSJC4UU-eOBz6VA parent 30gb 20.6gb 0
o-PptF6tSf-5cSzRV9jkiQ inflight_requests 296mb 47b 0
o-PptF6tSf-5cSzRV9jkiQ request 207.1mb 0b 0
o-PptF6tSf-5cSzRV9jkiQ fielddata 177.5mb 0b 0
o-PptF6tSf-5cSzRV9jkiQ eql_sequence 148mb 0b 0
o-PptF6tSf-5cSzRV9jkiQ model_inference 148mb 0b 0
o-PptF6tSf-5cSzRV9jkiQ parent 287.1mb 150.5mb 0
NMpZPwVJRPK56d9i0IgR3Q eql_sequence 15.5gb 0b 0
NMpZPwVJRPK56d9i0IgR3Q model_inference 15.5gb 0b 0
NMpZPwVJRPK56d9i0IgR3Q inflight_requests 31gb 13.8mb 0
NMpZPwVJRPK56d9i0IgR3Q request 21.6gb 42.5tb 0
NMpZPwVJRPK56d9i0IgR3Q fielddata 18.5gb 720b 0
NMpZPwVJRPK56d9i0IgR3Q parent 30gb 6.3gb 0
Note that the values for the request circuit breaker keep increasing.
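For anyone who wants to check their own cluster for the same symptom, here is a minimal Python sketch that scans the text output above and flags breakers whose estimated usage exceeds the limit. It assumes the column order is node id, breaker, limit, estimated, tripped (inferred from the output above) and that the size suffixes are binary units. The names parse_size, over_limit and SAMPLE are made up for this example and are not part of any Elasticsearch client.

```python
import re

# Byte multipliers for the size suffixes seen in the output (binary units assumed).
_UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4, "pb": 1024**5}


def parse_size(text: str) -> float:
    """Convert a size string such as '21.6gb' or '248b' to a number of bytes."""
    match = re.fullmatch(r"([\d.]+)(b|kb|mb|gb|tb|pb)", text.strip().lower())
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    return float(match.group(1)) * _UNITS[match.group(2)]


def over_limit(cat_output: str) -> list[tuple[str, str, str, str]]:
    """Return (node, breaker, limit, estimated) for rows where estimated > limit.

    Assumes whitespace-separated columns: node id, breaker, limit, estimated, tripped.
    """
    flagged = []
    for line in cat_output.strip().splitlines():
        parts = line.split()
        if len(parts) < 4:
            continue  # skip blank or malformed rows
        node, breaker, limit, estimated = parts[:4]
        if parse_size(estimated) > parse_size(limit):
            flagged.append((node, breaker, limit, estimated))
    return flagged


# Three rows copied from the output above.
SAMPLE = """
492BWWYCSJC4UU-eOBz6VA request 21.6gb 38.6tb 0
492BWWYCSJC4UU-eOBz6VA inflight_requests 31gb 21.9mb 0
492BWWYCSJC4UU-eOBz6VA parent 30gb 20.6gb 0
"""

if __name__ == "__main__":
    for node, breaker, limit, estimated in over_limit(SAMPLE):
        print(f"{node} {breaker}: estimated {estimated} exceeds limit {limit}")
```

On the sample rows only the request breaker is flagged, while the parent breaker on the same node stays below its limit.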
Actual memory usage on the cluster is not high (around 10% out of 64GiB on every node) and did not change with the update from 9.3.1 to 9.3.2.
Steps to Reproduce
I created a reproduction case in this git repository: https://github.com/NikolajLeischner/elasticsearch-9.3.2-circuit-breaker-bug
In the test case the "leak" is only ~50KiB per request, but with more documents in the index and query it blows up much faster.
Logs (if relevant)
No response