Make CBE message creation more robust #87687
elasticsearchmachine merged 8 commits into elastic:master
Pinging @elastic/es-core-infra (Team:Core/Infra)
Hi @DaveCTurner, I've created a changelog YAML for you.
Child circuit breakers rely on proper matching of acquire/release pairs. This can be tricky to get right. If we get it wrong and accidentally double-release a CB then it may end up with a negative `used` value. This is definitely a bad situation in which to find ourselves, but today in production it's made a whole lot worse because it causes exceptions on every attempt to report a `CircuitBreakerStats` or to construct a parent `CircuitBreakingException`. This commit makes the message construction and stats serialization a little more robust so that it's clearer what is going on in production.
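To make the failure mode concrete, here is a minimal sketch of how a double release drives `used` negative, and how a lenient formatter can still produce a readable message. Every name in it (`ToyBreaker`, `strictSize`, `lenientSize`, `describe`) is hypothetical and none of it is the Elasticsearch implementation. It also assumes the exceptions come from a size formatter that rejects negative values, which the description above does not spell out.

```java
// Illustrative sketch only: all classes and methods here are hypothetical,
// not Elasticsearch's actual circuit breaker code.
class NegativeUsedSketch {

    /** Toy child breaker that tracks reserved bytes with a plain counter. */
    static final class ToyBreaker {
        private final String name;
        private long used;

        ToyBreaker(String name) {
            this.name = name;
        }

        void acquire(long bytes) {
            used += bytes;
        }

        // Nothing stops a caller from releasing the same reservation twice.
        void release(long bytes) {
            used -= bytes;
        }

        long used() {
            return used;
        }

        String name() {
            return name;
        }
    }

    /** Strict formatter (assumed behaviour): rejects negative sizes outright. */
    static String strictSize(long bytes) {
        if (bytes < 0) {
            throw new IllegalArgumentException("negative size: " + bytes);
        }
        return bytes + "b";
    }

    /** Lenient formatter: reports a negative value verbatim instead of throwing. */
    static String lenientSize(long bytes) {
        return bytes < 0 ? "negative(" + bytes + ")" : bytes + "b";
    }

    /** Builds a message fragment that stays usable even when the counter is corrupt. */
    static String describe(ToyBreaker breaker) {
        return breaker.name() + "=" + lenientSize(breaker.used());
    }

    public static void main(String[] args) {
        ToyBreaker breaker = new ToyBreaker("request");
        breaker.acquire(1024);
        breaker.release(1024);
        breaker.release(1024); // the accidental double release
        System.out.println(describe(breaker)); // prints "request=negative(-1024)"
    }
}
```

With the strict formatter, every later attempt to describe this breaker would throw and hide the original bug. The lenient one keeps the bad value visible in the message.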
c15655e to c87f62f
grcevski left a comment
LGTM! Does it make sense to log a warning of some kind when we encounter a negative value with a stack trace? I'm hoping here that with enough stack traces we could at least pinpoint the general area where we have this double decrement, or perhaps that won't help us much.
Sure, I added such logging in 5ff3a42. This should at least tell us which CB is affected in a way that's easy to search across all clusters' logs.
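As a rough illustration of the kind of warning discussed here, the sketch below logs the breaker name together with a stack trace when a negative value is seen. It is not the code from 5ff3a42: the class, the method `warnIfNegative`, the logger name, and the message text are all hypothetical, and it uses `java.util.logging` only to stay self-contained. Whether the real change attaches a stack trace is not stated in this thread.

```java
// Hypothetical sketch: not the logging added in the PR, just one way such a warning could look.
class NegativeUsedLoggingSketch {

    private static final java.util.logging.Logger LOGGER =
        java.util.logging.Logger.getLogger("breaker-sketch");

    /**
     * Logs a warning naming the breaker when its used value is negative.
     * Passing a fresh exception attaches the current stack trace to the log record.
     * Returns true if a warning was emitted.
     */
    static boolean warnIfNegative(String breakerName, long used) {
        if (used >= 0) {
            return false;
        }
        LOGGER.log(
            java.util.logging.Level.WARNING,
            "negative used value [" + used + "] on circuit breaker [" + breakerName + "]",
            new IllegalStateException("stack trace for negative breaker value")
        );
        return true;
    }
}
```

Putting the breaker name in a fixed message format is what makes the line easy to search for across many clusters' logs.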
Relates elastic#86059.
💔 Backport failed
You can use sqren/backport to manually backport by running
Backport of elastic#87687.
This reverts commit 9ff9026.
Reverted due to test failures - will be reinstated by #87881.