load balancer: fix?: panic mode is disabled only when healthy_panic_threshold is 0%#7478
Conversation
… is 0% ? Signed-off-by: mnktsts2 <mnktsts2@gmail.com>
Signed-off-by: mnktsts2 <mnktsts2@gmail.com>
f45908d to
7c876a4
Compare
Signed-off-by: mnktsts2 <mnktsts2@gmail.com>
…ic_threshold_is_0 Signed-off-by: mnktsts2 <mnktsts2@gmail.com>
|
FWIW, it looks like prior to 1.9, setting panic threshold to 0 disabled it, which is the behavior we were expecting. That no longer seems to be the case. |
|
@rene-m-hernandez As you say, setting panic threshold to 0 disabled panic mode in v1.7.1, and not in v1.9.1 |
snowp
left a comment
There was a problem hiding this comment.
Thanks for the fix, this makes sense to me. Could you add a test that verifies that we're not entering panic in this case?
| calculateNormalizedTotalAvailability(per_priority_health_, per_priority_degraded_); | ||
|
|
||
| if (normalized_total_availability == 0) { | ||
| uint64_t global_panic_threshold = std::min<uint64_t>( |
There was a problem hiding this comment.
i'd probably call this just panic_threshold since it's not related to global panic.
also a small nit: this can be const
There was a problem hiding this comment.
Thanks for your review. I'll push the following change.
- add a test that verifies that we're not entering panic in
healthy_panic_threshold == 0 - use
panic_thresholdas variable name - use
const
* use panic_threshold as variable name * use const Signed-off-by: mnktsts2 <mnktsts2@gmail.com>
…ic_threshold == 0 (envoyproxy#7478) Signed-off-by: mnktsts2 <mnktsts2@gmail.com>
Signed-off-by: mnktsts2 <mnktsts2@gmail.com>
snowp
left a comment
There was a problem hiding this comment.
Nice, just a comment style nit
/wait
| // if # of healthy hosts in priority set is low. | ||
| // - normalized total health is 0%. All hosts are down. Redirect 100% of traffic to P=0 and enable | ||
| // panic mode. | ||
| // However, disable panic mode only when healthy panic threshold is 0% |
There was a problem hiding this comment.
maybe change this comment so that on the previous bullet you say "- normalized total health is 0% and panic threshold is > 0" and phrase this line similarly?
There was a problem hiding this comment.
Yes, that's better.
Then, how about the following changes ??
- Modify line 120-121:
// - normalized total health is 0%. All hosts are down. Redirect 100% of traffic to P=0.
// And if panic threshold > 0% then enable panic mode for P=0, otherwise disable.- Delete line 122:
// However, disable panic mode only when healthy panic threshold is 0%Signed-off-by: mnktsts2 <mnktsts2@gmail.com>
alyssawilk
left a comment
There was a problem hiding this comment.
Yeah, I think this is definitely more consistent behavior - thanks for the fix!
Can you update one or both of the docs to make it clear 0 disables?
docs/root/intro/arch_overview/upstream/load_balancing/panic_threshold.rst
api/envoy/api/v2/cds.proto
Signed-off-by: mnktsts2 <mnktsts2@gmail.com>
Sure. I updated both of the docs you indicated. |
|
/retest |
|
🔨 rebuilding |
alyssawilk
left a comment
There was a problem hiding this comment.
This is fantastic! I've added some minor rewording suggestions, but otherwise LGTM :-)
| | 5% | 65% | 7% | YES | 93% | NO | 98% | | ||
| +-------------+-------------+----------+--------------+----------+--------------+-------------+ | ||
|
|
||
| Setting the panic threshold to 0%, panic mode can be disabled. |
There was a problem hiding this comment.
-> Panic mode can be disabled by setting the panic threshold to 0%.
|
|
||
| Setting the panic threshold to 0%, panic mode can be disabled. | ||
|
|
||
| If all hosts becomes unhealthy, normalized total health is 0%, all of traffic redirect to P=0. |
There was a problem hiding this comment.
reworded a bit:
If all hosts become unhealthy normalized total health is 0%, and if the panic threshold is above 0% all traffic will be redirected to P=0. However, if the panic threshold is 0% for any priority, that priority will never enter panic mode. In this case if all hosts are unhealthy, Envoy will fail to select a host and will instead immediately return error responses with "503 - no healthy upstream".
| Consequently, for example in HTTP traffic, Envoy will immediately return error responses | ||
| with "503 - no healthy upstream". | ||
|
|
||
| +-----------+-------------+-------------+----------+--------------+----------+--------------+----------------------------+ |
There was a problem hiding this comment.
I think we can not do the charts here, but leave them in if you think they help.
Signed-off-by: mnktsts2 <mnktsts2@gmail.com>
Signed-off-by: mnktsts2 <mnktsts2@gmail.com>
|
Thanks for your help !! |
|
/retest |
|
🔨 rebuilding |
|
It's need to retest after resolving CircleCI's general billing issue ... ? |
|
/retest |
|
🔨 rebuilding |
|
Sorry, our CircleCi was down because of unrelated billing issues. |
|
should.... |
|
🔨 rebuilding |
|
Huh, unsure why retest / rebuild doesn't work but over on 7603 pushing a new commit (master merge) worked fine, so worst case you can try that and I'll LGTM again |
Signed-off-by: mnktsts2 <mnktsts2@gmail.com>
|
umm... |
|
@mnktsts2 @alyssawilk we are using branch 1.9.1 and want to set healthy_panic_threshold=0 via runtime or config. Without this fix do we expect the panic mode to be disabled or we need this commit ?. And will it work for both the cases, setting via runtime and config |
|
@pitiwari I think that panic mode can not be disabled without changing the condition to enter panic mode, like this fix. And if the condition change is applied to v1.9.1 it seems to work in the both cases (runtime and config) ... ? In v1.9.1: envoy/source/common/upstream/load_balancer_impl.cc Lines 145 to 150 in ea248e2 |
…hreshold is 0% (envoyproxy#7478) Currently, in load_balancer_impl.cc: recalculatePerPriorityPanic(), even if common_lb_config.healthy_panic_threshold is 0%, a load balancer enters panic mode whenever normalized_total_availability is 0%. I guess, a user who intentionally set to healthy_panic_threshold = 0 expects to immediately return error responses if there is no available host checked by a load balancer. (In fact, current load_balancer_impl.cc: isGlobalPanic() decide not to enter panic mode whenever healthy_panic_threshold is 0%.) So I suggest that panic mode is disabled only when healthy_panic_threshold is 0%. I want this change for automatic degenerating lower priority or optional back-end services. Risk Level: Low (It seems that setting healthy_panic_threshold == 0 is a special case originally. It won't happen unless a user intend to disable panic mode, because default value of healthy_panic_threshold is 50%.) Testing: unit and Integration tests with ./ci/run_envoy_docker.sh './ci/do_ci.sh bazel.release' Docs Changes: inline Signed-off-by: mnktsts2 <mnktsts2@gmail.com>
For an explanation of how to fill out the fields, please see the relevant section
in PULL_REQUESTS.md
Description:
common_lb_config.healthy_panic_thresholdis 0%, a load balancer enters panic mode whenevernormalized_total_availabilityis 0%.healthy_panic_threshold = 0expects to immediately return error responses if there is no available host checked by a load balancer.healthy_panic_thresholdis 0%.)Risk Level:
healthy_panic_threshold == 0is a special case originally. It won't happen unless a user intend to disable panic mode, because default value ofhealthy_panic_thresholdis 50%.)Testing:
./ci/run_envoy_docker.sh './ci/do_ci.sh bazel.release'config-enable_panic.yamlandconfig-disable_panic.yamlare envoy's config files to routehttpbin.org.config-enable_panic.yamlhashealthy_panic_threshold: 10for entering panic mode.config-disable_panic.yamlhashealthy_panic_threshold: 0for disabling panic mode.localhost:10000/status/504, check its response code and envoy stats ofcluster.backend.lb_healthy_panicDocs Changes:
healthy_panic_threshold == 0in the panic_threshold docs)Release Notes:
[Optional Fixes #Issue]
[Optional Deprecated:]