Add support for healthy panic threshold#496
Conversation
|
Hi @knrc. Thanks for your PR. I'm waiting for a istio member to verify that this patch is reasonable to test. If it is, they should reply with I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
This PR is related to issue #486 |
|
/assign @sebastienvas |
|
@sebastienvas noticed there is no OWNERS file - perhaps you can help fix that? Cheers |
|
@costinm Given your comments in the environment meeting should this be included for 1.0? |
|
/retest |
|
@knrc: you can't request testing unless you are a istio member. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
/retest
On Mon, Jun 4, 2018 at 11:16 AM istio-bot ***@***.***> wrote:
@knrc <https://github.com/knrc>: you can't request testing unless you are
a istio <https://github.com/orgs/istio/people> member.
In response to this
<#496 (comment)>:
/retest
Instructions for interacting with me using PR comments are available here
<https://git.k8s.io/community/contributors/guide/pull-requests.md>. If
you have questions or suggestions related to my behavior, please file an
issue against the kubernetes/test-infra
<https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:>
repository.
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#496 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AADP0S8GXTjiqBRJchWMUhSBXu_A4PU9ks5t5XmQgaJpZM4UE9ra>
.
--
*Christian Posta*
twitter: @christianposta
http://blog.christianposta.com <http://www.christianposta.com/blog>
|
| // known to be healthy, if the percentage of healthy hosts drops below this panic threshold | ||
| // the proxy will disregard the health status and balance across all hosts. The default | ||
| // panic threshold is 50%. | ||
| int32 healthy_panic_threshold = 5; |
There was a problem hiding this comment.
Are you also implementing this? if not, you need to add a not-implemented-hide tag so that this does not show up in the docs.
rshriram
left a comment
There was a problem hiding this comment.
I just realized. This is not part of outlier detection at all.
|
Following up here to document a slack conversation. The healthy panic threshold covers both active and passive health checks, in the outlier docs it contains the following
@rshriram there may be a better place for this in the API, do you have any suggestions? The code I have currently creates the following: https://pastebin.com/89g589RK Should this be moved under LoadBalancerSettings? |
* Convert Pilot ip mesh attributes to bytes * Update comment * Update after review comment
|
The istio project PR is istio/istio#9176, it will need updating after api is merged |
|
@rshriram Can you please take another look at this and the istio/istio#9176 PR? If you let me know your feedback I can make the appropriate changes. |
| uint64 minimum_ring_size = 4; | ||
| }; | ||
|
|
||
| message CommonLbConfig { |
There was a problem hiding this comment.
Please get an LGTM from networking folks. My comments here are structural.
Please be consistent with naming (i.e. CommonLbConfig => CommonLBConfig) and add comments to the message itself.
I'd also recommend putting this into the individual LB protos, if it makes sense. Otherwise, when editing, people need to add a "common" stanza for being able to set these things. It is much more intuitive to put these settings together when editing.
There was a problem hiding this comment.
Sounds good, I can take a look at both of those changes
| SimpleLB simple = 1; | ||
| ConsistentHashLB consistent_hash = 2; | ||
| } | ||
| CommonLbConfig common = 3; |
There was a problem hiding this comment.
How about adding the parameter directly, without the 'common' ? If it is common, it doesn't need an extra indent level and struct. I know in envoy the config is deep and full of 'common' - but it makes it harder to author.
| // known to be healthy, if the percentage of healthy hosts drops below this panic threshold | ||
| // the proxy will disregard the health status and balance across all hosts. The default | ||
| // panic threshold is 50%. | ||
| int32 healthy_panic_threshold = 1; |
There was a problem hiding this comment.
I'm not very good with names - but not sure this is the most intuitive. Maybe include_unhealthy_threshold ?
"Panic" is a bit confusing. I assume this matches envoy name ?
There was a problem hiding this comment.
Yes, this is the name used in envoy. I can change this to your name.
There was a problem hiding this comment.
I just pushed up a new version but forgot to modify the name, will do this shortly and push up again
|
Do we want to add a section on health checking ? I think there are few other related things we may want to configure. It's good to include the setting in the yaml examples, to get an idea how it 'feels' |
|
Okay. Sorry for the bike shedding. I see your point about the threshold applying to outlier detection. Please put it back in outlier detection and call it panic_threshold with comments saying outlier detection will be disabled if healthy hosts fall below this percentage. Default is 0% |
|
I guess I'm a bit confused. From the comments, it doesn't seem related to outlier - but the behavior Can you clarify in the docs if it is outlier-related or not, and how the 'health' is determined ? Does it require envoy actively checking health (and the associated settings), or outlier ejection, or both ? I don't think we have settings or code for the active health checking. At some point we'll add them - would they belong to 'outlier' section ? Like Shriram, I'm sorry for bike shedding, but APIs are very hard to change after they are released. |
Sorry, for some reason I missed these comments. The sentences I referred to on Jun 7 are still in the envoy outlier detection docs. I'll dig through the envoy code later today and document the flow here, in the meantime ignore the update I pushed. BTW don't worry about bike shedding, I would rather we took the time and got this right. |
| // known to be healthy, if the percentage of healthy hosts drops below this panic threshold | ||
| // the proxy will disregard the health status and balance across all hosts. The default | ||
| // panic threshold is 50%. | ||
| int32 healthy_panic_threshold = 3; |
There was a problem hiding this comment.
move it to outlier detection? and call it panic_threshold.. So it would make more sense in context of outlier detection
There was a problem hiding this comment.
That's probably worth discussing with @costinm as he wanted it in the LB section and called something like include_unhealthy_threshold instead of using panic. The current name matches the envoy configuration and it applies to general LB features but impacts outlier detection.
When a host is ejected by the outlier detection it will set the host's health flag to contain Host::HealthFlag::FAILED_OUTLIER_CHECK, this will then cause the load balancer code to remove it from the list of unhealthy hosts until it is either unejected by the outlier detection or the panic occurs.
@rshriram @costinm What do you think? Keep it where it is or move it? Keep the name or change it?
|
This is where it makes the most sense for me. When doing circuit breaking
or outlier detection, this panic setting really comes in handy when trying
to mimic the behavior of other circuit-breaking functionality like Netflix
Hystrix.
…On Mon, Oct 22, 2018 at 3:45 PM Shriram Rajagopalan < ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In networking/v1alpha3/destination_rule.proto
<#496 (comment)>:
> @@ -328,6 +328,12 @@ message LoadBalancerSettings {
SimpleLB simple = 1;
ConsistentHashLB consistent_hash = 2;
}
+
+ // Load balancing panic threshold. The load balancing pool will normally include hosts
+ // known to be healthy, if the percentage of healthy hosts drops below this panic threshold
+ // the proxy will disregard the health status and balance across all hosts. The default
+ // panic threshold is 50%.
+ int32 healthy_panic_threshold = 3;
move it to outlier detection? and call it panic_threshold.. So it would
make more sense in context of outlier detection
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#496 (review)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AADP0ZndYDCUfUTsIL2n2fu3M5VRnLV1ks5unkpvgaJpZM4UE9ra>
.
--
*Christian Posta*
twitter: @christianposta
http://blog.christianposta.com <http://www.christianposta.com/blog>
|
|
Over the years, I have been trying to maintain some consistency in networking APIs. For good or worse, we choose to have clearer explanations and more user friendly fields rather than mirroring everything from Envoy (otherwise there is no point in having a separate Istio API). While it might make sense in Envoy to have this field in laod balancer (for future purposes), as far as Istio is concerned, it is extremely confusing to end users to have a panic_threshold field in load balancer when it refers to outlier detection (that could be disabled). So please add it to outlier detection section. |
| // Load balancing panic threshold. The load balancing pool will normally include hosts | ||
| // known to be healthy, if the percentage of healthy hosts drops below this panic threshold | ||
| // the proxy will disregard the health status and balance across all hosts. The default | ||
| // panic threshold is 50%. |
There was a problem hiding this comment.
threshold_percent is redundant.
Also please fix the documentation. "Load balancing panic threshold" is a vague thing to say in outlier detection. The second sentence has a typo/grammar issue..
There was a problem hiding this comment.
No problem, I was just trying to make the name consistent with max ejection. The documentation is a copy of the envoy docs but I can change that.
There was a problem hiding this comment.
I think I found a better name for this..
// Outlier detection will be enabled as long as the associated load balancing
// pool has atleast min_health_percent hosts in healthy mode. When the
// percentage of healthy hosts in the load balancing pool drops below this
// threshold, outlier detection will be disabled and the proxy will load balance
// across all hosts in the pool (healthy and unhealthy).
int32 min_health_percent
|
@rshriram updated |
|
Can you send this PR to release-1.1 branch please? |
|
So there's good news and bad news. 👍 The good news is that everyone that needs to sign a CLA (the pull request submitter and all commit authors) have done so. Everything is all good there. 😕 The bad news is that it appears that one or more commits were authored or co-authored by someone other than the pull request submitter. We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that here in the pull request. Note to project maintainer: This is a terminal state, meaning the |
|
It looks as if googlebot is confused so I'll try to repush |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: If they are not already assigned, you can assign the PR to them by writing The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
CLAs look good, thanks! |
No description provided.