Updating LB docs to the new LB logic #359
Conversation
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
This mostly goes with the upcoming PR depending on envoyproxy/envoy#2244 landing. Will close for now - just wanted the docs PR ready for 2244.
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Huh. Local check_format complains about spaces in my change. I'll poke around.
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
@alyssawilk I noticed the same thing and fixed here: #381. I'm confused how the stuff got through in the first place.
#382 should forward-fix that.
mattklein123 left a comment:
Nice, this is super cool. Some small nits.
| another are fairly simplistic: a given priority level will be used until it has zero healthy hosts, |
| at which point it will hard fail to the next highest priority level. |
| level. For each EDS :ref:`LocalityLbEndpoints<envoy_api_msg_LocalityLbEndpoints>` an optional |
| priority may also be specifie. When endpoints at the highest priority level (P=0) are healthy, all |
| traffic will land on endpoints in that priority leve. As endpoints for the highest priority level |
typo: "leve" - do you mind running it through a spell checker?
Hahaha. I did spell check, but then I ran a buggy version of check_format which replaced "[^.]. " with ". " and so removed the last letter of a bunch of words. Shame on me for not checking diffs before and after.
| Currently, it is assumed that each priority level is over-provisioned by a (hard-coded) factor of |
| 1.4. So if 80% of the endpoints are healthy, the priority level is still considered healthy because |
| 80*1.4 > 100. As the number of healthy endpoints dips below 72%, the health of the priority level |
htuch left a comment:
I think this is great at conveying intuition about what is happening. Would be good to also add the underlying formula for computing how the load is spilled, for those interested.
| Currently, it is assumed that each priority level is over-provisioned by a (hard-coded) factor of |
| 1.4. So if 80% of the endpoints are healthy, the priority level is still considered healthy because |
| 80*1.4 > 100. As the number of healthy endpoints dips below 72%, the health of the priority level |
| goes below 100, and any residual traffic will flow to the next priority level. |
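The over-provisioning rule quoted above can be sketched as a small Python helper. This is an illustrative sketch of the documented behavior, not Envoy code (the function name is hypothetical):

```python
def priority_health_percent(healthy, total):
    """Health of a priority level as a percentage: the healthy fraction of
    endpoints, scaled by the hard-coded 1.4 over-provisioning factor and
    capped at 100 (at or above 100 the level is considered fully healthy)."""
    if total == 0:
        return 0
    # 140 == 100 * 1.4, kept as an integer to avoid float rounding
    return min(100, 140 * healthy // total)

# 80% healthy: 80 * 1.4 = 112, capped at 100 -> level is fully healthy
print(priority_health_percent(80, 100))  # 100
# 50% healthy: 50 * 1.4 = 70 -> only 70% of traffic stays at this level
print(priority_health_percent(50, 100))  # 70
```

Note the crossover at ~72%: below that, 1.4 times the healthy fraction no longer reaches 100, so traffic starts to spill.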
| Assume a simple set-up with 2 priority levels, P=1 100% healthy. |
| +----------------------------+---------------------------+----------------------------+ |
| | Percent healthy endpoints | Percent of traffic to P=0 | Percent of traffic to P=1 | |
Percent healthy endpoints for P=0?
| +----------------------------+---------------------------+----------------------------+ |
| | 71 | 99 | 1 | |
| +----------------------------+---------------------------+----------------------------+ |
| | 50 | 70 | 30 | |
I think it would be helpful to see the expression that is used to compute the residue here.
I used the expressions below to try and compute the 50% health case, I get P_0 traffic at 33% and P_1 traffic at 67%. Maybe I did math wrong, can you double check?
so percent healthy (healthy_PX_backends / total_PX_backends) is 50%
health(P_X) = 140 * 0.5 = 70
load to P_0 = min(70, 100) = 70
load to P_1 = 100 - 70 = 30
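The 50% case worked through above can be checked numerically. A small sketch of the two-priority split under discussion (hypothetical helper, not Envoy code):

```python
def two_priority_split(p0_healthy_pct):
    """Traffic split between P=0 (partially healthy) and a fully healthy P=1,
    following the formulas discussed in this thread. Integer math: 140 is
    100 * the 1.4 over-provisioning factor."""
    health_p0 = min(100, 140 * p0_healthy_pct // 100)
    health_p1 = min(100, 140 * 100 // 100)          # fully healthy -> capped at 100
    total_health = min(100, health_p0 + health_p1)  # capped at 100
    load_p0 = min(100, health_p0 * 100 // total_health)
    return load_p0, 100 - load_p0

print(two_priority_split(50))  # (70, 30)
print(two_priority_split(71))  # (99, 1)
```

Both results match the table rows quoted above (50 -> 70/30, 71 -> 99/1).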
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
| percent load to P=0 == min(100, health(P=0) * 100 / total_health) |
| health(P=X) == (1.4 * healthy_PX_backends / total_PX_backends) |
| total_health == min(100, Σ(health(P=0)...health(P=X))) |
| percent load to P=X == 100 - Σ(percent_load(P0)..percent_load(Px-1)) |
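The quoted pseudo-algorithm generalizes to any number of priority levels; here is a loose runnable interpretation (an illustrative sketch assuming the min-capped total_health, not the actual Envoy implementation):

```python
def distribute_load(levels):
    """levels: list of (healthy_backends, total_backends) tuples, P=0 first.
    Returns the percentage of traffic sent to each priority level."""
    # health per level: 140 == 100 * the 1.4 over-provisioning factor
    healths = [min(100, 140 * h // t) if t else 0 for h, t in levels]
    total_health = min(100, sum(healths))
    if total_health == 0:
        return [0] * len(levels)   # no healthy endpoints anywhere
    loads, assigned = [], 0
    for i, health in enumerate(healths):
        if i == len(healths) - 1:
            load = 100 - assigned  # last level absorbs any residual traffic
        else:
            load = min(100 - assigned, health * 100 // total_health)
        loads.append(load)
        assigned += load
    return loads

print(distribute_load([(50, 100), (100, 100)]))  # [70, 30]
print(distribute_load([(100, 100), (50, 100)]))  # [100, 0]
```

The second call shows the common case: a fully healthy P=0 keeps all traffic and lower priorities see none.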
@htuch is this what you were looking for?
Suggestions for improvement welcome.
Yes, this is great! Please verify that when the RST turns to HTML, this is all readable.
| To sum this up in pseudo algorithms: |
| percent load to P=0 == min(100, health(P=0) * 100 / total_health) |
| health(P=X) == (1.4 * healthy_PX_backends / total_PX_backends) |
| total_health == min(100, Σ(health(P=0)...health(P=X))) |
Nit: If using fancy Unicode symbols, one minor improvement might be to use P_0 and P_1 subscripts instead of P=0, P=1. Totally an optional readability improvement.
| +-------------+-------------+------------+----------------+----------------+----------------+ |
| To sum this up in pseudo algorithms: |
| percent load to P=0 == min(100, health(P=0) * 100 / total_health) |
Nit: prefer = to == here, similar to normal expression or function definitions in math.
| To sum this up in pseudo algorithms: |
| percent load to P=0 == min(100, health(P=0) * 100 / total_health) |
| health(P=X) == (1.4 * healthy_PX_backends / total_PX_backends) |
This is a scaled ratio here, but in total_health below the min expression compares to a percentage (100) rather than 1.0. Is this a bug?
Ah, your math is right, I was computing total_health as 140, ignoring the
| P=2 would only receive traffic if the combined health of P=0 + P=1 was less than 100. |
| +-----------------------+-----------------------+-----------------------+----------------+----------------+----------------+ |
| | P=0 healthy endpoints | P=1 healthy endpoints | P=2 healthy endpoints | Traffic to P=0 | Traffic to P=1 | Traffic to P=3 | |
Should last column be Traffic to P==2?
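The three-priority case quoted above follows the same residual rule: P=2 only sees traffic once the combined (capped) health of P=0 and P=1 dips below 100. A hypothetical check, not from the docs:

```python
def level_health(healthy_pct):
    # 140 == 100 * the 1.4 over-provisioning factor, capped at 100
    return min(100, 140 * healthy_pct // 100)

def three_priority_split(p0, p1, p2):
    """Traffic split across three priority levels, following the same
    formulas discussed earlier in this PR (illustrative sketch)."""
    h = [level_health(p0), level_health(p1), level_health(p2)]
    total = min(100, sum(h))
    if total == 0:
        return (0, 0, 0)
    load0 = min(100, h[0] * 100 // total)
    load1 = min(100 - load0, h[1] * 100 // total)
    return load0, load1, 100 - load0 - load1

# combined health of P=0 and P=1 is 70 + 28 = 98 < 100, so P=2 gets the 2% residue
print(three_priority_split(50, 20, 100))   # (70, 28, 2)
# a fully healthy P=0 keeps everything
print(three_priority_split(80, 100, 100))  # (100, 0, 0)
```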
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
| :: |
| load to P_0 = min(100, health(P_0) * 100 / total_health) |
| health(P_X) = min(100, (140 * healthy_PX_backends / total_PX_backends)) |
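As a footnote on the earlier 70/30-versus-33/67 discussion: the min(100, ...) caps are exactly what separate the two answers. A small sketch contrasting the capped and uncapped computations (hypothetical helper, not Envoy code):

```python
def split(p0_healthy_pct, capped=True):
    """Two-level split with a fully healthy P=1. Pass capped=False to drop
    the min(100, ...) caps and see why they matter."""
    cap = (lambda x: min(100, x)) if capped else (lambda x: x)
    health_p0 = cap(140 * p0_healthy_pct // 100)
    health_p1 = cap(140)                      # P=1 is 100% healthy
    total_health = cap(health_p0 + health_p1)
    load_p0 = min(100, health_p0 * 100 // total_health)
    return load_p0, 100 - load_p0

print(split(50))                # (70, 30) - the table's numbers
print(split(50, capped=False))  # (33, 67) - 70 * 100 // 210, caps dropped
```

With the caps dropped, total_health becomes 70 + 140 = 210 and P_0's share falls to 70 * 100 / 210 ≈ 33, reproducing the figure from the earlier thread.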
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Changes Envoy load balancing across priority levels from a hard failover to trickling data based on the health percentage of each priority level.
Risk Level: Medium
Testing: Added thorough unit testing for the lb failover code, as well as fixing up ring hash failover testing and adding ring-hash specific tests.
Docs Changes: envoyproxy/data-plane-api#359
Release Notes: n/a: falls under existing note
Fixes #1929