Updating LB docs to the new LB logic #359

Merged
htuch merged 9 commits into envoyproxy:master from alyssawilk:real_lb
Jan 3, 2018

Conversation

@alyssawilk
Contributor

No description provided.

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
@alyssawilk
Contributor Author

This mostly goes with the upcoming PR depending on envoyproxy/envoy#2244 landing

Will close for now - just wanted the docs PR for 2244

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
@alyssawilk
Contributor Author

huh. local check_format complains about spaces in my change.
local fix_format changes 5 files I didn't touch and doesn't update the lb doc.

I'll poke around.

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
@mattklein123
Member

@alyssawilk I noticed the same thing and fixed here: #381. I'm confused how the stuff got through in the first place.

@alyssawilk
Contributor Author

alyssawilk commented Jan 2, 2018 via email

Member

@mattklein123 left a comment

nice, this is super cool. Some small nits.

another are fairly simplistic: a given priority level will be used until it has zero healthy hosts,
at which point it will hard fail to the next highest priority level.
level. For each EDS :ref:`LocalityLbEndpoints<envoy_api_msg_LocalityLbEndpoints>` an optional
priority may also be specifie. When endpoints at the highest priority level (P=0) are healthy, all
Member

typo "specifie"

at which point it will hard fail to the next highest priority level.
level. For each EDS :ref:`LocalityLbEndpoints<envoy_api_msg_LocalityLbEndpoints>` an optional
priority may also be specifie. When endpoints at the highest priority level (P=0) are healthy, all
traffic will land on endpoints in that priority leve. As endpoints for the highest priority level
Member

typo: "leve" - do you mind running through spell checker?

Contributor Author

Hahaha. I did spell check, but then I ran a buggy version of check_format which replaced "[^.]. " with ". " and so removed the last letter of a bunch of words. Shame on me for not checking diffs before and after.
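
For readers wondering how a substitution like this clips words, here is a minimal Python sketch. It assumes the period in the pattern was escaped (`[^.]\. `); the exact expression and tool used by the buggy check_format are not shown in this thread, and `buggy_fix` and `safer_fix` are hypothetical names used only for illustration.

```python
import re


def buggy_fix(text: str) -> str:
    # Matches one non-period character, a literal period, and a space,
    # then replaces all three with ". ". The character before the period is lost.
    return re.sub(r"[^.]\. ", ". ", text)


def safer_fix(text: str) -> str:
    # Capturing the leading character and writing it back keeps the word intact.
    return re.sub(r"([^.])\. ", r"\1. ", text)


sample = "priority may also be specified. When endpoints"
print(buggy_fix(sample))  # priority may also be specifie. When endpoints
print(safer_fix(sample))  # priority may also be specified. When endpoints
```

The clipped output matches the "specifie" and "leve" typos flagged above, which is consistent with this kind of match consuming one character too many.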


Currently, it is assumed that each priority level is over-provisioned by a (hard-coded) factor of
1.4. So if 80% of the endpoints are healthy, the priority level is still considered healthy because
80*1.4 > 10. As the number of healthy endpoints dips below 72%, the health of the priority level
Member

s/10/100 ?

Member

@htuch left a comment

I think this is great at conveying intuition about what is happening. It would be good to also add the underlying formula for computing how the load is spilled, for those interested.

Currently, it is assumed that each priority level is over-provisioned by a (hard-coded) factor of
1.4. So if 80% of the endpoints are healthy, the priority level is still considered healthy because
80*1.4 > 10. As the number of healthy endpoints dips below 72%, the health of the priority level
goes below 100, and any residual traffic will flow to the next priority level.
Member

What is meant by "residual" here?

Assume a simple set-up with 2 priority levels, P=1 100% healthy.

+----------------------------+---------------------------+----------------------------+
| Percent healthy endpoints | Percent of traffic to P=0 | Percent of traffic to P=1 |
Member

Percent healthy endpoints for P=0?

+----------------------------+---------------------------+----------------------------+
| 71 | 99 | 1 |
+----------------------------+---------------------------+----------------------------+
| 50 | 70 | 30 |
Member

I think it would be helpful to see the expression that is used to compute the residue here.

Member

I used the expressions below to try and compute the 50% health case, I get P_0 traffic at 33% and P_1 traffic at 67%. Maybe I did math wrong, can you double check?

Contributor Author

so percent healthy (healthy_PX_backends / total_PX_backends) is 50%
health(P_X) = (140 * .5) = 70
P_0 = min(70, 100) = 70
load to P_1= 100 - (70) = 30
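
The arithmetic in this reply can be checked with a small Python sketch of the two-level case, where P_1 is assumed fully healthy. The function name `two_level_load` and the use of integer division are assumptions made for illustration; truncation is chosen only because it reproduces the 71 -> 99 row of the table quoted above.

```python
from typing import Tuple

OVERPROVISION_PERCENT = 140  # the hard-coded 1.4 factor, expressed as a percentage


def two_level_load(healthy_p0: int, total_p0: int) -> Tuple[int, int]:
    """Return (percent load to P_0, percent load to P_1), assuming P_1 is 100% healthy."""
    # Integer division is an assumption; it matches the 71 -> 99 row in the table.
    health_p0 = OVERPROVISION_PERCENT * healthy_p0 // total_p0
    load_p0 = min(health_p0, 100)
    # Whatever P_0 cannot absorb spills to P_1.
    return load_p0, 100 - load_p0


for healthy in (100, 72, 71, 50, 0):
    print(healthy, two_level_load(healthy, 100))
```

With 50 of 100 endpoints healthy this gives 70 to P_0 and 30 to P_1, the same split as the worked steps in the reply.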

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
percent load to P=0 == min(100, health(P=0) * 100 / total_health)
health(P=X) == (1.4 * healthy_PX_backends / total_PX_backends)
total_health == min(100, Σ(health(P=0)...health(P=x));
percent load to P=X == 100 - Σ(percent_load(P0)..percent_load(Px-1))
Contributor Author

@htuch is this what you were looking for?

Suggestions for improvement welcome.

Member

Yes, this is great! Please verify that when the RST turns to HTML, this is all readable.

To sum this up in pseudo algorithms:
percent load to P=0 == min(100, health(P=0) * 100 / total_health)
health(P=X) == (1.4 * healthy_PX_backends / total_PX_backends)
total_health == min(100, Σ(health(P=0)...health(P=x));
Member

Nit: If using fancy Unicode symbols, one minor improvement might be to use P_0 and P_1 subscripts instead of P=0, P=1. Totally an optional readability improvement.

+-------------+-------------+------------+----------------+----------------+----------------+

To sum this up in pseudo algorithms:
percent load to P=0 == min(100, health(P=0) * 100 / total_health)
Member

Nit: prefer = to == here, similar to normal expression or function definitions in math.


To sum this up in pseudo algorithms:
percent load to P=0 == min(100, health(P=0) * 100 / total_health)
health(P=X) == (1.4 * healthy_PX_backends / total_PX_backends)
Member

This is a scaled ratio here, but in total_health below the min expression compares to a percentage (100) rather than 1.0. Is this a bug?

@alyssawilk force-pushed the real_lb branch 12 times, most recently from 3984fd4 to 3621602, January 3, 2018 17:06
@htuch
Member

htuch commented Jan 3, 2018

Ah, your math is right. I was computing total_health as 140, ignoring the min(100, ...) bit.
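
The discrepancy discussed in this thread (33/67 versus 70/30 at 50% health) can be reproduced with a short Python sketch that evaluates the quoted expression for P_0 with and without the cap on total_health. The helper `p0_load` and its `cap_total` flag are hypothetical, introduced only to contrast the two readings.

```python
def p0_load(health_p0: float, health_p1: float, cap_total: bool) -> float:
    """Percent load to P_0 per the quoted formula, for a two-level setup."""
    total_health = health_p0 + health_p1
    if cap_total:
        # The min(100, ...) from the total_health expression.
        total_health = min(100.0, total_health)
    # Assumes total_health > 0; the all-unhealthy case is not handled here.
    return min(100.0, health_p0 * 100.0 / total_health)


health_p0 = 140 * 0.5  # P_0 is 50% healthy
health_p1 = 140 * 1.0  # P_1 is 100% healthy

print(round(p0_load(health_p0, health_p1, cap_total=False)))  # 33
print(round(p0_load(health_p0, health_p1, cap_total=True)))   # 70
```

Leaving out the cap yields roughly 33% to P_0, and applying it yields the 70% shown in the table.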

@alyssawilk force-pushed the real_lb branch 2 times, most recently from 65037b2 to 0dee4bf, January 3, 2018 17:08
P=2 would only receive traffic if the combined health of P=0 + P=1 was less than 100.

+-----------------------+-----------------------+-----------------------+----------------+----------------+----------------+
| P=0 healthy endpoints | P=1 healthy endpoints | P=2 healthy endpoints | Traffic to P=0 | Traffic to P=1 | Traffic to P=3 |
Member

Should last column be Traffic to P==2?
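
For the three-level case, the following Python sketch shows one reading of the formulas quoted in this thread, generalized to any number of priority levels. It is an illustration under stated assumptions, not the implementation in envoyproxy/envoy: the name `priority_loads`, the integer truncation, and the handling of the all-unhealthy case are all hypothetical choices.

```python
from typing import List, Sequence, Tuple


def priority_loads(levels: Sequence[Tuple[int, int]]) -> List[int]:
    """levels holds (healthy, total) endpoint counts for P=0, P=1, ...; total must be > 0."""
    # health(P_X) = 140 * healthy / total, with integer truncation (an assumption).
    health = [140 * healthy // total for healthy, total in levels]
    total_health = min(100, sum(health))
    if total_health == 0:
        # No healthy endpoints anywhere; this sketch simply assigns no load.
        return [0] * len(levels)
    loads = []
    remaining = 100
    for h in health:
        # Each level takes its share, limited by what higher priorities left over.
        load = min(remaining, h * 100 // total_health)
        loads.append(load)
        remaining -= load
    # Truncation can leave a point or two unassigned; the sketch does not redistribute it.
    return loads


print(priority_loads([(50, 100), (50, 100), (100, 100)]))  # [70, 30, 0]
print(priority_loads([(25, 100), (25, 100), (100, 100)]))  # [35, 35, 30]
```

In the first call the combined health of P=0 and P=1 reaches 100, so P=2 receives nothing. In the second it is only 70, so the remaining 30 spills to P=2.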

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
::

load to P_0 = min(100, health(P_0) * 100 / total_health)
health(P_X) = 100, (140 * healthy_PX_backends / total_PX_backends)
Member

Check docs.

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
@htuch merged commit a1f2fe9 into envoyproxy:master on Jan 3, 2018
alyssawilk added a commit to envoyproxy/envoy that referenced this pull request Jan 4, 2018
Changes Envoy load balancing across priority levels from a hard failover to trickling data based on the health percentage of each priority level.

Risk Level: Medium

Testing:
Added thorough unit testing for the lb failover code, as well as fixing up ring hash failover testing and adding ring-hash specific tests.

Docs Changes:
envoyproxy/data-plane-api#359

Release Notes: n/a: falls under existing note

[Optional Fixes #Issue]
Fixes #1929