Make zone load balancing logic depend on local_cluster host distribution.#174
Make zone load balancing logic depend on local_cluster host distribution.#174RomanDzhabarov merged 13 commits intomasterfrom
Conversation
|
|
||
| // If number of hosts in a local zone big enough then route all requests to the same zone. | ||
| if (local_zone_healthy_hosts.size() * number_of_zones >= host_set_.healthyHosts().size()) { | ||
| std::vector<uint64_t> local_percentage = |
There was a problem hiding this comment.
Use a stack allocated array with the right size, rework the functions above to take the array and populate it if you want to do that. You can't allocate things like vectors in this code path.
There was a problem hiding this comment.
Or just do the pre-calculation work now. I would actually prefer you just do that.
There was a problem hiding this comment.
i made stack alloc for now, i want make sure things works good in prod
| // At this point we should route cross zone as we cannot route to the local zone. | ||
| stats_.zone_routing_no_sampled_.inc(); | ||
|
|
||
| std::vector<uint64_t> capacity_left; |
| stats_.zone_routing_no_sampled_.inc(); | ||
|
|
||
| std::vector<uint64_t> capacity_left; | ||
| // Local zone does not have additional capacity (we already routed what we could), but |
There was a problem hiding this comment.
More comments. Don't understand what is happening here.
…ts." This reverts commit ab35059. Conflicts: test/common/upstream/load_balancer_simulation_test.cc
| if (total_hosts != 0) { | ||
| size_t i = 0; | ||
| for (const auto& zone_hosts : hosts_per_zone) { | ||
| ret[i++] = 10000ULL * zone_hosts.size() / total_hosts; |
There was a problem hiding this comment.
Please add comment explaining the 10,000ULL.
There was a problem hiding this comment.
yep, good point, will add more details into load_balancer_impl.h. Done.
| // Now we need to figure out how much traffic we can route cross zone and to which exact zone | ||
| // we should route. Percentage of requests routed cross zone to a specific zone should be | ||
| // proportional to the residual capacity upstream zone has. | ||
| // |
include/envoy/upstream/upstream.h
Outdated
| COUNTER(update_failure) \ | ||
| COUNTER(zone_cluster_too_small) \ | ||
| COUNTER(zone_over_percentage) \ | ||
| COUNTER(zone_routing_all_directly) \ |
There was a problem hiding this comment.
nit: can you prefix all of the lb ones with "lb_" and then reorder this list.
There was a problem hiding this comment.
yep, will do for consistency with panic threshold stats
| stats_.zone_over_percentage_.inc(); | ||
| return local_zone_healthy_hosts; | ||
| uint64_t local_percentage[number_of_zones]; | ||
| calculateZonePercentage(local_host_set_->healthyHostsPerZone(), local_percentage); |
There was a problem hiding this comment.
ASSERT here also that local_host_set_->healthyHostsPerZone().size() == host_set_.healthyHostsPerZone().size()
| * In this case we'll route requests to hosts no matter if they are healthy or not. | ||
| */ | ||
| bool isGlobalPanic(const HostSet& host_set); | ||
| const std::vector<HostPtr>& tryChooseLocalZoneHosts(); |
There was a problem hiding this comment.
nit: add comment or newline before this line
| // Local zone (index 0) does not have residual capacity as we have routed all we could. | ||
| residual_capacity[0] = 0; | ||
| for (size_t i = 1; i < number_of_zones; ++i) { | ||
| // Only route to the zones that have additional capacity. |
There was a problem hiding this comment.
still having trouble quickly understanding this logic. More comments, and potentially an example.
There was a problem hiding this comment.
more comments added
| // Not zone routed requests will be distributed between all hosts and hence | ||
| // we need to route only fraction of req_percent_to_route to the local zone. | ||
| double actual_routing_ratio = (ratio_to_route - zone_host_ratio) / (1 - zone_host_ratio); | ||
| // Random sampling to select specific zone for cross zone traffic based on the additional |
There was a problem hiding this comment.
more comments. What type of number does residual_capacity[number_of_zones - 1] have in it. What range?
There was a problem hiding this comment.
more comments added
| // sampled value is, we accumulate values in residual capacity. This is what it will look like: | ||
| // residual_capacity: 0 10000 15000 | ||
| // Now to find zone to route (bucket) we could simply iterate over residual_capacity searching | ||
| // where |
There was a problem hiding this comment.
nit: can you fix line wrapping in this comment block. Keep to 100 col flow unless you need a new paragraph.
There was a problem hiding this comment.
:( fix format broke this... fixing
* Add Dockerfile for the C++ SDK. Signed-off-by: John Plevyak <jplevyak@gmail.com>
* Add Dockerfile for the C++ SDK. Signed-off-by: John Plevyak <jplevyak@gmail.com>
…#114) (envoyproxy#174) Previously, the update callback was called only when the secret was received for the first time or when its value changed. This meant that if the same secret (e.g. trusted CA) was used in multiple resources, then resources using it but configured after the secret was already received, remained unconfigured until the secret's value changed. The missing callback should have resulted in transport factories stuck in the "not ready" state, however, because of an incorrect code, the available secret was processed like inlined validation context, and only rules from the "secret" part of the validation context were applied, leading to a complete bypass of rules from the "default" part. Signed-off-by: Piotr Sikora <piotrsikora@google.com> Co-authored-by: Oliver Liu <yonggangl@google.com> Co-authored-by: Oliver Liu <yonggangl@google.com>
…verprovisioning_factor zh-translation: /intro/arch_overview/upstream/load_balancing/overprovisioning.rst
**Commit Message**: This pull request includes updates to dependencies and configuration files for better security and compatibility. Envoy Gateway dependency is kept to the latest, whose tag is a weird v0.5.0-rc2, and because of that Trivy and Dependabot tries to "downgrade" to the latest tagged version. This fixes that by adding EG to the ignore list of dependency as well as add the latest EG CVE to the trivyignore file. Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
**Commit Message**: This fixes the placement of `ignore` section in dependabot.yml. **Related Issues/PRs (if applicable)**: Follow up on #174 Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
Documentation around runtime control/stats/general overview are going to be on a separate commit.
I was running simulation tests from load_balancer_simulation_test.cc, results look very good.