Skip to content

feat: zone aware routing#5299

Merged
zirain merged 15 commits intoenvoyproxy:mainfrom
jukie:zone-aware-routing
Mar 5, 2025
Merged

feat: zone aware routing#5299
zirain merged 15 commits intoenvoyproxy:mainfrom
jukie:zone-aware-routing

Conversation

@jukie
Copy link
Copy Markdown
Contributor

@jukie jukie commented Feb 18, 2025

What type of PR is this?

feat: enable Zone Aware Routing

What this PR does / why we need it:

This adds the ability to use Envoy's Zone Aware Routing via Kubernetes Topology Aware Routing or Traffic Distribution.

Implementation of zone discovery for envoy proxy instances will be handled separately in #5352

Which issue(s) this PR fixes:

xRef: #1909

Release Notes: Yes

@jukie jukie force-pushed the zone-aware-routing branch from 8abae0e to e23145a Compare February 20, 2025 00:36
@codecov
Copy link
Copy Markdown

codecov bot commented Feb 20, 2025

Codecov Report

Attention: Patch coverage is 89.65517% with 12 lines in your changes missing coverage. Please review.

Project coverage is 65.22%. Comparing base (20371d1) to head (856ad34).
Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
internal/gatewayapi/route.go 65.00% 5 Missing and 2 partials ⚠️
internal/xds/translator/cluster.go 95.89% 2 Missing and 1 partial ⚠️
internal/ir/xds.go 0.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5299      +/-   ##
==========================================
+ Coverage   65.11%   65.22%   +0.10%     
==========================================
  Files         212      212              
  Lines       33605    33675      +70     
==========================================
+ Hits        21883    21965      +82     
+ Misses      10399    10390       -9     
+ Partials     1323     1320       -3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@jukie jukie marked this pull request as ready for review February 20, 2025 02:51
@jukie jukie requested a review from a team as a code owner February 20, 2025 02:51
@jukie
Copy link
Copy Markdown
Contributor Author

jukie commented Feb 20, 2025

@arkodg mind taking a first-pass look? If possible I'd prefer to limit the scope here to enabling zone aware routing with future plans to implement something similar to Istio's locality failover or weighted distribution.

Enabling both would require some more API design and I imagine changes to BackendTrafficPolicy but my current PR will enable zone aware routing usage without breaking or changing existing routing behavior (please correct me there).

Edit: scratch that implementation wasn't too bad

@jukie jukie marked this pull request as draft February 20, 2025 06:59
@jukie jukie force-pushed the zone-aware-routing branch 2 times, most recently from 4afb192 to 77fead8 Compare February 23, 2025 04:34
@arkodg
Copy link
Copy Markdown
Contributor

arkodg commented Feb 25, 2025

hey hoping we can get #5269 in first to avoid large merge conflicts

@jukie jukie force-pushed the zone-aware-routing branch from 77fead8 to 9c2c5b9 Compare February 26, 2025 01:44
@jukie
Copy link
Copy Markdown
Contributor Author

jukie commented Feb 26, 2025

Will split the initContainer implementation to a separate PR

@jukie jukie force-pushed the zone-aware-routing branch from 9c2c5b9 to d9c5542 Compare February 26, 2025 02:02
@jukie jukie force-pushed the zone-aware-routing branch from d9c5542 to 45bb5a6 Compare February 26, 2025 05:43
@jukie jukie marked this pull request as ready for review February 26, 2025 13:55
@jukie
Copy link
Copy Markdown
Contributor Author

jukie commented Feb 26, 2025

/retest

@jukie jukie force-pushed the zone-aware-routing branch 3 times, most recently from 3fc7d56 to 1062a5d Compare February 26, 2025 17:42
@jukie
Copy link
Copy Markdown
Contributor Author

jukie commented Feb 26, 2025

/retest

1 similar comment
@jukie
Copy link
Copy Markdown
Contributor Author

jukie commented Feb 26, 2025

/retest

@jukie jukie force-pushed the zone-aware-routing branch 2 times, most recently from b56f66b to c0e7223 Compare February 26, 2025 23:53
@jukie
Copy link
Copy Markdown
Contributor Author

jukie commented Feb 27, 2025

I'm finding some issues with this breaking weighted backends. @guydc or @arkodg could you have a look please?

From https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/endpoint/v3/endpoint_components.proto#config-endpoint-v3-localitylbendpoints:

load_balancing_weight
([UInt32Value](https://developers.google.com/protocol-buffers/docs/reference/google.protobuf#uint32value)) Optional: Per priority/region/zone/sub_zone weight; at least 1. The load balancing weight for a locality is divided by the sum of the weights of all localities at the same priority level to produce the effective percentage of traffic for the locality. The sum of the weights of all localities at the same priority level must not exceed uint32_t maximal value (4294967295).

Locality weights are only considered when [locality weighted load balancing](https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/upstream/load_balancing/locality_weight#arch-overview-load-balancing-locality-weighted-lb) is configured. These weights are ignored otherwise. If no weights are specified when locality weighted load balancing is enabled, the locality is assigned no load.

Which means the current logic of setting locality.LoadBalancingWeight to ds.Weight will be ignored. I then tried also setting each individual endpoint's load balancing weight to ds.Weight which did bring some weighted backend routing back but the conformance tests still fail and it's not clear to me why yet. The backends are receiving close to the desired weights so something it's partially working but something is off.

I also found an open Envoy issue which could be relevant regarding whether zone aware routing respects weights. I'm beginning to read through the loadbalancer code to build a better understanding but would either of you happen to know why zone aware lb config degrades routing for weighted backendRefs on xRoutes?

2025-02-27T04:27:54.5165617Z     httproute-weight.go:72: Traffic distribution test failed (5/10): backend "infra-backend-v1" weighted traffic of 0.64 not within tolerance 0.7 (+/-0.050000)
2025-02-27T04:27:54.5166035Z         backend "infra-backend-v2" weighted traffic of 0.36 not within tolerance 0.3 (+/-0.050000)

@jukie
Copy link
Copy Markdown
Contributor Author

jukie commented Feb 27, 2025

/retest

@jukie jukie force-pushed the zone-aware-routing branch 2 times, most recently from 80b007c to 2fd35ea Compare February 27, 2025 04:05
@jukie
Copy link
Copy Markdown
Contributor Author

jukie commented Feb 27, 2025

/retest

@arkodg
Copy link
Copy Markdown
Contributor

arkodg commented Mar 5, 2025

@jukie as a workaround can you

  1. rm
  2. touch
  3. make testdata
  4. commit those changes and push

Signed-off-by: jukie <10012479+Jukie@users.noreply.github.com>
jukie and others added 2 commits March 4, 2025 20:14
Signed-off-by: jukie <10012479+Jukie@users.noreply.github.com>
@jukie jukie requested a review from arkodg March 5, 2025 03:47
arkodg
arkodg previously approved these changes Mar 5, 2025
Signed-off-by: jukie <10012479+Jukie@users.noreply.github.com>
Signed-off-by: jukie <10012479+Jukie@users.noreply.github.com>
@jukie jukie requested review from arkodg and zhaohuabing March 5, 2025 04:38
@jukie
Copy link
Copy Markdown
Contributor Author

jukie commented Mar 5, 2025

/retest

Signed-off-by: jukie <10012479+Jukie@users.noreply.github.com>
@jukie
Copy link
Copy Markdown
Contributor Author

jukie commented Mar 5, 2025

/retest

@jukie
Copy link
Copy Markdown
Contributor Author

jukie commented Mar 5, 2025

/retest

1 similar comment
@jukie
Copy link
Copy Markdown
Contributor Author

jukie commented Mar 5, 2025

/retest

Copy link
Copy Markdown
Member

@zhaohuabing zhaohuabing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the patience!

@zirain zirain merged commit 64b753b into envoyproxy:main Mar 5, 2025
27 checks passed
@zirain
Copy link
Copy Markdown
Member

zirain commented Mar 5, 2025

@jukie is there a flaky? https://github.com/envoyproxy/gateway/actions/runs/13677062142/job/38239825535

need sort to make a stable result?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants