Skip to content

feat: support original dst based routing#6271

Open
Xunzhuo wants to merge 2 commits intoenvoyproxy:mainfrom
Xunzhuo:feat-static-resolver
Open

feat: support original dst based routing#6271
Xunzhuo wants to merge 2 commits intoenvoyproxy:mainfrom
Xunzhuo:feat-static-resolver

Conversation

@Xunzhuo
Copy link
Copy Markdown
Member

@Xunzhuo Xunzhuo commented Jun 8, 2025

What type of PR is this?

feat: support host override based routing

What this PR does / why we need it:

support choose the ep by specific patterns: header, metadata.

Scenarios like llm inference endpoint picker

Which issue(s) this PR fixes:

Fixes #6234

Release Notes: Yes

@Xunzhuo Xunzhuo requested a review from a team as a code owner June 8, 2025 06:28
@codecov
Copy link
Copy Markdown

codecov bot commented Jun 8, 2025

Codecov Report

Attention: Patch coverage is 89.61039% with 16 lines in your changes missing coverage. Please review.

Project coverage is 70.96%. Comparing base (1eb1ca9) to head (fea0cce).

Files with missing lines Patch % Lines
internal/xds/translator/cluster.go 87.32% 6 Missing and 3 partials ⚠️
internal/gatewayapi/backend.go 84.00% 2 Missing and 2 partials ⚠️
internal/gatewayapi/route.go 94.64% 2 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6271      +/-   ##
==========================================
+ Coverage   70.88%   70.96%   +0.07%     
==========================================
  Files         220      220              
  Lines       37259    37400     +141     
==========================================
+ Hits        26412    26541     +129     
- Misses       9299     9308       +9     
- Partials     1548     1551       +3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@Xunzhuo Xunzhuo force-pushed the feat-static-resolver branch from 7198490 to 7943308 Compare June 8, 2025 06:43
@arkodg
Copy link
Copy Markdown
Contributor

arkodg commented Jun 19, 2025

we discussed this in the community meeting yesterday and the preference is, instead of adding a new Type, this can be represented as a IPEndpoint

type IPEndpoint struct {
field called Ref or something similar whose type would be something like OverrideHostSource

@Xunzhuo Xunzhuo force-pushed the feat-static-resolver branch 3 times, most recently from a274521 to bfff78b Compare June 25, 2025 08:10
@Xunzhuo Xunzhuo changed the title feat: support static resolver eat: support host override based routing Jun 25, 2025
@Xunzhuo Xunzhuo requested review from arkodg and rudrakhp June 25, 2025 08:13
@Xunzhuo
Copy link
Copy Markdown
Member Author

Xunzhuo commented Jun 25, 2025

Here is how it looks like now:

 apiVersion: gateway.envoyproxy.io/v1alpha1
  kind: Backend
  metadata:
    name: backend-routing-based-on-header
    namespace: default
  spec:
    type: HostOverride
    hostOverrideSettings:
      overrideHostSources:
      - header: target-pod

@Xunzhuo Xunzhuo changed the title eat: support host override based routing feat: support host override based routing Jun 25, 2025
@Xunzhuo Xunzhuo force-pushed the feat-static-resolver branch 2 times, most recently from bbd848f to 146e176 Compare June 27, 2025 17:33
Copy link
Copy Markdown
Member

@rudrakhp rudrakhp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments rest LGTM!

@Xunzhuo Xunzhuo requested a review from rudrakhp June 28, 2025 08:33
@Xunzhuo Xunzhuo force-pushed the feat-static-resolver branch 2 times, most recently from 0104551 to 9448b0d Compare June 28, 2025 08:56
@rudrakhp
Copy link
Copy Markdown
Member

@Xunzhuo can you fix the lint check (failing due to extra line)?

@Xunzhuo Xunzhuo force-pushed the feat-static-resolver branch from 9448b0d to fbdad46 Compare June 28, 2025 09:08
@Xunzhuo
Copy link
Copy Markdown
Member Author

Xunzhuo commented Jun 28, 2025

@rudrakhp working on it, thanks!

@Xunzhuo
Copy link
Copy Markdown
Member Author

Xunzhuo commented Jun 28, 2025

PTAL @rudrakhp thanks!

@Xunzhuo
Copy link
Copy Markdown
Member Author

Xunzhuo commented Jun 28, 2025

/retest

rudrakhp
rudrakhp previously approved these changes Jun 28, 2025
Copy link
Copy Markdown
Member

@rudrakhp rudrakhp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@rudrakhp rudrakhp requested review from a team June 28, 2025 12:37
Copy link
Copy Markdown
Member

@zhaohuabing zhaohuabing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add an e2e test to verify this works?

@Xunzhuo Xunzhuo force-pushed the feat-static-resolver branch 3 times, most recently from 514fe58 to f522dcc Compare June 30, 2025 06:26
@Xunzhuo
Copy link
Copy Markdown
Member Author

Xunzhuo commented Jun 30, 2025

I am verifying the e2e test failures today, and had a discussion with @wbpcode, this current API looks not complete:

The HostOverride lbPolicy requires the fallback upstream, so the previous approach is not correct, should be done by adding at least one endpoint in Backend when type is HostOverride

apiVersion: gateway.envoyproxy.io/v1alpha1
kind: Backend
metadata:
  name: backend-routing-based-on-header
  namespace: default
spec:
  type: HostOverride
  hostOverrideSettings:
    overrideHostSources:
    - header: target-pod
  endpoints:
    - fqdn:
        hostname: fallback.default.svc.cluster.local
        port: 80

This can be achieved another feature like, we do best-match for the pod in cluster, but if no best-match pod selected, use the endpoint in Backend to fallback, even it can be out-of-cluster.

@Xunzhuo Xunzhuo force-pushed the feat-static-resolver branch 2 times, most recently from e61cb4a to fea0cce Compare June 30, 2025 16:45
@arkodg
Copy link
Copy Markdown
Contributor

arkodg commented Jul 1, 2025

@Xunzhuo if overrideHost LB Policy + Original Dst cluster dont work together in Envoy proxy, we'll have to use Option 1 #6234 (comment)

Xunzhuo added 2 commits July 3, 2025 10:54
Signed-off-by: bitliu <bitliu@tencent.com>
Signed-off-by: bitliu <bitliu@tencent.com>
@Xunzhuo Xunzhuo force-pushed the feat-static-resolver branch from fea0cce to a0d70fa Compare July 3, 2025 02:55
@Xunzhuo
Copy link
Copy Markdown
Member Author

Xunzhuo commented Jul 3, 2025

how envoy impl the host override(source/extensions/load_balancing_policies/override_host/load_balancer.cc):

OverrideHostLoadBalancer::LoadBalancerImpl::chooseHost(LoadBalancerContext* context) {
  if (!context || !context->requestStreamInfo()) {
    // If there is no context or no request stream info, we can't use the
    // metadata, so we just return a host from the fallback picker.
    return fallback_picker_lb_->chooseHost(context);
  }

  OverrideHostFilterState* override_host_state = nullptr;
  if (override_host_state =
          context->requestStreamInfo()->filterState()->getDataMutable<OverrideHostFilterState>(
              OverrideHostFilterState::kFilterStateKey);
      override_host_state == nullptr) {
    auto state_ptr = std::make_shared<OverrideHostFilterState>(getSelectedHosts(context));
    override_host_state = state_ptr.get();

    context->requestStreamInfo()->filterState()->setData(
        OverrideHostFilterState::kFilterStateKey, std::move(state_ptr),
        StreamInfo::FilterState::StateType::Mutable);
  }

  if (override_host_state->empty()) {
    ENVOY_LOG(trace, "No overriden hosts were found. Using fallback LB policy.");
    return fallback_picker_lb_->chooseHost(context);
  }

  if (HostConstSharedPtr host = getEndpoint(*override_host_state); host != nullptr) {
    return {host};
  }

  // If some endpoints were found, but none of them are available in
  // the cluster endpoint set, or the number of retries in the retry policy
  // exceeds the number of fallback endpoints, then use to the fallback LB
  // policy.
  ENVOY_LOG(trace, "Failed to find any endpoints from metadata in the cluster. "
                   "Using fallback LB policy.");
  return fallback_picker_lb_->chooseHost(context);
}

forward to an endpoint which is not in the cluster(I manually add 3 endpoint to the cluster 10.96.93.71:8080, 10.96.166.130:8080, 10.96.82.175:8080):

[2025-07-03 03:23:36.542][35][debug][http] [source/common/http/conn_manager_impl.cc:1168] [Tags: "ConnectionId":"334","StreamId":"14294559941569293744"] request end stream timestamp recorded
[2025-07-03 03:23:36.542][35][debug][connection] [./source/common/network/connection_impl.h:99] [Tags: "ConnectionId":"334"] current connecting state: false
[2025-07-03 03:23:36.542][35][debug][router] [source/common/router/router.cc:542] [Tags: "ConnectionId":"334","StreamId":"14294559941569293744"] cluster 'httproute/gateway-conformance-infra/httproute-host-override-header/rule/0' match for URL '/header-override'
[2025-07-03 03:23:36.542][35][trace][upstream] [source/extensions/load_balancing_policies/override_host/load_balancer.cc:245] Selected endpoints: 10.244.0.22:3000
[2025-07-03 03:23:36.542][35][trace][upstream] [source/extensions/load_balancing_policies/override_host/load_balancer.cc:260] Looking up 10.244.0.22:3000 in 10.96.93.71:8080, 10.96.166.130:8080, 10.96.82.175:8080
[2025-07-03 03:23:36.542][35][trace][upstream] [source/extensions/load_balancing_policies/override_host/load_balancer.cc:281] Number of attempts has exceeded the number of override hosts.
[2025-07-03 03:23:36.542][35][trace][upstream] [source/extensions/load_balancing_policies/override_host/load_balancer.cc:182] Failed to find any endpoints from metadata in the cluster. Using fallback LB policy.
[2025-07-03 03:23:36.542][35][debug][router] [source/common/router/router.cc:802] [Tags: "ConnectionId":"334","StreamId":"14294559941569293744"] router decoding headers:
':authority', '172.18.0.203'
':path', '/header-override'
':method', 'GET'
':scheme', 'http'
'user-agent', 'curl/8.7.1'
'accept', '*/*'
'target-pod', '10.244.0.22:3000'
'x-forwarded-for', '172.18.0.1'
'x-forwarded-proto', 'http'
'x-envoy-external-address', '172.18.0.1'
'x-request-id', '1823981f-9ccf-4c1b-aa18-fc3f45f6fd2a'

forward to an endpoint which in the cluster:

[2025-07-03 03:21:30.301][22][debug][router] [source/common/router/router.cc:542] [Tags: "ConnectionId":"291","StreamId":"3232757434578693721"] cluster 'httproute/gateway-conformance-infra/httproute-host-override-header/rule/0' match for URL '/header-override'
[2025-07-03 03:21:30.301][22][trace][upstream] [source/extensions/load_balancing_policies/override_host/load_balancer.cc:245] Selected endpoints: 10.96.82.175:8080
[2025-07-03 03:21:30.301][22][trace][upstream] [source/extensions/load_balancing_policies/override_host/load_balancer.cc:260] Looking up 10.96.82.175:8080 in 10.96.93.71:8080, 10.96.166.130:8080, 10.96.82.175:8080
[2025-07-03 03:21:30.301][22][trace][upstream] [source/extensions/load_balancing_policies/override_host/load_balancer.cc:274] Selected endpoint: 10.96.82.175:8080
[2025-07-03 03:21:30.301][22][debug][router] [source/common/router/router.cc:802] [Tags: "ConnectionId":"291","StreamId":"3232757434578693721"] router decoding headers:
':authority', '172.18.0.203'
':path', '/header-override'
':method', 'GET'
':scheme', 'http'
'user-agent', 'curl/8.7.1'
'accept', '*/*'
'target-pod', '10.96.82.175:8080'
'x-forwarded-for', '172.18.0.1'
'x-forwarded-proto', 'http'
'x-envoy-external-address', '172.18.0.1'
'x-request-id', '2083f99f-e6c8-425f-9808-afb4d8ca0b94

In summary, if we want to pick an endpoint with the host override policy, it has to be in-range of the endpoints belong to the cluster, so we cannot use the dummy endpoint to hack, it will always go to the fallback policy.

@Xunzhuo
Copy link
Copy Markdown
Member Author

Xunzhuo commented Jul 3, 2025

The Backend API may only fit for the original dst implementation:

Header:

 apiVersion: gateway.envoyproxy.io/v1alpha1
  kind: Backend
  metadata:
    name: backend-routing-based-on-header
    namespace: default
  spec:
    type: OriginalDestination
      originalDestinationSetting:
         header: target-pod

Metadata:

 apiVersion: gateway.envoyproxy.io/v1alpha1
  kind: Backend
  metadata:
    name: backend-routing-based-on-header
    namespace: default
  spec:
    type: OriginalDestination
      originalDestinationSetting:
        metadata:
          key: "envoy.lb"
          path:
          - key: "override-host"

@arkodg
Copy link
Copy Markdown
Contributor

arkodg commented Jul 3, 2025

originalDestination is not a great name imo 🤷

@Xunzhuo
Copy link
Copy Markdown
Member Author

Xunzhuo commented Jul 3, 2025

@arkodg we need some naming genius to help us 😄

@Xunzhuo Xunzhuo changed the title feat: support host override based routing feat: support original dst based routing via Backend Jul 3, 2025
@Xunzhuo Xunzhuo changed the title feat: support original dst based routing via Backend feat: support original dst based routing Jul 3, 2025
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Aug 2, 2025

This pull request has been automatically marked as stale because it has not had activity in the last 30 days. Please feel free to give a status update now, ping for review, when it's ready. Thank you for your contributions!

@github-actions github-actions bot added the stale label Aug 2, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

support endpoint picker based on header/dynamic metadata

4 participants