Skip to content

Fix waypoint address missing in ebpf/endpoints when LocalityInfo is nil#1534

Merged
kmesh-bot merged 5 commits intomainfrom
copilot/fix-waypoint-address-bug
Nov 28, 2025
Merged

Fix waypoint address missing in ebpf/endpoints when LocalityInfo is nil#1534
kmesh-bot merged 5 commits intomainfrom
copilot/fix-waypoint-address-bug

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented Nov 27, 2025

What type of PR is this?

/kind bug

What this PR does / why we need it:

Waypoint endpoints were missing from eBPF maps when LocalityInfo was nil, causing "Connection reset by peer" errors.

Root cause: In handleWorkloadNewBoundServices, locality mode (FAILOVER, the waypoint default) required LocalityInfo != nil to add workloads to endpoints. Since LocalityInfo is only set when a local node workload is processed, waypoint workloads arriving first were silently dropped.

Fix:

  • Remove LocalityInfo != nil gate from the endpoint addition logic
  • When LocalityInfo is nil, assign highest priority (0)
  • Endpoint gets added immediately; priority recalculates when locality initializes
// Before: workload silently dropped if LocalityInfo == nil
if p.locality.LocalityInfo != nil && service != nil {
    prio := p.locality.CalcLocalityLBPrio(...)
}

// After: workload added with highest priority
if service != nil {
    var prio uint32
    if p.locality.LocalityInfo != nil {
        prio = p.locality.CalcLocalityLBPrio(...)
    } else {
        prio = 0
    }
}

Which issue(s) this PR fixes:
Fixes #1533

Special notes for your reviewer:

Test added: TestLocalityLBWithNilLocalityInfo validates workloads from other nodes are added to endpoints even when locality is uninitialized.

Does this PR introduce a user-facing change?:

Fixed waypoint connection failures caused by missing endpoints in eBPF maps when locality information was not yet initialized
Original prompt

This section details on the original issue you should resolve

<issue_title>waypoint address missing in ebpf/endpoints</issue_title>
<issue_description>

What happened:
The client service got "Connection reset by peer" error log.

waypoint log:

envoy connection external/envoy/source/common/tls/ssl_socket.cc:248        [Tags: "ConnectionId":"731052"] remote address:10.119.127.26:47892,TLS_error:|268435703:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER:TLS_error_end        thread=21

We found that the epbf connect to 15008 of waypoint.

After check kmesh's ebpf dump API

kubectl exec -ti {kmesh-pod} -- curl http://localhost:15200/debug/config_dump/bpf/dual-engine

waypoint backendUid missed in the endpoints.

Finally, the bug could be here:

https://github.com/kmesh-net/kmesh/blob/main/pkg/controller/workload/workload_processor.go#L565-L567

Image

https://github.com/kmesh-net/kmesh/blob/main/pkg/controller/workload/workload_processor.go#L452-L466

Image

waypoint' LB policy default be FAILOVER. kmesh-daemonset may not initialize the locality
</issue_description>

Comments on the Issue (you are @copilot in this section)


💬 We'd love your input! Share your thoughts on Copilot coding agent in our 2 minute survey.

When locality LB mode is enabled (e.g., FAILOVER mode used by waypoints),
the handleWorkloadNewBoundServices function was not adding workloads to
service endpoints if p.locality.LocalityInfo was nil. This caused waypoint
endpoints to be missing from the eBPF maps when:
1. No local workload had been processed yet
2. Waypoint workloads arrived before any local workload

The fix ensures workloads are added with the lowest priority (len of
routing preferences) when LocalityInfo is nil, ensuring the endpoint
is added and will be recalculated when locality is eventually set.

Co-authored-by: hzxuzhonghu <13374016+hzxuzhonghu@users.noreply.github.com>
@kmesh-bot kmesh-bot added size/M and removed size/XS labels Nov 27, 2025
Copilot AI and others added 2 commits November 27, 2025 09:40
Co-authored-by: hzxuzhonghu <13374016+hzxuzhonghu@users.noreply.github.com>
Co-authored-by: hzxuzhonghu <13374016+hzxuzhonghu@users.noreply.github.com>
Copilot AI changed the title [WIP] Fix missing waypoint address in ebpf endpoints Fix waypoint address missing in ebpf/endpoints when LocalityInfo is nil Nov 27, 2025
Copilot AI requested a review from hzxuzhonghu November 27, 2025 09:43
@hzxuzhonghu
Copy link
Copy Markdown
Member

Awesome work

@hzxuzhonghu
Copy link
Copy Markdown
Member

cc @imuxin

// If locality is not yet initialized, use lowest priority (len of routing preferences)
// This ensures the workload is added to the endpoint map and will be recalculated
// when locality is eventually set
prio = uint32(len(service.LoadBalancing.GetRoutingPreference()))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot please use highest priority 0 here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 7500f9e. Changed to use highest priority 0 when LocalityInfo is nil.

Changed default priority from len(routingPreferences) to 0 as requested
by reviewer. This ensures workloads are assigned the highest priority
when locality information is not yet available.

Co-authored-by: hzxuzhonghu <13374016+hzxuzhonghu@users.noreply.github.com>
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a critical bug where waypoint addresses were missing from eBPF endpoint maps when using locality-based load balancing (like FAILOVER mode, which is the default for waypoints). The issue occurred when waypoint workloads arrived before any local workload was processed, leaving LocalityInfo uninitialized (nil).

Key changes:

  • Modified handleWorkloadNewBoundServices to check if LocalityInfo is nil before calling CalcLocalityLBPrio
  • When LocalityInfo is nil, workloads are now assigned the highest priority (0) to ensure immediate addition to eBPF maps
  • Added comprehensive test TestLocalityLBWithNilLocalityInfo to validate the fix

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
pkg/controller/workload/workload_processor.go Added nil check for LocalityInfo before calculating locality-based priority; assigns highest priority (0) when nil to ensure workloads are added to endpoint maps immediately
pkg/controller/workload/workload_processor_test.go Added test case validating that workloads are correctly added to service endpoints with priority 0 when LocalityInfo is nil in locality LB mode

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +467 to +468
// This ensures the workload is added to the endpoint map and will be recalculated
// when locality is eventually set
Copy link

Copilot AI Nov 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The comment states that the endpoint "will be recalculated when locality is eventually set," but there's a TODO comment at line 572 in the same file indicating that endpoint priority recalculation upon locality initialization is not yet implemented. Consider updating this comment to be more accurate, such as:

// If locality is not yet initialized, use highest priority (0)
// This ensures the workload is added to the endpoint map immediately
// TODO: Priority should be recalculated when locality is eventually set (see line 572)
Suggested change
// This ensures the workload is added to the endpoint map and will be recalculated
// when locality is eventually set
// This ensures the workload is added to the endpoint map immediately
// TODO: Priority should be recalculated when locality is eventually set (see line 572)

Copilot uses AI. Check for mistakes.
@codecov
Copy link
Copy Markdown

codecov bot commented Nov 28, 2025

Codecov Report

❌ Patch coverage is 50.00000% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 37.85%. Comparing base (f115d74) to head (7500f9e).
⚠️ Report is 8 commits behind head on main.

Files with missing lines Patch % Lines
pkg/controller/workload/workload_processor.go 50.00% 1 Missing and 2 partials ⚠️
Files with missing lines Coverage Δ
pkg/controller/workload/workload_processor.go 58.20% <50.00%> (+0.40%) ⬆️

... and 1 file with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6c2b4d1...7500f9e. Read the comment docs.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@kmesh-bot
Copy link
Copy Markdown
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hzxuzhonghu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@hzxuzhonghu
Copy link
Copy Markdown
Member

/lgtm

@kmesh-bot kmesh-bot added the lgtm label Nov 28, 2025
@hzxuzhonghu
Copy link
Copy Markdown
Member

/retest

@kmesh-bot kmesh-bot merged commit 3f0e1ae into main Nov 28, 2025
23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

waypoint address missing in ebpf/endpoints

4 participants