support dns proxy #1470

Merged
kmesh-bot merged 7 commits into kmesh-net:main from Kuromesi:dns on Sep 1, 2025
Conversation

@Kuromesi
Contributor

@Kuromesi Kuromesi commented Aug 4, 2025

What type of PR is this?

/kind feature

What this PR does / why we need it:

This experimental feature is controlled by the environment variable KMESH_ENABLE_DNS_PROXY, which defaults to false. When the DNS proxy is enabled, a DNS server is set up in the kmesh daemon and all traffic with destination port 53 is redirected to it.
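As a rough illustration of how such a flag might be read, the sketch below parses KMESH_ENABLE_DNS_PROXY as a Go boolean and treats an unset or unparsable value as false. The helper name `dnsProxyEnabled` and the injected lookup function are assumptions made for this example; they are not taken from the PR's code.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// dnsProxyEnabled reports whether the DNS proxy should be started.
// Hypothetical helper: the lookup function is injected so the logic can be
// exercised without touching the real process environment.
// Assumption: an unset or unparsable value is treated as "false".
func dnsProxyEnabled(lookup func(string) (string, bool)) bool {
	v, ok := lookup("KMESH_ENABLE_DNS_PROXY")
	if !ok {
		return false
	}
	enabled, err := strconv.ParseBool(v)
	return err == nil && enabled
}

func main() {
	fmt.Println("dns proxy enabled:", dnsProxyEnabled(os.LookupEnv))
}
```

Passing `os.LookupEnv` keeps the default path identical to reading the environment directly, while a stub lookup makes the default-to-false behavior easy to check.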

This feature can be illustrated and validated by the following example:

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: httpbin-route
spec:
  hosts:
    - "kmesh-fake.com"
  gateways:
    - mesh
  http:
    - name: "route-to-httpbin"
      match:
        - uri:
            prefix: "/"
          port: 80
      route:
        - destination:
            host: httpbin.default.svc.cluster.local
            port:
              number: 8000
---
apiVersion: networking.istio.io/v1
kind: ServiceEntry
metadata:
  name: example
spec:
  exportTo:
  - '*'
  hosts:
  - kmesh-fake.com
  ports:
  - name: http
    number: 80
    protocol: HTTP
  resolution: DNS

Without the DNS proxy, traffic to kmesh-fake.com fails with a "could not resolve host" error. With the DNS proxy enabled, the host resolves and traffic policies are applied.

kubectl exec -it deploy/sleep -- curl kmesh-fake.com -v -I
* Host kmesh-fake.com:80 was resolved.
* IPv6: 2001:2::6
* IPv4: 240.240.0.6
*   Trying [2001:2::6]:80...
* Immediate connect fail for 2001:2::6: Network unreachable
*   Trying 240.240.0.6:80...
* Connected to kmesh-fake.com (240.240.0.6) port 80
* using HTTP/1.x
> HEAD / HTTP/1.1
> Host: kmesh-fake.com
> User-Agent: curl/8.13.0
> Accept: */*
> 
* Request completely sent off
< HTTP/1.1 200 OK
HTTP/1.1 200 OK
< server: envoy
server: envoy
< date: Mon, 04 Aug 2025 15:39:10 GMT
date: Mon, 04 Aug 2025 15:39:10 GMT
< content-type: text/html; charset=utf-8
content-type: text/html; charset=utf-8
< content-length: 9593
content-length: 9593
< access-control-allow-origin: *
access-control-allow-origin: *
< access-control-allow-credentials: true
access-control-allow-credentials: true
< x-envoy-upstream-service-time: 1
x-envoy-upstream-service-time: 1
< x-envoy-decorator-operation: httpbin.default.svc.cluster.local:8000/*
x-envoy-decorator-operation: httpbin.default.svc.cluster.local:8000/*
< 

* Connection #0 to host kmesh-fake.com left intact

Which issue(s) this PR fixes:
This should resolve #1459.

Special notes for your reviewer:

Does this PR introduce a user-facing change?:



@gemini-code-assist gemini-code-assist bot left a comment


Summary of Changes

Hello @Kuromesi, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

I've implemented an experimental DNS proxy feature for Kmesh. This enhancement allows Kmesh to act as a DNS server, intercepting and resolving DNS queries for services, especially those managed by Istio ServiceEntry configurations. By redirecting DNS traffic to the Kmesh daemon, we can ensure that traffic policies are correctly applied to hostnames that might otherwise be unresolvable. This provides a more comprehensive traffic management solution within the mesh.

Highlights

  • Experimental DNS Proxy: I've introduced an experimental DNS proxy feature within the Kmesh daemon. This allows Kmesh to intercept and handle DNS queries for services, particularly those defined via Istio ServiceEntry resources.
  • DNS Traffic Redirection via eBPF: When enabled, all traffic destined for port 53 (DNS) will be redirected to the Kmesh daemon's internal DNS server. This redirection is managed by new eBPF programs attached to cgroups.
  • Enhanced Service Entry Resolution: The proxy enables hostnames, such as those in ServiceEntry definitions, to be resolved by Kmesh. This is crucial for applying traffic policies to services that are not natively discoverable via Kubernetes DNS.
  • Configurable Feature Flag: The feature is controlled by the KMESH_ENABLE_DNS_PROXY environment variable, defaulting to false. This allows for a controlled rollout and testing of the new functionality.
  • Modular DNS Resolution Architecture: I've refactored the DNS resolution logic to be more modular, introducing Resolver interfaces and delegate resolvers. This allows for flexible chaining of resolution mechanisms, including a new ServiceCacheResolver that uses Kmesh's internal service cache and an UpstreamResolver for external queries.

A hostname sought, a query sent, Through eBPF's path, its journey bent. To Kmesh it flies, a proxy's art, New addresses found, a fresh start.



@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces DNS proxy support as an experimental feature. The changes are extensive, touching BPF programs, Go controller logic, and adding a new DNS server implementation. My review focuses on correctness and maintainability. I've identified a critical bug in the BPF program attachment logic during restart, fragile logic in the DNS service cache resolver, and some code duplication and dead code that should be addressed.

Comment on lines +60 to +72
                case dns.TypeA:
                        ip := net.IP(svc.GetAddresses()[0].Address)
                        if ip == nil || ip.To4() == nil {
                                log.Errorf("Invalid IPv4 address in records for %s: %s", q.Name, ip.String())
                                continue
                        }

                        rr, err := dns.NewRR(fmt.Sprintf("%s A %s", q.Name, ip.String()))
                        if err != nil {
                                log.Errorf("Error creating A record: %v", err)
                                continue
                        }
                        m.Answer = append(m.Answer, rr)

high

The logic for creating an A record assumes the IPv4 address is always at index [0] of the address slice. This is not a safe assumption and can lead to incorrect behavior or panics if the address list is empty, in a different order, or doesn't contain an IPv4 address at that position. You should iterate through the addresses to find a valid IPv4 address.

                case dns.TypeA:
                        var ip net.IP
                        for _, addr := range svc.GetAddresses() {
                                maybeIP := net.IP(addr.Address)
                                if maybeIP != nil && maybeIP.To4() != nil {
                                        ip = maybeIP
                                        break
                                }
                        }

                        if ip == nil {
                                log.Errorf("No valid IPv4 address in records for %s", q.Name)
                                continue
                        }

                        rr, err := dns.NewRR(fmt.Sprintf("%s A %s", q.Name, ip.String()))
                        if err != nil {
                                log.Errorf("Error creating A record: %v", err)
                                continue
                        }
                        m.Answer = append(m.Answer, rr)

Comment on lines +73 to +90
                case dns.TypeAAAA:
                        if len(svc.GetAddresses()) < 2 {
                                log.Infof("no ipv6 address found")
                                continue
                        }
                        ip := net.IP(svc.GetAddresses()[1].Address)
                        if ip == nil || ip.To16() == nil {
                                log.Errorf("Invalid IP address: %s", ip.String())
                                continue
                        }

                        rr, err := dns.NewRR(fmt.Sprintf("%s %d IN AAAA %s", q.Name, 60, ip.String()))
                        if err != nil {
                                log.Errorf("Error creating AAAA record: %v", err)
                                continue
                        }
                        m.Answer = append(m.Answer, rr)
                }

high

The logic for creating an AAAA record is brittle. It checks if there are at least 2 addresses and assumes the one at index [1] is IPv6. This is not robust and will fail if the addresses are in a different order or if only an IPv6 address is present. You should iterate through the addresses to find a valid IPv6 address that is not an IPv4-mapped IPv6 address.

                case dns.TypeAAAA:
                        var ip net.IP
                        for _, addr := range svc.GetAddresses() {
                                maybeIP := net.IP(addr.Address)
                                if maybeIP != nil && maybeIP.To16() != nil && maybeIP.To4() == nil {
                                        ip = maybeIP
                                        break
                                }
                        }

                        if ip == nil {
                                log.Infof("no ipv6 address found for %s", q.Name)
                                continue
                        }

                        rr, err := dns.NewRR(fmt.Sprintf("%s %d IN AAAA %s", q.Name, 60, ip.String()))
                        if err != nil {
                                log.Errorf("Error creating AAAA record: %v", err)
                                continue
                        }
                        m.Answer = append(m.Answer, rr)

Signed-off-by: Kuromesi <blackfacepan@163.com>
Signed-off-by: Kuromesi <blackfacepan@163.com>
}

ctx->user_ip4 = backend_v->addr.ip4;
ctx->user_port = bpf_htons(53);
Member


Is this redundant?

return CGROUP_SOCK_OK;
}

ctx->user_ip4 = storage->sk_tuple.ipv4.daddr;
Member


Is rewriting the dst address a must in UDP?

Member


Can you import or refer to the Istio DNS proxy implementation?

Member

@hzxuzhonghu hzxuzhonghu left a comment


Generally LG. LMK if you are going to optimize now or later.

Signed-off-by: Kuromesi <blackfacepan@163.com>
@Kuromesi
Contributor Author

Could you please help me review the latest commit, which implements the Istio DNS server? @hzxuzhonghu @LiZhenCheng9527
I'll add UT and e2e tests after we reach an agreement on the implementation.

Signed-off-by: Kuromesi <blackfacepan@163.com>
Signed-off-by: Kuromesi <blackfacepan@163.com>
@Kuromesi Kuromesi changed the title from "[WIP] support dns proxy" to "support dns proxy" on Aug 18, 2025
kdns "kmesh.net/kmesh/pkg/dns"

service_discovery_v3 "github.com/envoyproxy/go-control-plane/envoy/service/discovery/v3"
dnsClient "istio.io/istio/pkg/dns/client"
Member


nit: camel case is not common in Go import aliases

Contributor Author


Sure I'll fix this! (This is copied from istio btw)

}
}

// THIS FUNC IS MODIFIED BASED ON istio.io/istio/pkg/dns/server/name_table.go
Member


👍

hzxuzhonghu previously approved these changes Aug 25, 2025
@hzxuzhonghu
Member

/retest

Signed-off-by: Kuromesi <blackfacepan@163.com>
@Kuromesi
Contributor Author

I'll check what's wrong with the e2e

@hzxuzhonghu
Member

seems related

=== RUN   TestKmeshRestart
2025-08-25T02:18:10.627157Z	info	tf	=== BEGIN: Test: '_home_runner_work_kmesh_kmesh_test_e2e[TestKmeshRestart]' ===
Logs for Pod: kmesh-fsww9
2025-08-25T02:18:16.649939Z	info	tf	Checking pods ready...
2025-08-25T02:18:16.652509Z	info	tf	  [ 0]                                   kmesh-fsww9         Running (Ready)
2025-08-25T02:18:16.652536Z	info	tf	  [ 1]                                   kmesh-ftbwk         Running (Ready)
    restart_test.go:65: Minimum success threshold, 1.000000, was not met. 100/101 (0.990099) requests failed: 1 error occurred:
        	* request 101: 2 errors occurred:
        	* failed calling enrolled-to-kmesh (cluster=cluster-0)->'http://service-with-waypoint-at-service-granularity.echo-1-22616.svc.cluster.local:80': call failed from enrolled-to-kmesh (cluster=cluster-0) to http://service-with-waypoint-at-service-granularity.echo-1-22616.svc.cluster.local:80 (using http): expected no error, but encountered rpc error: code = Unknown desc = 1/1 requests had errors; first error: Get "http://service-with-waypoint-at-service-granularity.echo-1-22616.svc.cluster.local:80": dial tcp: lookup service-with-waypoint-at-service-granularity.echo-1-22616.svc.cluster.local: i/o timeout
        	* failed calling enrolled-to-kmesh (cluster=cluster-0)->'http://service-with-waypoint-at-service-granularity.echo-1-22616.svc.cluster.local:80': call failed from enrolled-to-kmesh (cluster=cluster-0) to http://service-with-waypoint-at-service-granularity.echo-1-22616.svc.cluster.local:80 (using http): expected no error, but encountered rpc error: code = Unknown desc = 1/1 requests had errors; first error: Get "http://service-with-waypoint-at-service-granularity.echo-1-22616.svc.cluster.local:80": dial tcp: lookup service-with-waypoint-at-service-granularity.echo-1-22616.svc.cluster.local: i/o timeout

@Kuromesi Kuromesi requested a review from hzxuzhonghu August 31, 2025 15:14
@codecov

codecov bot commented Aug 31, 2025

Codecov Report

❌ Patch coverage is 38.18898% with 157 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (main@aa23e55). Learn more about missing BASE report.
⚠️ Report is 65 commits behind head on main.

Files with missing lines | Patch % | Lines
pkg/bpf/workload/sock_connection.go | 0.00% | 43 Missing ⚠️
pkg/controller/controller.go | 0.00% | 38 Missing ⚠️
pkg/controller/workload/workload_processor.go | 5.55% | 33 Missing and 1 partial ⚠️
pkg/dns/nametable.go | 69.87% | 20 Missing and 5 partials ⚠️
pkg/controller/workload/workload_controller.go | 0.00% | 8 Missing ⚠️
pkg/dns/upstream.go | 75.00% | 5 Missing and 2 partials ⚠️
pkg/dns/dns.go | 88.88% | 1 Missing and 1 partial ⚠️

Files with missing lines | Coverage Δ
pkg/dns/dns.go | 52.02% <88.88%> (ø)
pkg/dns/upstream.go | 75.00% <75.00%> (ø)
pkg/controller/workload/workload_controller.go | 36.84% <0.00%> (ø)
pkg/dns/nametable.go | 69.87% <69.87%> (ø)
pkg/controller/workload/workload_processor.go | 58.69% <5.55%> (ø)
pkg/controller/controller.go | 0.00% <0.00%> (ø)
pkg/bpf/workload/sock_connection.go | 0.00% <0.00%> (ø)

Powered by Codecov. Last update aa23e55...3b9e299. Read the comment docs.


@hzxuzhonghu
Member

Cool, will have a last look

Signed-off-by: Kuromesi <blackfacepan@163.com>
Member

@hzxuzhonghu hzxuzhonghu left a comment


/lgtm

Thank you!

@kmesh-bot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hzxuzhonghu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kmesh-bot kmesh-bot merged commit 414ea94 into kmesh-net:main Sep 1, 2025
12 checks passed

Development

Successfully merging this pull request may close these issues.

KMesh support for ServiceEntry virtural domain resolution and traffic interception, similar to Sidecar DNS capture

3 participants