Updating the header file with DNS rules during policy updates by vipul-21 · Pull Request #33412 · cilium/cilium

vipul-21 · 2024-06-26T19:15:24Z

Write the DNS rules to the endpoint header/JSON file as the policy changes are applied. This keeps ep_config.json and ep_config.h in sync with the rules currently being applied by the policy.

vipul-21 · 2024-06-26T19:19:02Z

/test

Signed-off-by: Vipul Singh <singhvipul@microsoft.com>

christarazi

Thanks for the PR. I have a comment below on the approach.

christarazi · 2024-06-28T17:24:31Z

pkg/proxy/dns.go

+	log.Infof("Updating the DNS rules for endpoint %d", dr.redirect.endpointID)
+	dr.redirect.localEndpoint.SyncEndpointHeaderFile()


SyncEndpointHeaderFile is already called within the endpoint package (pkg/endpoint) and during policy recalculation. Why is this necessary here, especially since it expands the scope of the proxy to reach into the management of endpoints?

vipul-21 · 2024-07-08

Hey, sorry for the delayed response. The intention here is to write the DNS rules to ep_config.json when we apply the policy.
AFAIK, the DNS rules are not written to the file when the policy is applied, so I called this function, which is what triggers writing the DNS rules to the file.
In the current implementation, I see that the data is written to the fs after the policy is applied, but the DNS rules are not. Is there another way we can get the DNS rules written to the fs?

christarazi · 2024-07-09

I'm not sure I understand the problem you are describing. The DNS rules are in fact written to the fs as they are fetched from the same function:

cilium/pkg/endpoint/endpoint.go, line 2438 in 325b5df:

rules := e.owner.GetDNSRules(e.ID)

Another note to keep in mind: see the comment about locks and deadlocks on the line above the linked code. This PR might further complicate the locking pattern by having the proxy reach into the header file.

squeed · 2024-07-12

I agree; this is probably not necessary here. Rather, we should be syncing the endpoint header file explicitly on regeneration, which IIRC is not done now.

squeed · 2024-07-12T09:19:04Z

Ah, right, I remember what the problem is here: the cached DNS rules include the IP of the DNS server, a.k.a. the peer pod, which can change at any point in time, independent of the lifecycle of the subject endpoint.

So, DNS rules can be out-of-date even when no endpoint-specific edges have been missed. This is a bummer.

The solution, IMO, is to replace the Trigger with a Controller that runs periodically. That way we can be reasonably sure that the set of rules in question is up-to-date.

vipul-21 · 2024-07-19T21:10:00Z

So do you also recommend triggering the controller when new rules are added, or should we rely solely on the periodic update? I guess there can be a scenario where rules are applied but the state on the fs is not yet updated because the controller has yet to run; I think that's what you meant by the "reasonably sure" part.
How frequently can/should the controller run? Maybe that should be configurable too.

christarazi · 2024-07-19T21:29:28Z

Writing the DNS rules to disk is ultimately best effort because it's done synchronously and if the pod is deleted in the meantime before the latest rules can be synced to disk, then

I'm still not sure I understand the problem being solved here.

Basically, SyncEndpointHeaderFile() is called each time the endpoint makes a DNS request. SyncEndpointHeaderFile() fires a trigger with a 5s minimum interval, meaning the sync to the fs happens at most once every 5s.

Therefore, I don't yet see the argument for why we should sync DNS rules to the fs on policy calculation (endpoint regeneration) when the code already syncs them whenever a DNS request is made, at most once every 5s.

squeed · 2024-07-24T18:59:50Z

@christarazi @vipul-21

There are two things cached in the endpoint directory for fast recovery on startup:

1. The list of domain names + IPs seen by this endpoint
2. The list of ports + DNS server pod IPs that are allowed

No. 1 cannot change without a DNS request, so it makes sense that it is updated per request. However, no. 2 has a completely unrelated lifecycle; it may even be independent of the node itself, since CoreDNS being rescheduled onto a remote node requires it to be updated.

So, that's why we should periodically refresh the cached DNS information (a.k.a. "write the header file"). Either we build a mechanism to somehow propagate DNS rule selector updates (hard!), or we just periodically update the endpoint's cached DNS info.

Make sense?

christarazi · 2024-07-24T20:02:20Z

That clarifies things a bit, thanks. What is the consequence if we don't update (2) on a periodic basis? I understand the rules are "out of date / out of sync" on the fs, but what is the actual consequence of them being out of date? Does the list of ports + DNS server pod IPs (CoreDNS) never get updated in the endpoint caches?

vipul-21 · 2024-08-16T19:06:55Z

@christarazi

What is the consequence if we don't update (2) on a periodic basis?

As of now there shouldn't be any consequence, but this was discussed in the sig-policy meeting in June (https://docs.google.com/document/d/1p6LuzoKR55_HgQJTWFInwj2FlIeo7_uAFXYuXe_Lw7Q/edit) in the context of the HA DNS Proxy CFP:

Need to determine L7 policy per-endpoint. Can we use epconfig.json to load per-endpoint policy, or do we need another discoverability mechanism?
This is not fast enough, but a PR to write epconfig.json on regeneration would be reasonable

I will try the controller approach as suggested.

squeed · 2024-08-19T11:20:43Z

@christarazi

What is the consequence if we don't update (2) on a periodic basis? [...] Does the list of ports + DNS server pods IPs (coredns) never get updated in the endpoint caches?

That's exactly right. This is caching the set of allowed IPs so we can correctly serve proxied DNS requests while the agent is starting up.

Since DNS rules are per port + destination, an endpoint could potentially have different DNS rules per destination. We also would like to be able to serve DNS requests very quickly when restarting the agent. Since a fresh agent will not know IP:Identity mappings, we cache IPs rather than identities. But then we never refresh the cache.

github-actions · 2024-09-19T01:57:41Z

This pull request has been automatically marked as stale because it
has not had recent activity. It will be closed if no further activity
occurs. Thank you for your contributions.

squeed · 2024-09-19T07:35:19Z

@vipul-21 have you had a chance to look at this? It would be a nice fix to have!

github-actions · 2024-10-20T02:04:26Z

This pull request has been automatically marked as stale because it
has not had recent activity. It will be closed if no further activity
occurs. Thank you for your contributions.

github-actions · 2024-11-04T02:00:53Z

This pull request has not seen any activity since it was marked stale.
Closing.

vipul-21 · 2024-11-15T20:29:38Z

@vipul-21 have you had a chance to look at this? It would be a nice fix to have!

@squeed Sorry for the delay (busy with some other stuff).
Yes, let me update this.

vipul-21 · 2024-11-27T18:35:44Z

@squeed I updated the branch with the controller, but I don't see an option to reopen this PR. Should I create a new one?
vipul-21@a76b698

maintainer-s-little-helper bot added the dont-merge/needs-release-note-label label Jun 26, 2024

github-actions bot added the sig/policy label Jun 26, 2024

vipul-21 force-pushed the singhvipul/dnsrules branch from 08e875a to 7b9de21 June 26, 2024 19:16

Updating the header file with DNS rules during policy update

adc0402

Signed-off-by: Vipul Singh <singhvipul@microsoft.com>

vipul-21 force-pushed the singhvipul/dnsrules branch from 7b9de21 to adc0402 June 26, 2024 20:54

vipul-21 marked this pull request as ready for review June 26, 2024 22:39

vipul-21 requested review from a team as code owners June 26, 2024 22:39

vipul-21 requested review from pippolo84 and sayboras June 26, 2024 22:39

christarazi reviewed Jun 28, 2024

hemanthmalla mentioned this pull request Jul 9, 2024

cfp: Adding the options to get the DNS rules hemanthmalla/design-cfps#2

Merged

squeed self-requested a review July 9, 2024 15:38

christarazi marked this pull request as draft July 19, 2024 21:29

hemanthmalla mentioned this pull request Aug 12, 2024

CFP-30984 : Add CFP for DNS proxy HA cilium/design-cfps#32

Closed

github-actions bot added the stale label Sep 19, 2024

github-actions bot removed the stale label Sep 20, 2024

github-actions bot added the stale label Oct 20, 2024

github-actions bot closed this Nov 4, 2024

vipul-21 mentioned this pull request Jan 6, 2025

Updating the header file with DNS rules during policy update #36851

Closed
