Updating the header file with DNS rules during policy updates#33412
Updating the header file with DNS rules during policy updates#33412vipul-21 wants to merge 1 commit intocilium:mainfrom
Conversation
08e875a to
7b9de21
Compare
|
/test |
Signed-off-by: Vipul Singh <singhvipul@microsoft.com>
7b9de21 to
adc0402
Compare
christarazi
left a comment
There was a problem hiding this comment.
Thanks for the PR. I have a comment below on the approach.
| log.Infof("Updating the DNS rules for endpoint %d", dr.redirect.endpointID) | ||
| dr.redirect.localEndpoint.SyncEndpointHeaderFile() |
There was a problem hiding this comment.
SyncEndpointHeaderFile is already called within the endpoint pkg (pkg/endpoint) and during policy recalculation. Why is this necessary here especially expanding the scope of the proxy to reach into the management of endpoints?
There was a problem hiding this comment.
Hey sorry for the delay in response. So the intention here is to update the DNS rules to the ep_config.json when we apply the policy.
Afaik, it does not write the dns rules to the file when the policy is applied. So I was calling this function which is triggered when we want to write the dns rules to the file.
Current implementation, I see that the data is written to the fs after the policy applied, but dns rules are not. Is there another way we can incorporate getting the DNS rules to be written to the fs ?
There was a problem hiding this comment.
Current implementation, I see that the data is written to the fs after the policy applied, but dns rules are not. Is there another way we can incorporate getting the DNS rules to be written to the fs ?
I'm not sure I understand the problem you are describing. The DNS rules are in fact written to the fs as they are fetched from the same function:
cilium/pkg/endpoint/endpoint.go
Line 2438 in 325b5df
There was a problem hiding this comment.
Also another note to keep in mind here, see the comment regarding locks and deadlocks on the line above of the linked code: this PR might further complicate the locking pattern by now having the proxy reach into the headerfile.
There was a problem hiding this comment.
I agree; this is probably not necessary here. Rather, we should be syncing the endpoint header file explicitly on regeneration, which IIRC is not done now.
|
Ah, right, I remember what the problem is here: the cached DNS rules include the IP of the dns server, a.k.a the peer pod, which can change at any point in time, independent of the lifecycle of the subject endpoint. So, DNS rules can be out-of-date even when no endpoint-specific edges have been missed. This is a bummer. The solution, IMO, is to replace the Trigger with a Controller that runs periodically. That way we can be reasonably sure that the set of rules in question is up-to-date. |
So do you also recommend calling the controller when there are new rules being added Or solely rely on the periodic update ? I guess there can be a scenario where rules are applied but state in fs is not updated as controller is yet to run. I think that's what you meant by |
Writing the DNS rules to disk is ultimately best effort because it's done synchronously and if the pod is deleted in the meantime before the latest rules can be synced to disk, then
I'm still not sure I understand the problem being solved here. Basically, Therefore, I don't yet see the arguement for why we should sync DNS rules to the fs on policy calculation (endpoint regeneration) when the code already syncs the DNS rules when there's a DNS request made, once every 5s. |
There are two things cached in the endpoint directory for fast recovery on startup:
No. 1 cannot change without a DNS request, so it makes sense that this is updated per-request. However, no. 2 has a completely unrelated lifecycle. In fact, the lifecycle may be completely independent of the node itself, as CoreDNS being rescheduled on a remote node requires this to be updated. So, that's why we should periodically refresh the cached DNS information (a.k.a "write the header file"). Either we build a mechanism to somehow propagate dns rules selector updates (hard!), or we just periodically update endpoint cached DNS info. Make sense? |
That clarifies things a bit, thanks. What is the consequence if we don't update (2) on a periodic basis? I understand the rules are "out of date / out of sync" on the fs, but what is the actual consequence of them being out of date? Does the list of ports + DNS server pods IPs (coredns) never get updated in the endpoint caches? |
As of now there shouldn't be any consequence, but this was discussed in sig-policy meeting in June: https://docs.google.com/document/d/1p6LuzoKR55_HgQJTWFInwj2FlIeo7_uAFXYuXe_Lw7Q/edit related to HA DNS Proxy CFP. Will try to do the controller approach as suggested. |
That's exactly right. This is caching the set of allowed IPs so we can correctly serve proxied DNS requests while the agent is starting up. Since DNS rules are per port + destination, an endpoint could potentially have different DNS rules per destination. We also would like to be able to serve DNS requests very quickly when restarting the agent. Since a fresh agent will not know IP:Identity mappings, we cache IPs rather than identities. But then we never refresh the cache. |
|
This pull request has been automatically marked as stale because it |
|
@vipul-21 have you had a chance to look at this? It would be a nice fix to have! |
|
This pull request has been automatically marked as stale because it |
|
This pull request has not seen any activity since it was marked stale. |
|
@squeed I updated the branch with the controller, but not seeing an option to reopen this PR ? Should I create a new one ? |
Writing the DNS rules to the endpoint header/json file as we apply the policy changes, This keeps the
ep_config.jsonandep_config.hin sync with the current rules being applied the policy.