Node Request Certificates require to have IPs#125813
Conversation
|
This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the The DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
| // when pods uses hostNetwork: true, as PodIPs. | ||
| // Nodes that use IP addresses as Hostname are interpreted as an IP | ||
| // address, so it is possible that are nodes that don't have any DNS name. | ||
| if len(ips) == 0 { |
There was a problem hiding this comment.
I'm not sure about this bit. DNS-only entries could be valid if that's what the external provider sets
There was a problem hiding this comment.
it can be valid, but Kubernetes has a strong requirement on have at least one IP, as Pods depend on them, so my thinking is that a Node will need always at least one IP ... hence, any state without IPs on the node.status.addresses is a transitional state
There was a problem hiding this comment.
the point of discussion here is when is really ok to request a certificate, do we want to request a certificate for a node that does not have any IP address reported?
There was a problem hiding this comment.
I think maybe, because reporting ≠ having. Maybe the node is somehow multihomed but any connection based on a resolved IP address will stay up and working through magic technology.
There was a problem hiding this comment.
This is a specific problem with external cloud providers that when the initialize the node object the set only the hostname, after that the cloud provider process the node and updates with the IP addresses.
The problem is that , if the node apporver does not consider valid a node with only DNS names and not with IP, there is a impact on node startup because nodes generate a first certificate request that is incomplete
| if len(hostnames) == 0 && len(ips) == 0 { | ||
| // don't return a template if we have no addresses to request for. | ||
| // A Kubernetes Node requires to have at minimum one IP address | ||
| // because those are used on the Pods field HostIPs and in some cases, |
There was a problem hiding this comment.
it uses the IP from node.status.addresses as pod hostIP?
There was a problem hiding this comment.
yes, see https://docs.google.com/document/d/1mqdVLQHIYjrzjy8Hq-FMysthHRL7ht0guk2uWI0FxKM/edit#heading=h.97ewrhxwu05m fore more detailed explanation
There was a problem hiding this comment.
api docs
kubernetes/staging/src/k8s.io/api/core/v1/types.go
Lines 6148 to 6164 in 2ad04a0
|
Changelog suggestion -kubelet, when configured to rotate certificates, will only request certificates when there is at least one IP address on the node.status.addresses
+Changed the way the kubelet requests certificates during rotation.
+The kubelet now only requests signing for a new certificate when there is at least one IP address in the
+`.status.addresses` of its associated Node object. |
I assume the word "taint" is missing there? (Or are we removing the external cloud providers now that we've gotten rid of the legacy ones 😂) PR makes sense to me. |
lol, yes, taint is indeed missing |
|
This change is also consistent with the API documentation for node addresses, that require at least one internalIP address, but does not require a hostname address kubernetes/staging/src/k8s.io/api/core/v1/types.go Lines 6148 to 6164 in 2ad04a0 |
|
I'm tentatively ok with this, given the way kubelet uses / requires an address of type IP for pod hostIP to work properly. My read of kubelet address self-reporting is that in non-cloud envs, the kubelet always reports an IP (either detected or overridden with --node-ip), so this is scoped to external cloud providers. A couple thoughts on reducing risk of this interacting weirdly with someone's setup:
|
neolit123
left a comment
There was a problem hiding this comment.
the change seems ok to me, but is there a case where the cloud provider would only set an IP address in the status? if so, there should be a test case for that too.
seems like NewKubeletServerCertificateManager is not unit tested in terms of this hostname and IP validation.
only an IP is fine... that would have worked before, that would work with this PR as well (but it's good to have a unit test for it) |
i can imagine scenarios like that - i.e. someone using the hostname to access the kubelet server and never using the IP.
agreed that this seems unlikely.
seems better to have the opt-out if immediate breakages happen then users need to opt-out until they can adapt their setups. |
If they report a DNS name, the kubelet will still include it in cert requests, and they can still use it as before. What would not work is if they only told the kubelet its DNS name, told it it had 0 IP addresses, and still expected kubelet to request serving certificates with just the DNS names and no IPs. |
|
added a feature gate to be able to opt-in for the old behavior, I also had to refactor a bit the code to be able to unit test it |
/test pull-kubernetes-e2e-gce |
| }, | ||
| ) | ||
| legacyregistry.MustRegister(certificateRotationAge) | ||
| registerMetricsOnce.Do(func() { |
There was a problem hiding this comment.
hmm, if newCertificateManagerConfig is called more than once, the returned configs will have metrics that aren't actually registered or reporting anywhere
before, MustRegister would panic on this... now, we just silently return not-working configs
if we're going to let this be called multiple times, I think the certificateRenewFailure / certificateRotationAge have to become shared package vars that lazily init / register once on first call
There was a problem hiding this comment.
the returned configs will have metrics that aren't actually registered or reporting anywhere
metrics are global and you can only register once, so if is called multiple times the metrics will still apply
before, MustRegister would panic on this... now, we just silently return not-working configs
why will be not-working?
if we're going to let this be called multiple times, I think the certificateRenewFailure / certificateRotationAge have to become shared package vars that lazily init / register once on first call
I started with that but the diff was larger( thinking on backports)
There was a problem hiding this comment.
hmm, I didn't realize there is another certficiate.Config below 🤔
| // don't return a template if we have no addresses to request for | ||
| if len(hostnames) == 0 && len(ips) == 0 { | ||
| return nil | ||
| // don't return a template if we have no IP addresses or DNS names to request for |
There was a problem hiding this comment.
I don't know why, but I'm finding the combination of the nesting and the feature gate name really hard to follow. Would something like this be clearer?
// by default, require at least one IP before requesting a serving certificate
hasRequiredAddresses := len(ips) > 0
// optionally allow requesting a serving certificate with just a DNS name
if utilfeature.DefaultFeatureGate.Enabled(features.AllowDNSOnlyNodeCSR) {
hasRequiredAddresses = hasRequiredAddresses || len(hostnames) > 0
}
// don't return a template if we have no addresses to request for
if !hasRequiredAddresses {
return nil
}There was a problem hiding this comment.
is hard to follow but cleaner when we remove the feature gate, the diff is simpler
There was a problem hiding this comment.
is hard to follow but it is cleaner, so when we remove the feature gate, the diff is simpler
There was a problem hiding this comment.
will update with your suggestion
A Kubernetes Node requires to have at minimum one IP address because those are used on the Pods field HostIPs and in some cases, when pods uses hostNetwork: true, as PodIPs. Nodes that use IP addresses as Hostname are interpreted as an IP address, so it is possible that are nodes that don't hane any DNSname. The feature gate AllowDNSOnlyNodeCSR will allow user to opt-in for the old behavior. Change-Id: I094531d87246f1e7a5ef4fe57bd5d9840cb1375d
|
@liggitt the config generator was the wrong abstraction as it left the code worse than is today. |
does not look related /test pull-kubernetes-integration |
|
|
LGTM label has been added. DetailsGit tree hash: 32ef1270c3d13c841b81c2e237249aa3d0cf6df9 |
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: aojea, liggitt The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/kind bug
What this PR does / why we need it:
Before #121028 the hostname was always added to the Node object by the kubelet as address type
v1.NodeHostName.Post #121028 the kubelet waits for the external cloud provider to set all the Node addresses,
v1.NodeHostNameincluded.#121028 caused an issue with some scenarios where the users want to override the hostname, and was fixed by #124516.
Post #124516 the kubelet initialize the Node object with the obtained hostname during the bootstrap, it can be overridden later by the cloud provider (unit test asserting this behavior in #125809)
Since now the Node object already contains at least one IP or DNSName, depending of the value of the hostname, the kubelet, when using the certificate manager, always requests a certificate at bootstrap with incomplete information, as it still may have to wait for the external cloud provider to populate the node addresses.
Make the condition to request the kubelet certificate to have at least one IP addresses, instead of having one IP or DNSName, solve the problem of kubelet requesting certificates too early.
An alternative is to wait for the external cloud provider taint to be removed
/sig node
/sig auth
/assign @danwinship @liggitt