Skip to content

Kubelet creates and manages node leases#66257

Merged
k8s-github-robot merged 1 commit intokubernetes:masterfrom
mtaufen:node-lease
Aug 27, 2018
Merged

Kubelet creates and manages node leases#66257
k8s-github-robot merged 1 commit intokubernetes:masterfrom
mtaufen:node-lease

Conversation

@mtaufen
Copy link
Copy Markdown
Contributor

@mtaufen mtaufen commented Jul 16, 2018

This extends the Kubelet to create and periodically update leases in a
new kube-node-lease namespace. Based on KEP-0009,
these leases can be used as a node health signal, and will allow us to
reduce the load caused by over-frequent node status reporting.

  • add NodeLease feature gate
  • add kube-node-lease system namespace for node leases
  • add Kubelet option for lease duration
  • add Kubelet-internal lease controller to create and update lease
  • add e2e test for NodeLease feature

I would like to determine a standard policy for lease renewal frequency, based on the configured lease duration, so that we don't need to expose frequency as an additional knob. The renew interval is currently calculated as 1/3 of the lease duration.

kubelet: Users can now enable the alpha NodeLease feature gate to have the Kubelet create and periodically renew a Lease in the kube-node-lease namespace. The lease duration defaults to 40s, and can be configured via the kubelet.config.k8s.io/v1beta1.KubeletConfiguration's NodeLeaseDurationSeconds field.

/cc @wojtek-t @liggitt

@mtaufen mtaufen added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. area/kubelet sig/node Categorizes an issue or PR as relevant to SIG Node. kind/feature Categorizes issue or PR as related to a new feature. status/approved-for-milestone labels Jul 16, 2018
@mtaufen mtaufen added this to the v1.12 milestone Jul 16, 2018
@k8s-ci-robot k8s-ci-robot requested a review from liggitt July 16, 2018 23:25
@k8s-ci-robot k8s-ci-robot added the release-note Denotes a PR that will be considered when it comes time to generate release notes. label Jul 16, 2018
@k8s-ci-robot k8s-ci-robot requested a review from wojtek-t July 16, 2018 23:25
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jul 16, 2018
@k8s-github-robot
Copy link
Copy Markdown

[MILESTONENOTIFIER] Milestone Pull Request: Up-to-date for process

@mtaufen @yujuhong

Pull Request Labels
  • sig/node: Pull Request will be escalated to these SIGs if needed.
  • priority/important-soon: Escalate to the pull request owners and SIG owner; move out of milestone after several unsuccessful escalation attempts.
  • kind/feature: New functionality.
Help

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we calculate the sleep duration first, we can log the sleep time in the error message which is sometimes useful to see how long something is in a backoff loop.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only if the feature is enabled

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would we only put leases in this namespace? would this be a logical place to put configmaps with node config as well?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whatever name is selected here needs to be communicated well in advance, so cluster operators can reserve the namespace for system use. it does raise an interesting question about what namespaces the kube system components are entitled to claim in the future. anything prefixed with kube-? anything prefixed with kube-system?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, this doesn't really belong in meta/v1. it's fundamental to kube-apiserver and kubelet, but not to apimachinery (same really goes for kube-system and kube-public, but we can deal with those separately)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kube-node would make sense to store node configmaps and leases. Thoughts?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would we only put leases in this namespace?

My understanding is just leases. This is what the KEP says:

for each Node there will be a corresponding Lease object with Name equal to Node name in a newly created dedicated namespace (we considered using kube-system namespace but decided that it's already too overloaded). That namespace should be created automatically (similarly to "default" and "kube-system", probably by NodeController) and never be deleted (so that nodes don't require permission for it).

Which implies you just use it for node leases, because not prefixing the lease names means you can't cleanly add any other type of lease to that namespace in the future.

needs to be communicated well in advance

My understanding is that we had already reserved the kube- prefix for namespaces? Maybe this isn't true? @thockin do you know?

this doesn't really belong in meta/v1

That's fair, but then where should we put the constants for standard system namespaces? Multiple consumers need them, so they should be in a common place.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just core/v1?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like the others are already in pkg/apis/core/types.go... there or in core/v1/types.go is better than meta/v1

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I talked to @bgrant0607 about kube-, given that this is the convention we think we should just explicitly reserve it: kubernetes/community#2379

Copy link
Copy Markdown
Contributor Author

@mtaufen mtaufen Jul 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why aren't the other ones in core/v1 in addition to core? Just because they're in meta/v1 instead (odd, as you said)?
Should we put a constant for the node lease namespace in both core and core/v1?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved to both for now

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a reason we're not using the injected clock here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops, good catch!

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

7 seconds hard-coded? this looks weird...

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can make it a constant. I'm just keeping it consistent with what node registration does, for lack of a concrete principle to actually base the number on.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure these need to match the node registration exactly, but if you want to, factoring them out to use in both places sounds good.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather keep them decoupled, but just set to the same number for now.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this working today in a test cluster with authz enabled? I expected to see modifications to the Node authorizer and NodeRestriction admission plugin to let kubelets mess with their own leases

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I forgot about that. I will add those changes to this PR, and it probably makes sense to ensure the e2e test also runs in the standard e2e suite, instead of just the e2e node suite.

We also don't currently run the node authorizer/node restriction in the e2e node tests. Maybe I should revisit #60172?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do these client calls include a request timeout?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

heartbeatClient is explicitly constructed with a timeout to avoid eternal hangs, and a distinct QPS to avoid starvation by other kube API calls, but is currently fixed to corev1. need to adapt/widen that for use here

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that is configured on the client before it is passed to the controller.

Copy link
Copy Markdown
Contributor Author

@mtaufen mtaufen Jul 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want the timeout to match the heartbeat frequency, then we need a distinct client, since the Lease-based heartbeat potentially has a different frequency than the node status update, right?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taking the shorter of the two would probably be fine

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is expecting to use the node name as the name of the lease object. Is that compatible with the currently allowed names of the lease objects? I can't remember if the initial PR limited them to single dns labels (e.g. no . characters). If so, we likely want to relax that

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// ValidateLease validates a Lease.
func ValidateLease(lease *coordination.Lease) field.ErrorList {
	allErrs := validation.ValidateObjectMeta(&lease.ObjectMeta, true, validation.NameIsDNSSubdomain, field.NewPath("objectMeta"))
	allErrs = append(allErrs, ValidateLeaseSpec(&lease.Spec, field.NewPath("spec"))...)
	return allErrs
}

Looks like we just require the name to be a DNS subdomain, via validation.NameIsDNSSubdomain.

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jul 17, 2018
@k8s-github-robot k8s-github-robot added the kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API label Jul 17, 2018
@mtaufen mtaufen force-pushed the node-lease branch 4 times, most recently from 59a0b33 to 4166d10 Compare July 17, 2018 23:44
@k8s-ci-robot k8s-ci-robot added sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/testing Categorizes an issue or PR as relevant to SIG Testing. and removed lgtm "Looks good to me", indicates that a PR is ready to be merged. labels Aug 22, 2018
@liggitt
Copy link
Copy Markdown
Member

liggitt commented Aug 23, 2018

/hold cancel
/retest

my comments are addressed, thanks

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 23, 2018
@yujuhong
Copy link
Copy Markdown
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 23, 2018
@k8s-ci-robot
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mtaufen, wojtek-t, yujuhong

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 23, 2018
@k8s-ci-robot
Copy link
Copy Markdown
Contributor

New changes are detected. LGTM label has been removed.

@k8s-ci-robot k8s-ci-robot removed lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Aug 23, 2018
@mtaufen
Copy link
Copy Markdown
Contributor Author

mtaufen commented Aug 23, 2018

re-add label after rebase

@mtaufen mtaufen added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 23, 2018
@wojtek-t
Copy link
Copy Markdown
Member

@mtaufen - please rebase one more time.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 25, 2018
This extends the Kubelet to create and periodically update leases in a
new kube-node-lease namespace. Based on [KEP-0009](https://github.com/kubernetes/community/blob/master/keps/sig-node/0009-node-heartbeat.md),
these leases can be used as a node health signal, and will allow us to
reduce the load caused by over-frequent node status reporting.

- add NodeLease feature gate
- add kube-node-lease system namespace for node leases
- add Kubelet option for lease duration
- add Kubelet-internal lease controller to create and update lease
- add e2e test for NodeLease feature
- modify node authorizer and node restriction admission controller
to allow Kubelets access to corresponding leases
@k8s-ci-robot
Copy link
Copy Markdown
Contributor

New changes are detected. LGTM label has been removed.

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 26, 2018
@mtaufen
Copy link
Copy Markdown
Contributor Author

mtaufen commented Aug 26, 2018

re-add label after rebase

@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 26, 2018
@mtaufen
Copy link
Copy Markdown
Contributor Author

mtaufen commented Aug 27, 2018

/retest

@k8s-github-robot
Copy link
Copy Markdown

/test all [submit-queue is verifying that this PR is safe to merge]

@k8s-github-robot
Copy link
Copy Markdown

Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here.

@wojtek-t
Copy link
Copy Markdown
Member

Great to see that merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/apiserver area/kubeadm area/kubelet cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.