Delete node lease if the corresponding node is deleted#70034
Delete node lease if the corresponding node is deleted#70034k8s-ci-robot merged 1 commit intokubernetes:masterfrom
Conversation
|
@wangzhen127: You must be a member of the kubernetes/kubernetes-milestone-maintainers github team to set the milestone. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
5823290 to
868f884
Compare
There was a problem hiding this comment.
This has a problem that if this fail (or if controller-manager will be killed/stopped in the meantime or sth like that), the lease object will be left.
Wouldn't it be possible to set ownerReference of the Lease object to be the corresponding Node object and rely on GC? @janetkuo - thought?
There was a problem hiding this comment.
This has a problem that if this fail (or if controller-manager will be killed/stopped in the meantime or sth like that), the lease object will be left.
The alternative is to compare the existing nodes and leases and do the cleanup every time when monitorNodeHealth() runs.
Wouldn't it be possible to set ownerReference of the Lease object to be the corresponding Node object and rely on GC? Janet Kuo - thought?
IIUC, OwnerReference approach is not feasible because an owning object must be in the same namespace as the owned object, and node does not have namespace. See here. I also included this in the PR description.
There was a problem hiding this comment.
The alternative is to compare the existing nodes and leases and do the cleanup every time when monitorNodeHealth() runs.
@wangzhen127 do the leases have information about which node acquired it ?
IIUC, OwnerReference approach is not feasible because an owning object must be in the same namespace as the owned object, and node does not have namespace. See here. I also included this in the PR description.
@janetkuo @caesarxuchao Does the GC work on non namespaced objects ?
There was a problem hiding this comment.
I really think that we need something that is more robust that this.
Listing all leases (from local cache should be good enough) and comparing them with existing nodes sounds reasonable.
[If we won't be able to use GC - I pinged Janet and Chao about that offline].
There was a problem hiding this comment.
GC support owner being non namespaced, while the dependents are namespaced. (see #65200).
There was a problem hiding this comment.
@wangzhen127 - if that mechanism works, then we should use it.
There was a problem hiding this comment.
SG. Will try owner reference and update later.
There was a problem hiding this comment.
is this the place you verify the action of delete was called for resource leases ?
There was a problem hiding this comment.
We should also verify the name of the lease being deleted and if that was indeed held by the node that got deleted
868f884 to
9fb1285
Compare
9fb1285 to
cd09483
Compare
|
This LGTM. |
|
/retest |
|
Hi @wangzhen127 - I'm Enhancements Shadow for 1.13. This looks like it is nearing completion! Just a friendly reminder that code slush begins on 11/9 and code freeze is 11/15, so if you could follow up with your reviewers and get this in, that would be awesome. Thank you! |
|
Ping @yujuhong :) |
test/e2e/lifecycle/node_lease.go
Outdated
There was a problem hiding this comment.
Should the test be skipped if there are more than one MIG?
There was a problem hiding this comment.
I am not entirely sure about the reasoning here. I copied from resize_nodes tests:
https://github.com/kubernetes/kubernetes/blob/master/test/e2e/lifecycle/resize_nodes.go#L57
test/e2e/lifecycle/node_lease.go
Outdated
test/e2e/lifecycle/node_lease.go
Outdated
There was a problem hiding this comment.
Include the deleted node name in the error message for debugging.
test/e2e/lifecycle/node_lease.go
Outdated
There was a problem hiding this comment.
Do we really need to wait 5 minutes for this to be true? Seems excessive to me.
test/e2e/lifecycle/node_lease.go
Outdated
There was a problem hiding this comment.
Can we shorten the timeout here?
test/e2e/lifecycle/node_lease.go
Outdated
There was a problem hiding this comment.
What's the expected latency for the lease to be deleted?
There was a problem hiding this comment.
That depends on GC. Based on my observation previously, it is deleted very soon after deleting nodes. Changing the timeout to 1m now.
pkg/kubelet/nodelease/controller.go
Outdated
There was a problem hiding this comment.
Why is this Infof as opposed to Errorf?
cd09483 to
691f95c
Compare
pkg/kubelet/nodelease/controller.go
Outdated
test/e2e/lifecycle/node_lease.go
Outdated
There was a problem hiding this comment.
I am not entirely sure about the reasoning here. I copied from resize_nodes tests:
https://github.com/kubernetes/kubernetes/blob/master/test/e2e/lifecycle/resize_nodes.go#L57
test/e2e/lifecycle/node_lease.go
Outdated
test/e2e/lifecycle/node_lease.go
Outdated
test/e2e/lifecycle/node_lease.go
Outdated
There was a problem hiding this comment.
That depends on GC. Based on my observation previously, it is deleted very soon after deleting nodes. Changing the timeout to 1m now.
test/e2e/lifecycle/node_lease.go
Outdated
test/e2e/lifecycle/node_lease.go
Outdated
|
The failures in pull-kubernetes-e2e-gce-alpha-features seem not related to this PR, because the gci-gce-alpha-features CI jobs are also failing. Node lease tests are passing. |
|
/lgtm |
|
/approve |
1 similar comment
|
/approve |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: wangzhen127, wojtek-t, yujuhong The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/retest |
|
/retest Review the full test history for this PR. Silence the bot with an |
|
Do we have to wait until pull-kubernetes-e2e-gce-alpha-features fully pass before merging, which is not a required job? The failure is on CSI tests, which are unrelated to this PR. The CSI failures are tracked in #70760 Since the node lease tests are passing, can we merge ignoring pull-kubernetes-e2e-gce-alpha-features? |
|
/retest Review the full test history for this PR. Silence the bot with an |
|
/priority important-soon |
Yes looks like these test are bad. They should not block 1.13. @mkimuram please open a PR to disable the tests and then fix them before reenabling. |
|
/override pull-kubernetes-e2e-gce-alpha-features |
|
gah... I think we need to wait for the test to finish before we can override the context |
|
/override pull-kubernetes-e2e-gce-alpha-features |
1 similar comment
|
/override pull-kubernetes-e2e-gce-alpha-features |
|
/skip |
|
/test pull-kubernetes-e2e-gce-100-performance |
1 similar comment
|
/test pull-kubernetes-e2e-gce-100-performance |
|
@wangzhen127: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
|
/retest Review the full test history for this PR. Silence the bot with an |
What type of PR is this?
/kind feature
What this PR does / why we need it:
This PR ensures that node lease is deleted if the corresponding node is deleted. This is done in node lifecycle controller. OwnerReference approach is not feasible because an owning object must be in the same namespace as the owned object, and node does not have namespace.
This is part of KEP-0009, feature #589 and issue #14733.
Does this PR introduce a user-facing change?:
Yes. When node lease feature is enabled, with this PR, user will see node lease is deleted when the corresponding node is deleted.
Release note:
/sig node
/milestone v1.13