Initial node drain implementation for #3885. #16698

k8s-github-robot merged 1 commit into kubernetes:master
Conversation
Can one of the admins verify that this patch is reasonable to test? (reply "ok to test", or if you trust the user, reply "add to whitelist") If this message is too spammy, please complain to ixdy.
Labelling this PR as size/L

@kubernetes/kubectl

Is there a difference between laming a node and draining a node?

@mikedanese "laming" usually refers to marking it unschedulable, while "draining" refers to (gracefully) evicting whatever is running.

@madhusudancs you might be interested to take a look at this since we've talked about it.
pkg/kubectl/cmd/drain.go
Outdated
Note here that they should use --force to delete them.
Also move this before the warning so you won't need the else clause:

```go
if !options.Force {
	return fmt.Errorf("Refusing to continue due to unreplicated pods: %s", joined)
}
fmt.Fprintf(out, "WARNING: Deleting these unreplicated pods: %s\n", joined)
```
I just have two minor comments.

Please run
pkg/kubectl/cmd/drain.go
Outdated
It'd be great to have more details on "what does drain nodes mean and what happens after I drain my nodes."
What's the use case of
@k8s-bot ok to test

GCE e2e build/test failed for commit 617c41b2f4a8e5026478dc0b780e62ad80f23d6d.
Sigh. This passes on my workstation:

```
% hack/verify-generated-docs.sh
+++ [1104 15:28:27] Building go targets for linux/amd64:
    cmd/gendocs
    cmd/genkubedocs
    cmd/genman
    cmd/genbashcomp
    cmd/mungedocs
+++ [1104 15:28:28] Placing binaries
Generated docs are up to date.
```

But fails on Jenkins:

```
14:28:48 Verifying ./hack/../hack/verify-generated-docs.sh
14:28:48 +++ [1104 22:28:48] Building go targets for linux/amd64:
14:28:48     cmd/gendocs
14:28:48     cmd/genkubedocs
14:28:48     cmd/genman
14:28:48     cmd/genbashcomp
14:28:48     cmd/mungedocs
14:29:10 +++ [1104 22:29:10] Placing binaries
14:29:13 /workspace/kubernetes/docs/user-guide/kubectl/kubectl.md
14:29:13 ----
14:29:13 md-links:
14:29:13 contents were modified
14:29:13
14:29:13 FAIL: changes needed but not made due to --verify
14:29:13 /workspace/kubernetes/docs/ is out of date. Please run hack/update-generated-docs.sh
14:29:13 !!! Error in ./hack/../hack/verify-generated-docs.sh:28
14:29:13   '"${KUBE_ROOT}/hack/after-build/verify-generated-docs.sh" "$@"' exited with status 1
14:29:13 Call stack:
14:29:13   1: ./hack/../hack/verify-generated-docs.sh:28 main(...)
14:29:13 Exiting with status 1
14:29:13 FAILED
```
With great effort, I was able to hack mungedocs up to produce a file I could turn into a diff. It should just produce a diff, and I'll file a bug for that if there's not one already. Anyway:

```diff
--- /tmp/zshs0QcYE	2015-11-04 16:45:46.645617509 -0800
+++ /tmp/zsh2rPbMb	2015-11-04 16:45:46.657617610 -0800
@@ -64,7 +64,7 @@
 ### Addon Resources
 
-To prevent memory leaks or other resource issues in from consuming all the resources available on a node, Kubernetes sets resource limits on addon containers to limit the CPU and Memory resources they can consume (See PR [#10653](http://pr.k8s.io/10653/files) and [#10778](http://pr.k8s.io/10778/files)).
+To prevent memory leaks or other resource issues in [cluster addons](../../cluster/addons/) from consuming all the resources available on a node, Kubernetes sets resource limits on addon containers to limit the CPU and Memory resources they can consume (See PR [#10653](http://pr.k8s.io/10653/files) and [#10778](http://pr.k8s.io/10778/files)).
 
 For example:
```

+++ is what's on the filesystem. It's obviously correct.
GCE e2e test build/test passed for commit ede367bc8925bdc3c1b506bc0a4555d4cc9cb031.
Recording this before I forget it all. Now running this in Docker, I get:

```
% build/run.sh hack/verify-generated-docs.sh
+++ [1104 18:10:55] Verifying Prerequisites....
+++ [1104 18:10:56] Building Docker image kube-build:cross.
+++ [1104 18:10:58] Building Docker image kube-build:build-0343d9fca5.
+++ [1104 18:11:02] Running build command....
+++ [1105 02:11:03] Building go targets for linux/amd64:
    cmd/gendocs
    cmd/genkubedocs
    cmd/genman
    cmd/genbashcomp
    cmd/mungedocs
+++ [1105 02:11:03] Placing binaries
/go/src/k8s.io/kubernetes/hack/lib/util.sh: line 186: /go/src/k8s.io/kubernetes/.generated_docs: No such file or directory
!!! Error in /go/src/k8s.io/kubernetes/hack/after-build/verify-generated-docs.sh:212
  'popd > /dev/null' exited with status 1
Call stack:
  1: /go/src/k8s.io/kubernetes/hack/after-build/verify-generated-docs.sh:212 main(...)
Exiting with status 1
!!! Error in hack/verify-generated-docs.sh:28
  '"${KUBE_ROOT}/hack/after-build/verify-generated-docs.sh" "$@"' exited with status 1
Call stack:
  1: hack/verify-generated-docs.sh:28 main(...)
Exiting with status 1
!!! Error in build/../build/common.sh:558
  '"${docker_cmd[@]}" "$@"' exited with status 1
Call stack:
  1: build/../build/common.sh:558 kube::build::run_build_command(...)
  2: build/run.sh:30 main(...)
Exiting with status 1
```

This is total nonsense. I'm not convinced I understand how this test is supposed to work inside Docker. Tag @jbeda because I notice your recent edits and @ixdy because tests.
GCE e2e test build/test passed for commit c6e9ad0.

Sigh. Now a new test is failing in Jenkins.

This fails in the same way on my workstation, both on this branch and on master. ¯\_(ツ)_/¯

I've run this repeatedly, and all I can tell so far is that it's flaky. The specific flakiness is that sometimes (more often than not) after we've run cluster/update-storage-objects.sh the API server will never become healthy. The timeout is 2m, and even with --v=10, it's clear that the apiserver is not doing anything. The process is running. If after the test fails (I've commented out the cleanup code that kills etcd and apiserver), I kill it and manually start it, it seems to come up every time.
@k8s-bot test this [I'm feeling lucky]

GCE e2e test build/test passed for commit c6e9ad0.

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

GCE e2e build/test failed for commit c6e9ad0.

@k8s-bot test this please

GCE e2e test build/test passed for commit c6e9ad0.

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

GCE e2e test build/test passed for commit c6e9ad0.
Automatic merge from submit-queue |
Auto commit by PR queue bot
The property was added in c6e9ad0 (Initial node drain implementation for kubernetes#3885, 2015-10-30, kubernetes#16698), but beb5ea6 (remove mapper dependency - PrintSuccess, 2018-02-01, kubernetes#59227) removed the only initializer.
When it initially landed in kubernetes/kubernetes@c6e9ad066e (Initial node drain implementation for #3885, 2015-08-30, kubernetes/kubernetes#16698), the drain logic looked in a created-by annotation for recognized kinds [1], so listing the set of recognized kinds was a clear approach. Sometime later, the source moved into ownerReferences, but the hard-coded set of recognized controller kinds remained. When kubernetes/kubernetes@2f1108451f (Remove hard-coded pod-controller check, 2017-12-05, kubernetes/kubernetes#56864) removed the hard-coded set of recognized controller kinds, it should have also updated these messages to remove stale references to the previous hard-coded values. This commit catches the message strings up with that commit.

[1]: kubernetes/kubernetes@c6e9ad0#diff-211259b8a8ec42f105264c10897dad48029badb538684e60e43eaead68c3d219R216

Kubernetes-commit: 587f4f04cc5fc18f4e85ed6a4a06bbf1bfee0496
It lames (marks unschedulable) the given node, and then deletes every
pod on it, optionally using a grace period. It will not delete
unreplicated pods without the use of --force.
Also add undrain, which just marks nodes as schedulable. #3885