Initial node drain implementation for #3885 (#16698)

Merged
k8s-github-robot merged 1 commit into kubernetes:master from mml:issue3885
Jan 9, 2016

Conversation

@mml
Contributor

@mml mml commented Nov 3, 2015

It lames (marks unschedulable) the given node, and then deletes every
pod on it, optionally using a grace period. It will not delete
unreplicated pods without the use of --force.

Also add undrain, which just marks nodes as schedulable. #3885

@k8s-bot

k8s-bot commented Nov 3, 2015

Can one of the admins verify that this patch is reasonable to test? (reply "ok to test", or if you trust the user, reply "add to whitelist")

If this message is too spammy, please complain to ixdy.

@k8s-github-robot

Labelling this PR as size/L

@k8s-github-robot k8s-github-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Nov 3, 2015
@mikedanese mikedanese assigned davidopp and unassigned eparis Nov 3, 2015
@mikedanese
Member

@kubernetes/kubectl

@mikedanese
Member

Is there a difference between laming a node and draining a node?

@davidopp
Contributor

davidopp commented Nov 3, 2015

@mikedanese "laming" usually refers to marking it unschedulable, while "draining" refers to (gracefully) evicting whatever is running

@davidopp
Contributor

davidopp commented Nov 3, 2015

@madhusudancs you might be interested to take a look at this since we've talked about it

Contributor

Note here that they should use --force to delete them.

Contributor

Also move this before the warning so you won't need the else clause

if !options.Force {
	return fmt.Errorf("Refusing to continue due to unreplicated pods: %s", joined)
}
fmt.Fprintf(out, "WARNING: Deleting these unreplicated pods: %s\n", joined)

Contributor Author

Done.

@madhusudancs
Contributor

I just have two minor comments.

@janetkuo
Member

janetkuo commented Nov 4, 2015

Please run hack/update-generated-docs.sh to generate md files for both commands.

Member

It'd be great to have more details on "what does drain nodes mean and what happens after I drain my nodes."

@janetkuo
Member

janetkuo commented Nov 4, 2015

What's the use case of kubectl undrain other than undoing previously performed kubectl drain? IMO we can have kubectl drain --undo instead, or we should mention kubectl drain in kubectl undrain document. cc @bgrant0607

@janetkuo
Member

janetkuo commented Nov 4, 2015

@k8s-bot ok to test

@k8s-bot

k8s-bot commented Nov 4, 2015

GCE e2e build/test failed for commit 617c41b2f4a8e5026478dc0b780e62ad80f23d6d.

@mml
Contributor Author

mml commented Nov 4, 2015

Sigh. This passes on my workstation:

% hack/verify-generated-docs.sh
+++ [1104 15:28:27] Building go targets for linux/amd64:
    cmd/gendocs
    cmd/genkubedocs
    cmd/genman
    cmd/genbashcomp
    cmd/mungedocs
+++ [1104 15:28:28] Placing binaries
Generated docs are up to date.

But fails on jenkins.

14:28:48 Verifying ./hack/../hack/verify-generated-docs.sh
14:28:48 +++ [1104 22:28:48] Building go targets for linux/amd64:
14:28:48     cmd/gendocs
14:28:48     cmd/genkubedocs
14:28:48     cmd/genman
14:28:48     cmd/genbashcomp
14:28:48     cmd/mungedocs
14:29:10 +++ [1104 22:29:10] Placing binaries
14:29:13 /workspace/kubernetes/docs/user-guide/kubectl/kubectl.md
14:29:13 ----
14:29:13 md-links:
14:29:13 contents were modified
14:29:13 
14:29:13 FAIL: changes needed but not made due to --verify
14:29:13 /workspace/kubernetes/docs/ is out of date. Please run hack/update-generated-docs.sh
14:29:13 !!! Error in ./hack/../hack/verify-generated-docs.sh:28
14:29:13   '"${KUBE_ROOT}/hack/after-build/verify-generated-docs.sh" "$@"' exited with status 1
14:29:13 Call stack:
14:29:13   1: ./hack/../hack/verify-generated-docs.sh:28 main(...)
14:29:13 Exiting with status 1
14:29:13 FAILED

@mml
Contributor Author

mml commented Nov 5, 2015

With great effort, I was able to hack mungedocs up to produce a file I could turn into a diff. It should just produce a diff, and I'll file a bug for that if there's not one already. Anyway,

--- /tmp/zshs0QcYE      2015-11-04 16:45:46.645617509 -0800
+++ /tmp/zsh2rPbMb      2015-11-04 16:45:46.657617610 -0800
@@ -64,7 +64,7 @@
 
 ### Addon Resources
 
-To prevent memory leaks or other resource issues in  from consuming all the resources available on a node, Kubernetes sets resource limits on addon containers to limit the CPU and Memory resources they can consume (See PR [#10653](http://pr.k8s.io/10653/files) and [#10778](http://pr.k8s.io/10778/files)).
+To prevent memory leaks or other resource issues in [cluster addons](../../cluster/addons/) from consuming all the resources available on a node, Kubernetes sets resource limits on addon containers to limit the CPU and Memory resources they can consume (See PR [#10653](http://pr.k8s.io/10653/files) and [#10778](http://pr.k8s.io/10778/files)).
 
 For example:
 

+++ is what's on the filesystem. It's obviously correct.
--- is what mungedocs wants to see when it runs inside docker. I think this is because cluster is missing from the list of inputs when we create the docker container.

@k8s-bot

k8s-bot commented Nov 5, 2015

GCE e2e test build/test passed for commit ede367bc8925bdc3c1b506bc0a4555d4cc9cb031.

@mml
Contributor Author

mml commented Nov 5, 2015

Recording this before I forget it all. Now running this in docker, I get

% build/run.sh hack/verify-generated-docs.sh 
+++ [1104 18:10:55] Verifying Prerequisites....
+++ [1104 18:10:56] Building Docker image kube-build:cross.
+++ [1104 18:10:58] Building Docker image kube-build:build-0343d9fca5.
+++ [1104 18:11:02] Running build command....
+++ [1105 02:11:03] Building go targets for linux/amd64:
    cmd/gendocs
    cmd/genkubedocs
    cmd/genman
    cmd/genbashcomp
    cmd/mungedocs
+++ [1105 02:11:03] Placing binaries
/go/src/k8s.io/kubernetes/hack/lib/util.sh: line 186: /go/src/k8s.io/kubernetes/.generated_docs: No such file or directory
!!! Error in /go/src/k8s.io/kubernetes/hack/after-build/verify-generated-docs.sh:212
  'popd > /dev/null' exited with status 1
Call stack:
  1: /go/src/k8s.io/kubernetes/hack/after-build/verify-generated-docs.sh:212 main(...)
Exiting with status 1
!!! Error in hack/verify-generated-docs.sh:28
  '"${KUBE_ROOT}/hack/after-build/verify-generated-docs.sh" "$@"' exited with status 1
Call stack:
  1: hack/verify-generated-docs.sh:28 main(...)
Exiting with status 1
!!! Error in build/../build/common.sh:558
  '"${docker_cmd[@]}" "$@"' exited with status 1
Call stack:
  1: build/../build/common.sh:558 kube::build::run_build_command(...)
  2: build/run.sh:30 main(...)
Exiting with status 1

This is total nonsense.

  • hack/after-build/verify-generated-docs.sh has no line 212
  • I'm pretty sure popd > /dev/null didn't exit with status 1. The shell function kube::util::gen-docs exited nonzero because its last line was a while read file; do...done <"${KUBE_ROOT}/.generated_docs" and that file does not exist.
  • In that sense, the first line is correct, but it confusingly refers to the start of kube::util::gen-docs.

I'm not convinced I understand how this test is supposed to work inside docker. Tag @jbeda because I notice your recent edits and @ixdy because tests.

@bgrant0607
Member

@k8s-bot

k8s-bot commented Jan 8, 2016

GCE e2e test build/test passed for commit c6e9ad0.

@mml
Contributor Author

mml commented Jan 8, 2016

Sigh. Now a new test is failing in Jenkins.

!!! [0108 21:12:42] Timed out waiting for apiserver:  to answer at http://127.0.0.1:8080/healthz; tried 120 waiting 1 between each
!!! Error in ./hack/test-update-storage-objects.sh:48
  'return 1' exited with status 1
Call stack:
  1: ./hack/test-update-storage-objects.sh:48 startApiServer(...)
  2: ./hack/test-update-storage-objects.sh:123 main(...)
Exiting with status 1
+++ [0108 21:12:42] Killing api server
+++ [0108 21:12:42] Clean up complete

@mml
Contributor Author

mml commented Jan 8, 2016

This fails in the same way on my workstation, both on this branch and on master. ¯\_(ツ)_/¯

@mml
Contributor Author

mml commented Jan 9, 2016

I've run this repeatedly, and all I can tell so far is that it's flaky. The specific flakiness is that sometimes (more often than not) after we've run cluster/update-storage-objects.sh the API server will never become healthy. The timeout is 2m, and even with --v=10, it's clear that the apiserver is not doing anything. The process is running.

If after the test fails (I've commented out the cleanup code that kills etcd and apiserver), I kill it and manually start it, it seems to come up every time.

@mml
Contributor Author

mml commented Jan 9, 2016

@k8s-bot test this [I'm feeling lucky]

@k8s-bot

k8s-bot commented Jan 9, 2016

GCE e2e test build/test passed for commit c6e9ad0.

@k8s-github-robot

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

@k8s-bot

k8s-bot commented Jan 9, 2016

GCE e2e build/test failed for commit c6e9ad0.

@davidopp
Contributor

davidopp commented Jan 9, 2016

@k8s-bot test this please

@k8s-bot

k8s-bot commented Jan 9, 2016

GCE e2e test build/test passed for commit c6e9ad0.

@k8s-github-robot

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

@k8s-bot

k8s-bot commented Jan 9, 2016

GCE e2e test build/test passed for commit c6e9ad0.

@k8s-github-robot

Automatic merge from submit-queue

k8s-github-robot pushed a commit that referenced this pull request Jan 9, 2016
Auto commit by PR queue bot
@k8s-github-robot k8s-github-robot merged commit ce9b117 into kubernetes:master Jan 9, 2016
@mml mml deleted the issue3885 branch January 11, 2016 23:27
@davidopp davidopp mentioned this pull request May 14, 2016
7 tasks
wking added a commit to wking/kubernetes that referenced this pull request Jul 17, 2018
The property was added in c6e9ad0 (Initial node drain implementation
for kubernetes#3885, 2015-10-30, kubernetes#16698), but beb5ea6 (remove mapper dependency
- PrintSuccess, 2018-02-01, kubernetes#59227) removed the only initializer.
vithati pushed a commit to vithati/kubernetes that referenced this pull request Oct 25, 2018
wking added a commit to wking/kubernetes that referenced this pull request Feb 24, 2022
When it initially landed in kubernetes/kubernetes@c6e9ad066e (Initial
node drain implementation for kubernetes#3885, 2015-08-30,
kubernetes#16698), the drain logic looked in a created-by
annotation for recognized kinds [1], so listing the set of recognized
kinds was a clear approach.

Sometime later, the source moved into ownerReferences, but the
hard-coded set of recognized controller kinds remained.

When kubernetes/kubernetes@2f1108451f (Remove hard-coded
pod-controller check, 2017-12-05, kubernetes#56864) removed
the hard-coded set of recognized controller kinds, it should have also
updated these messages to remove stale references to the previous
hard-coded values.  This commit catches the message strings up with
that commit.

[1]: kubernetes@c6e9ad0#diff-211259b8a8ec42f105264c10897dad48029badb538684e60e43eaead68c3d219R216
k8s-publishing-bot pushed a commit to kubernetes/kubectl that referenced this pull request Mar 5, 2022

Kubernetes-commit: 587f4f04cc5fc18f4e85ed6a4a06bbf1bfee0496

Labels

  • area/kubectl
  • lgtm: "Looks good to me", indicates that a PR is ready to be merged.
  • size/XXL: Denotes a PR that changes 1000+ lines, ignoring generated files.
