Initial node drain implementation for #3885 (#16698)

Merged
k8s-github-robot merged 1 commit into kubernetes:master from mml:issue3885
Jan 9, 2016

Conversation

@mml
Contributor

@mml mml commented Nov 3, 2015

It lames (marks unschedulable) the given node, and then deletes every
pod on it, optionally using a grace period. It will not delete
unreplicated pods without the use of --force.

Also add undrain, which just marks nodes as schedulable. #3885

@k8s-bot

k8s-bot commented Nov 3, 2015

Can one of the admins verify that this patch is reasonable to test? (reply "ok to test", or if you trust the user, reply "add to whitelist")

If this message is too spammy, please complain to ixdy.

@k8s-github-robot

Labelling this PR as size/L

@k8s-github-robot k8s-github-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Nov 3, 2015
@mikedanese mikedanese assigned davidopp and unassigned eparis Nov 3, 2015
@mikedanese
Member

@kubernetes/kubectl

@mikedanese
Member

Is there a difference between laming a node and draining a node?

@davidopp
Contributor

davidopp commented Nov 3, 2015

@mikedanese "laming" usually refers to marking it unschedulable, while "draining" refers to (gracefully) evicting whatever is running

@davidopp
Contributor

davidopp commented Nov 3, 2015

@madhusudancs you might be interested to take a look at this since we've talked about it

Contributor

Note here that they should use --force to delete them.

Contributor

Also move this before the warning so you won't need the else clause

if !options.Force {
	return fmt.Errorf("Refusing to continue due to unreplicated pods: %s", joined)
}
fmt.Fprintf(out, "WARNING: Deleting these unreplicated pods: %s\n", joined)

Contributor Author

Done.

@madhusudancs
Contributor

I just have two minor comments.

@janetkuo
Member

janetkuo commented Nov 4, 2015

Please run hack/update-generated-docs.sh to generate md files for both commands.

Member

It'd be great to have more details on "what does drain nodes mean and what happens after I drain my nodes."

@janetkuo
Member

janetkuo commented Nov 4, 2015

What's the use case of kubectl undrain other than undoing previously performed kubectl drain? IMO we can have kubectl drain --undo instead, or we should mention kubectl drain in kubectl undrain document. cc @bgrant0607

@janetkuo
Member

janetkuo commented Nov 4, 2015

@k8s-bot ok to test

@k8s-bot

k8s-bot commented Nov 4, 2015

GCE e2e build/test failed for commit 617c41b2f4a8e5026478dc0b780e62ad80f23d6d.

@mml
Contributor Author

mml commented Nov 4, 2015

Sigh. This passes on my workstation:

% hack/verify-generated-docs.sh
+++ [1104 15:28:27] Building go targets for linux/amd64:
    cmd/gendocs
    cmd/genkubedocs
    cmd/genman
    cmd/genbashcomp
    cmd/mungedocs
+++ [1104 15:28:28] Placing binaries
Generated docs are up to date.

But fails on jenkins.

14:28:48 Verifying ./hack/../hack/verify-generated-docs.sh
14:28:48 +++ [1104 22:28:48] Building go targets for linux/amd64:
14:28:48     cmd/gendocs
14:28:48     cmd/genkubedocs
14:28:48     cmd/genman
14:28:48     cmd/genbashcomp
14:28:48     cmd/mungedocs
14:29:10 +++ [1104 22:29:10] Placing binaries
14:29:13 /workspace/kubernetes/docs/user-guide/kubectl/kubectl.md
14:29:13 ----
14:29:13 md-links:
14:29:13 contents were modified
14:29:13 
14:29:13 FAIL: changes needed but not made due to --verify
14:29:13 /workspace/kubernetes/docs/ is out of date. Please run hack/update-generated-docs.sh
14:29:13 !!! Error in ./hack/../hack/verify-generated-docs.sh:28
14:29:13   '"${KUBE_ROOT}/hack/after-build/verify-generated-docs.sh" "$@"' exited with status 1
14:29:13 Call stack:
14:29:13   1: ./hack/../hack/verify-generated-docs.sh:28 main(...)
14:29:13 Exiting with status 1
14:29:13 FAILED

@mml
Contributor Author

mml commented Nov 5, 2015

With great effort, I was able to hack mungedocs up to produce a file I could turn into a diff. It should just produce a diff, and I'll file a bug for that if there's not one already. Anyway,

--- /tmp/zshs0QcYE      2015-11-04 16:45:46.645617509 -0800
+++ /tmp/zsh2rPbMb      2015-11-04 16:45:46.657617610 -0800
@@ -64,7 +64,7 @@
 
 ### Addon Resources
 
-To prevent memory leaks or other resource issues in  from consuming all the resources available on a node, Kubernetes sets resource limits on addon containers to limit the CPU and Memory resources they can consume (See PR [#10653](http://pr.k8s.io/10653/files) and [#10778](http://pr.k8s.io/10778/files)).
+To prevent memory leaks or other resource issues in [cluster addons](../../cluster/addons/) from consuming all the resources available on a node, Kubernetes sets resource limits on addon containers to limit the CPU and Memory resources they can consume (See PR [#10653](http://pr.k8s.io/10653/files) and [#10778](http://pr.k8s.io/10778/files)).
 
 For example:
 

+++ is what's on the filesystem. It's obviously correct.
--- is what mungedocs wants to see when it runs inside docker. I think this is because cluster is missing from the list of inputs when we create the docker container.

@k8s-bot

k8s-bot commented Nov 5, 2015

GCE e2e test build/test passed for commit ede367bc8925bdc3c1b506bc0a4555d4cc9cb031.

@mml
Contributor Author

mml commented Nov 5, 2015

Recording this before I forget it all. Now running this in docker, I get

% build/run.sh hack/verify-generated-docs.sh 
+++ [1104 18:10:55] Verifying Prerequisites....
+++ [1104 18:10:56] Building Docker image kube-build:cross.
+++ [1104 18:10:58] Building Docker image kube-build:build-0343d9fca5.
+++ [1104 18:11:02] Running build command....
+++ [1105 02:11:03] Building go targets for linux/amd64:
    cmd/gendocs
    cmd/genkubedocs
    cmd/genman
    cmd/genbashcomp
    cmd/mungedocs
+++ [1105 02:11:03] Placing binaries
/go/src/k8s.io/kubernetes/hack/lib/util.sh: line 186: /go/src/k8s.io/kubernetes/.generated_docs: No such file or directory
!!! Error in /go/src/k8s.io/kubernetes/hack/after-build/verify-generated-docs.sh:212
  'popd > /dev/null' exited with status 1
Call stack:
  1: /go/src/k8s.io/kubernetes/hack/after-build/verify-generated-docs.sh:212 main(...)
Exiting with status 1
!!! Error in hack/verify-generated-docs.sh:28
  '"${KUBE_ROOT}/hack/after-build/verify-generated-docs.sh" "$@"' exited with status 1
Call stack:
  1: hack/verify-generated-docs.sh:28 main(...)
Exiting with status 1
!!! Error in build/../build/common.sh:558
  '"${docker_cmd[@]}" "$@"' exited with status 1
Call stack:
  1: build/../build/common.sh:558 kube::build::run_build_command(...)
  2: build/run.sh:30 main(...)
Exiting with status 1

This is total nonsense.

  • hack/after-build/verify-generated-docs.sh has no line 212
  • I'm pretty sure popd > /dev/null didn't exit with status 1. The shell function kube::util::gen-docs exited nonzero because its last line was a while read file; do...done <"${KUBE_ROOT}/.generated_docs" and that file does not exist.
  • In that sense, the first line is correct, but it confusingly refers to the start of kube::util::gen-docs.

I'm not convinced I understand how this test is supposed to work inside docker. Tag @jbeda because I notice your recent edits and @ixdy because tests.

@bgrant0607
Member

@k8s-bot

k8s-bot commented Jan 8, 2016

GCE e2e test build/test passed for commit c6e9ad0.

@mml
Contributor Author

mml commented Jan 8, 2016

Sigh. Now a new test is failing in Jenkins.

!!! [0108 21:12:42] Timed out waiting for apiserver:  to answer at http://127.0.0.1:8080/healthz; tried 120 waiting 1 between each
!!! Error in ./hack/test-update-storage-objects.sh:48
  'return 1' exited with status 1
Call stack:
  1: ./hack/test-update-storage-objects.sh:48 startApiServer(...)
  2: ./hack/test-update-storage-objects.sh:123 main(...)
Exiting with status 1
+++ [0108 21:12:42] Killing api server
+++ [0108 21:12:42] Clean up complete

@mml
Contributor Author

mml commented Jan 8, 2016

This fails in the same way on my workstation, both on this branch and on master. ¯\_(ツ)_/¯

@mml
Contributor Author

mml commented Jan 9, 2016

I've run this repeatedly, and all I can tell so far is that it's flaky. The specific flakiness is that sometimes (more often than not) after we've run cluster/update-storage-objects.sh the API server will never become healthy. The timeout is 2m, and even with --v=10, it's clear that the apiserver is not doing anything. The process is running.

If after the test fails (I've commented out the cleanup code that kills etcd and apiserver), I kill it and manually start it, it seems to come up every time.

@mml
Contributor Author

mml commented Jan 9, 2016

@k8s-bot test this [I'm feeling lucky]

@k8s-bot

k8s-bot commented Jan 9, 2016

GCE e2e test build/test passed for commit c6e9ad0.

@k8s-github-robot

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

@k8s-bot

k8s-bot commented Jan 9, 2016

GCE e2e build/test failed for commit c6e9ad0.

@davidopp
Contributor

davidopp commented Jan 9, 2016

@k8s-bot test this please

@k8s-bot

k8s-bot commented Jan 9, 2016

GCE e2e test build/test passed for commit c6e9ad0.

@k8s-github-robot

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

@k8s-bot

k8s-bot commented Jan 9, 2016

GCE e2e test build/test passed for commit c6e9ad0.

@k8s-github-robot

Automatic merge from submit-queue

k8s-github-robot pushed a commit that referenced this pull request Jan 9, 2016
Auto commit by PR queue bot
@k8s-github-robot k8s-github-robot merged commit ce9b117 into kubernetes:master Jan 9, 2016
@mml mml deleted the issue3885 branch January 11, 2016 23:27
@davidopp davidopp mentioned this pull request May 14, 2016
7 tasks
wking added a commit to wking/kubernetes that referenced this pull request Jul 17, 2018
The property was added in c6e9ad0 (Initial node drain implementation
for kubernetes#3885, 2015-10-30, kubernetes#16698), but beb5ea6 (remove mapper dependency
- PrintSuccess, 2018-02-01, kubernetes#59227) removed the only initializer.
vithati pushed a commit to vithati/kubernetes that referenced this pull request Oct 25, 2018
wking added a commit to wking/kubernetes that referenced this pull request Feb 24, 2022
When it initially landed in kubernetes/kubernetes@c6e9ad066e (Initial
node drain implementation for kubernetes#3885, 2015-08-30,
kubernetes#16698), the drain logic looked in a created-by
annotation for recognized kinds [1], so listing the set of recognized
kinds was a clear approach.

Sometime later, the source moved into ownerReferences, but the
hard-coded set of recognized controller kinds remained.

When kubernetes/kubernetes@2f1108451f (Remove hard-coded
pod-controller check, 2017-12-05, kubernetes#56864) removed
the hard-coded set of recognized controller kinds, it should have also
updated these messages to remove stale references to the previous
hard-coded values.  This commit catches the message strings up with
that commit.

[1]: kubernetes@c6e9ad0#diff-211259b8a8ec42f105264c10897dad48029badb538684e60e43eaead68c3d219R216
k8s-publishing-bot pushed a commit to kubernetes/kubectl that referenced this pull request Mar 5, 2022

Kubernetes-commit: 587f4f04cc5fc18f4e85ed6a4a06bbf1bfee0496

Labels

  • area/kubectl
  • lgtm: "Looks good to me", indicates that a PR is ready to be merged.
  • size/XXL: Denotes a PR that changes 1000+ lines, ignoring generated files.
