fix(entrypoint): skip drain/uncordon on agent nodes #1648
Merged
iwilltry42 merged 1 commit into k3d-io:main on Mar 24, 2026
Agent nodes don't have a kubeconfig at the default path, so the kubectl uncordon/drain calls added in k3d-io#1119 fail in an infinite retry loop, spamming logs with localhost:8080 connection refused errors. Gate drain/uncordon on K3S_ROLE=server so agents get clean SIGTERM forwarding only. Match server explicitly rather than excluding agent so unknown values fall through to the safe default. Also captures $1 into K3S_ROLE before defining cleanup(), since $1 inside a function refers to the function's args, not the script's (a bare $1 there would crash under set -o nounset on shutdown). Fixes k3d-io#1420, k3d-io#1535
Pull request overview
Updates the embedded k3d-entrypoint.sh to avoid running kubectl uncordon/kubectl drain on k3s agent nodes, preventing infinite kubectl retry spam when agents lack a usable kubeconfig.
Changes:
- Capture the initial k3s subcommand ($1) into a script-scope K3S_ROLE to keep trap/cleanup logic working under set -o nounset.
- Gate kubectl uncordon (startup) and kubectl drain (shutdown) so they only run when K3S_ROLE="server".
- Keep agent shutdown behavior to SIGTERM forwarding + wait only.
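The gating described in the second bullet can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the helper name should_manage_node is hypothetical, and the loop only prints which branch each role would take instead of calling kubectl.

```sh
#!/bin/sh
# Hypothetical helper (not from the PR): decides whether drain/uncordon should run.
should_manage_node() {
  # Positive match on "server". "agent", an empty value, and any unknown
  # value all return non-zero and so fall through to the safe default.
  [ "${1:-}" = "server" ]
}

for role in server agent "" some-other-arg; do
  if should_manage_node "$role"; then
    echo "role='$role': drain/uncordon enabled"
  else
    echo "role='$role': SIGTERM forwarding only"
  fi
done
```

Matching the one known-good value keeps the risky path (calling kubectl) opt-in, so a container started with arbitrary arguments behaves like an agent rather than looping on kubectl.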
Contributor
Author
@iwilltry42 anything in particular I can do to help triage this one? I appreciate that you're likely to be plenty busy on other projects already.
iwilltry42 approved these changes Feb 16, 2026
👀
Member
Finally merged. Sorry for the long wait and thanks for your contribution @dpritchett!
What
Skip kubectl uncordon and kubectl drain on agent nodes in k3d-entrypoint.sh. Only server nodes run drain/uncordon; agents get clean SIGTERM forwarding only.
Also captures $1 into a K3S_ROLE variable at script scope, since $1 inside a shell function refers to the function's arguments, not the script's. Without this, set -o nounset would crash when the trap fires on shutdown.
Why
PR #1119 added graceful drain/uncordon to the entrypoint, but it runs unconditionally on all node types. Agent nodes don't have a kubeconfig at the default path (/etc/rancher/k3s/k3s.yaml), so kubectl falls back to localhost:8080 and retries forever, spamming agent logs from the moment the node starts.
Fixes #1420, #1535
May also help with #1526, #1452 (multi-server restart hangs)
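The $1 scoping issue mentioned under What can be reproduced in isolation. The sketch below is a standalone demo, not the entrypoint itself: the set -- line stands in for the container command, and --example-flag is a made-up argument.

```sh
#!/bin/sh
set -o nounset

# Demo only: simulate the script being started as "server --example-flag".
set -- server --example-flag

# Capture the script's first argument before any function is defined.
K3S_ROLE="${1:-}"

cleanup() {
  # Inside a function, "$1" is the function's own first argument.
  # cleanup is called with no arguments, so the fallback is printed;
  # a bare "$1" here would abort the script under nounset.
  echo "function \$1: ${1:-<unset>}"
  echo "script role: $K3S_ROLE"
}

cleanup
```

A trap handler is invoked without arguments in the same way, which is why the role has to be read from a script-scope variable rather than from $1.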
Implications
This changes behavior for agent nodes only. Server nodes are unaffected and still drain on shutdown and uncordon on start, exactly as before.
The change is in pkg/types/fixes/assets/k3d-entrypoint.sh (embedded shell script). No Go code changes. No CLI changes.
We match $1 = "server" explicitly rather than excluding "agent", so any unexpected value (e.g. someone running the container image directly with arbitrary args) falls through to the safe default of SIGTERM forwarding only.
Testing
Tested locally against k3s v1.34.3+k3s3 with a patched binary (make build re-embeds the script via //go:embed).
$1 validation: Confirmed via docker inspect --format '{{json .Config.Cmd}}' that server containers receive ["server", ...] and agent containers receive ["agent"].
1 server + 1 agent cluster:
- localhost:8080 errors (was infinite loop before this fix)
Graceful shutdown (cluster stop):
- Draining node... (drain ran as expected)
- Sending SIGTERM to k3s... / Waiting for k3s to close... / Bye! (no drain, clean exit)
3 servers + 2 agents:
- localhost:8080 errors
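For readers unfamiliar with the "SIGTERM forwarding + wait" shutdown path seen in the agent log lines above, here is a minimal sketch of the pattern. It is an assumption-laden stand-in, not the actual entrypoint: a sleep process plays the role of k3s, and the script signals itself after one second so the demo terminates on its own. Only the three log messages are taken from the output quoted above.

```sh
#!/bin/sh
# Stand-in for the k3s process; any long-running child works for this demo.
sleep 30 &
child_pid=$!

cleanup() {
  echo "Sending SIGTERM to k3s..."
  kill -TERM "$child_pid" 2>/dev/null
  echo "Waiting for k3s to close..."
  # wait reports the child's exit status, which is non-zero after SIGTERM.
  wait "$child_pid" 2>/dev/null || true
  echo "Bye!"
}
trap cleanup TERM

# Demo only: deliver SIGTERM to this script after one second.
( sleep 1; kill -TERM "$$" ) &

# A trapped signal interrupts this wait with a non-zero status,
# then cleanup runs and execution resumes after this line.
wait "$child_pid" || true
```

On a server node the same handler would additionally drain the node before forwarding the signal; on agents it stops at forwarding and waiting.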