fix(entrypoint): skip drain/uncordon on agent nodes by dpritchett · Pull Request #1648 · k3d-io/k3d

dpritchett · 2026-02-11T21:11:13Z

What

Skip kubectl uncordon and kubectl drain on agent nodes in k3d-entrypoint.sh. Only server nodes run drain/uncordon; agents get clean SIGTERM forwarding only.

Also captures $1 into a K3S_ROLE variable at script scope, since $1 inside a shell function refers to the function's arguments, not the script's. Without this, set -o nounset would crash when the trap fires on shutdown.
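The scoping issue can be reproduced in isolation. The following is a minimal sketch, not the entrypoint itself: `show_role` is a hypothetical function name, and `set -- server` only simulates the script having been started with `server` as its first argument.

```shell
#!/bin/sh
# Demo: inside a function, "$1" is the function's first argument, not the script's.
set -o nounset

# Simulate the script being invoked with "server" as its first argument (assumption).
set -- server

# Capture the script's $1 at script scope, before any function needs it.
K3S_ROLE="${1:-}"

# show_role is a hypothetical name used only for this demo.
show_role() {
  # Called without arguments (as a trap handler would be), so "$1" is unset here.
  # A bare "$1" would abort under nounset; "${1:-<unset>}" makes that visible safely.
  echo "function \$1: ${1:-<unset>}; K3S_ROLE: ${K3S_ROLE}"
}

show_role
# prints: function $1: <unset>; K3S_ROLE: server
```

Reading `K3S_ROLE` instead of `$1` inside the handler sidesteps the problem regardless of how the function is called.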

Why

PR #1119 added graceful drain/uncordon to the entrypoint, but it runs unconditionally on all node types. Agent nodes don't have a kubeconfig at the default path (/etc/rancher/k3s/k3s.yaml), so kubectl falls back to localhost:8080 and retries forever, spamming agent logs from the moment the node starts.

Fixes #1420, #1535
May also help with #1526, #1452 (multi-server restart hangs)

Implications

This changes behavior for agent nodes only. Server nodes are unaffected and still drain on shutdown and uncordon on start, exactly as before.

The change is in pkg/types/fixes/assets/k3d-entrypoint.sh (embedded shell script). No Go code changes. No CLI changes.

We match $1 = "server" explicitly rather than excluding "agent", so any unexpected value (e.g. someone running the container image directly with arbitrary args) falls through to the safe default of SIGTERM forwarding only.
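The allow-list approach described above can be sketched as follows. This is an illustration only: `role_runs_drain` is a hypothetical helper name, and the actual script may structure the check differently.

```shell
#!/bin/sh
# role_runs_drain is a hypothetical helper, not a function from k3d-entrypoint.sh.
# It succeeds only for an exact "server" match; everything else is treated as
# "SIGTERM forwarding only".
role_runs_drain() {
  [ "${1:-}" = "server" ]
}

# Try the expected roles plus an empty and an arbitrary value.
for role in server agent "" kubectl; do
  if role_runs_drain "$role"; then
    echo "role='$role': drain/uncordon"
  else
    echo "role='$role': SIGTERM forwarding only"
  fi
done
```

Had the check been written as "not agent", the empty and `kubectl` cases would have attempted a drain; matching `server` explicitly keeps them on the safe path.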

Testing

Tested locally against k3s v1.34.3+k3s3 with a patched binary (make build re-embeds the script via //go:embed).

$1 validation: Confirmed via docker inspect --format '{{json .Config.Cmd}}' that server containers receive ["server", ...] and agent containers receive ["agent"].

1 server + 1 agent cluster:

Agent logs: zero localhost:8080 errors (was infinite loop before this fix)
Both nodes Ready, cluster functional

Graceful shutdown (cluster stop):

Server logs: Draining node... (drain ran as expected)
Agent logs: Sending SIGTERM to k3s... / Waiting for k3s to close... / Bye! (no drain, clean exit)
Cluster restarted cleanly, both nodes back to Ready

3 servers + 2 agents:

All 5 nodes Ready on create
Both agents: zero localhost:8080 errors
Cluster stop: all 3 servers drained, both agents SIGTERM-only
Cluster restart: all 5 nodes back to Ready

Agent nodes don't have a kubeconfig at the default path, so the kubectl uncordon/drain calls added in k3d-io#1119 fail in an infinite retry loop, spamming logs with localhost:8080 connection refused errors.

Gate drain/uncordon on K3S_ROLE=server so agents get clean SIGTERM forwarding only. Match server explicitly rather than excluding agent so unknown values fall through to the safe default.

Also captures $1 into K3S_ROLE before defining cleanup(), since $1 inside a function refers to the function's args, not the script's (would crash under set -o nounset on shutdown).

Fixes k3d-io#1420, k3d-io#1535

Copilot

Pull request overview

Updates the embedded k3d-entrypoint.sh to avoid running kubectl uncordon/kubectl drain on k3s agent nodes, preventing infinite kubectl retry spam when agents lack a usable kubeconfig.

Changes:

Capture the initial k3s subcommand ($1) into a script-scope K3S_ROLE to keep trap/cleanup logic working under set -o nounset.
Gate kubectl uncordon (startup) and kubectl drain (shutdown) so they only run when K3S_ROLE="server".
Keep agent shutdown behavior to SIGTERM forwarding + wait only.
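Taken together, the changes listed above amount to a role-gated shutdown handler. Below is a rough, runnable sketch under stated assumptions: a background `sleep` stands in for the k3s process, `echo` stands in for the real `kubectl drain` call, and `shutdown_demo` is a hypothetical driver that invokes `cleanup` directly rather than waiting for a signal. The log messages mirror those quoted in the Testing section.

```shell
#!/bin/sh
set -o nounset

# cleanup reads K3S_ROLE and K3S_PID from script scope, never "$1".
# In an entrypoint this would be registered with: trap cleanup TERM
cleanup() {
  if [ "$K3S_ROLE" = "server" ]; then
    echo "Draining node..."        # placeholder: the real script runs kubectl drain here
  fi
  echo "Sending SIGTERM to k3s..."
  kill -TERM "$K3S_PID"
  echo "Waiting for k3s to close..."
  wait "$K3S_PID" 2>/dev/null || true   # the child exits non-zero after SIGTERM
  echo "Bye!"
}

# shutdown_demo is a hypothetical driver for this sketch only.
shutdown_demo() {
  K3S_ROLE="$1"                    # stands in for capturing the script's $1
  sleep 30 >/dev/null 2>&1 &       # stand-in for the k3s process (assumption)
  K3S_PID=$!
  cleanup
}

shutdown_demo server
shutdown_demo agent
```

With `server` the drain placeholder runs before the SIGTERM is forwarded; with `agent` the handler goes straight to forwarding and waiting.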

dpritchett · 2026-02-16T15:25:06Z

@iwilltry42 anything in particular I can do to help triage this one? I appreciate that you're likely to be plenty busy on other projects already.

daxmc99 · 2026-03-23T23:36:15Z

👀
Also hit this today

iwilltry42 · 2026-03-24T05:26:46Z

Finally merged. Sorry for the long wait and thanks for your contribution @dpritchett !

iwilltry42 requested a review from Copilot February 11, 2026 21:23

Copilot started reviewing on behalf of iwilltry42 February 11, 2026 21:23

Copilot AI reviewed Feb 11, 2026

iwilltry42 approved these changes Feb 16, 2026

iwilltry42 merged commit 2e015b3 into k3d-io:main Mar 24, 2026
10 checks passed

dpritchett deleted the fix/agent-entrypoint-no-drain branch March 24, 2026 14:43
