kubeadm: waiting for etcd learner member to be started before promoting during 'kubeadm join' by SataQiu · Pull Request #136014 · kubernetes/kubernetes

SataQiu · 2026-01-04T07:03:29Z

What type of PR is this?

/kind bug

What this PR does / why we need it:

kubeadm: verify the etcd pod is running before promoting learner

Which issue(s) this PR is related to:

Fixes #kubernetes/kubeadm#3269

Special notes for your reviewer:

Does this PR introduce a user-facing change?

kubeadm: waiting for etcd learner member to be started before promoting during 'kubeadm join'

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

k8s-ci-robot · 2026-01-04T07:03:38Z

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

SataQiu · 2026-01-04T07:04:56Z

/hold

pacoxu · 2026-01-04T08:58:25Z

cc @dlipovetsky
/assign

pacoxu · 2026-01-04T09:29:04Z

+		} else {
+			klog.V(1).Infof("[etcd] Successfully removed learner member %s", etcdPeerAddress)
+		}
+		return errors.Wrap(err, "The etcd pod did not start successfully")


Like line 183: failed to remove xxx?
The error may be something like: failed to start xxx?
the learner member was removed successfully

Overall LGTM. Some logs or errors are not aligned.

/approve

k8s-ci-robot · 2026-01-04T09:29:18Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: pacoxu, SataQiu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~cmd/kubeadm/OWNERS~~ [SataQiu,pacoxu]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

neolit123

the problems CAPI users are having is that they have old, big, non-defragmented etcd DBs and when they try to kubeadm join new nodes etcd doesn't have enough time to sync the member data and kubeadm timeouts at the Etcd Client timeout we have for member promotion.

this change adds another timeout in between ControlPlaneComponentHealthCheck, but there is no guarantee if the same users will hit this timeout first. for these big DBs they must configure the timeouts. kubeadm will not block forever, obviously.

the change overall seems OK to me, but kubeadm has a history of breaking something related to the etcd logic, when trying to improve something in the etcd logic.

in terms of backport, my vote is +0.
this is somewhere in between a bugfix and feature improvement.

neolit123 · 2026-01-04T09:31:35Z

+	waiter := apiclient.NewKubeWaiter(client, kubeadmapi.GetActiveTimeouts().ControlPlaneComponentHealthCheck.Duration, os.Stdout)
+	if err := waiter.WaitForStaticPodRunning(nodeName, kubeadmconstants.Etcd); err != nil {
+		klog.V(1).Infof("[etcd] The etcd pod did not start successfully, removing learner member %s", etcdPeerAddress)
+		if removeErr := removeLearnerMember(etcdClient, etcdPeerAddress); removeErr != nil {


can this cause problems for users who were previously manually doing the remove?

This is also something I'm unsure about. Given the doubts about this, do you agree that we should just remove removeLearnerMember call and instead leave it to the users to handle? @neolit123

Even if the unstarted etcd member is not removed from etcd cluster, it will not cause any harmful effects and will not affect the subsequent retry attempts to join the node.

i think the member not being removed presents a problem for CAPI users, because CAPI might try to join a machine with a different IP and not that same machine that previously failed, so the previous member will be stale.

on the other hand users who were doing the manual cleanup might now get an error that no such member exists.

i am in favor of supporting the CAPI case, but just trying to understand if we have significant concerns about the manual case.

kubeadm: verify the etcd pod is running before promoting learner

we should add a sentence about that in the RN

If the new joining node is a fresh machine with the same IP, will that be a problem if we did not remove it. (They have to remove it manually then).

Is there a side effect to remove the stale learner member? The etcd data folder should be clean up as well IIUC.

Leaving it to users to manually is also an option. I prefer to remove it if there is no special concern.

i think i agree, the correct way seems to do this is to remove the data dir as well.
which pretty much means we need to call the code from app/cmd/phases/reset/removeetcdmember.go from the join phases, which is doable if it's exposed more like a backend, but a bit sketchy.

maybe, if that is not done correctly the partially synced state can no longer be valid. i don't know if etcd will be smart enough to re-sync the whole DB properly again or can that cause problems.

in that case we only should be removing the member and not the data.

You can't do this. After you removed a member, you can't reuse the data anymore. We discussed this in etcd community. The data should can be reused from user perspective, but there is a technical restriction for now.

ok, so in that case my original proposal for case 2 is valid.
#136014 (comment)

effectively a failed promotion for various reasons like sync issues, should result in member + it's data removed.

#136014 (comment)

effectively a failed promotion for various reasons like sync issues, should result in member + it's data removed.

If sync doesn't finish before the etcd API call timeout for learner promotion, does it make more sense to keep retrying? Otherwise, you will have to start over (re-sync the data) again.

#136014 (comment)
effectively a failed promotion for various reasons like sync issues, should result in member + it's data removed.

If sync doesn't finish before the etcd API call timeout for learner promotion, does it make more sense to keep retrying? Otherwise, you will have to start over (re-sync the data) again.

it's possible to set a custom timeout for the promotion retry, but kubeadm doesn't block indefinitely in any areas.

So, what is the conclusion? Does Kubeadm still need to handle the cleanup of folders? Case 2 seems rather challenging. It's not easy for us to determine whether the etcd data can be safely deleted. Perhaps it should be left to the user to handle. For example, performing "kubeadm reset" and then rejoining.

agreed, for case 2 we'd need to check for the sync error, and we don't know who it's going to break if we start removing the member data after the pod has started. i checked the CAPI KCP code and they at least make sure to only delete members that exist.

for case 1, i'm not convinced we need to remove the data dir. if the pod never started the data dir will be empty, right?

neolit123

/lgtm
/hold

@pacoxu any concerns about the removal of members?
#136014 (comment)

HirazawaUi · 2026-01-14T13:32:03Z

/lgtm

neolit123 · 2026-01-14T13:38:00Z

#136014 (comment)

@HirazawaUi WDYT, about the above assessment?

…ng during 'kubeadm join'

k8s-ci-robot · 2026-01-15T10:03:06Z

LGTM label has been added.

Details

Git tree hash: 09c55e2b2c63cf5afb83500e32a7b9404e3c38de

neolit123 · 2026-01-15T10:03:20Z

/retest

HirazawaUi · 2026-01-15T16:10:03Z

+func (c *Client) listMembersOnce() (*clientv3.MemberListResponse, error) {
+	cli, err := c.newEtcdClient(c.Endpoints)
+	if err != nil {
+		return nil, err
+	}
+	defer func() { _ = cli.Close() }()
+
+	ctx, cancel := context.WithTimeout(context.Background(), etcdTimeout)


Apologies, I just got off work....
There is still an issue here: we added a 2-second timeout for the request cli.MemberList(ctx) in the listMembersOnce method. However, we are still calling c.getMemberStatus(learnerID) -> c.listMembersOnce() on a 500ms cycle. This may cause requests to etcd to be blocked under certain network anomalies, and all responses could be returned together once the network recovers... I think this does not align with our expectations.

Apologies, I just got off work....

if it's too late in CN there is no rush to reply.

There is still an issue here: we added a 2-second timeout for the request cli.MemberList(ctx) in the listMembersOnce method. However, we are still calling c.getMemberStatus(learnerID) -> c.listMembersOnce() on a 500ms cycle. This may cause requests to etcd to be blocked under certain network anomalies, and all responses could be returned together once the network recovers... I think this does not align with our expectations.

i believe in many places we have a poll with the 500ms retry interval (e.g. AddMember) but the context timeout for a etcd API call is 2 seconds. i'm not sure what should we do here. is that a refactor for a separate PR?

I also noticed this "inversion" (outer interval duration is shorter than inner context deadline) during an earlier review, but I thought it would be out of scope for this PR, because it happens in various places.

i'm not sure what should we do here. is that a refactor for a separate PR?

Alright, let's do this in a separate PR

HirazawaUi · 2026-01-18T11:05:54Z

/lgtm
/hold cancel

HirazawaUi · 2026-01-18T12:07:27Z

/retest

…#136014-upstream-release-1.35 Automated cherry pick of #136014: kubeadm: waiting for etcd learner member to be started before promoting during 'kubeadm join'

…#136014-upstream-release-1.33 Automated cherry pick of #136014: kubeadm: waiting for etcd learner member to be started before promoting during 'kubeadm join'

…#136014-upstream-release-1.34 Automated cherry pick of #136014: kubeadm: waiting for etcd learner member to be started before promoting during 'kubeadm join'

…#136014-upstream-release-1.32 Automated cherry pick of #136014: kubeadm: waiting for etcd learner member to be started before promoting during 'kubeadm join'

k8s-ci-robot added the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label Jan 4, 2026

k8s-ci-robot requested review from carlory and neolit123 January 4, 2026 07:03

k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 4, 2026

SataQiu mentioned this pull request Jan 4, 2026

kubeadm should confirm that the new etcd peer has started before trying to promote it to a member kubernetes/kubeadm#3269

Closed

SataQiu force-pushed the fix-20260104 branch from 678c2c0 to b44e23d Compare January 4, 2026 07:14

k8s-ci-robot assigned pacoxu Jan 4, 2026

pacoxu reviewed Jan 4, 2026

View reviewed changes

neolit123 reviewed Jan 4, 2026

View reviewed changes

SataQiu force-pushed the fix-20260104 branch 4 times, most recently from 04bac7f to bc72d12 Compare January 4, 2026 12:46

neolit123 reviewed Jan 4, 2026

View reviewed changes

k8s-ci-robot assigned neolit123 Jan 4, 2026

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 4, 2026

k8s-ci-robot assigned dlipovetsky Jan 13, 2026

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 13, 2026

k8s-ci-robot assigned HirazawaUi Jan 14, 2026

k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 14, 2026

k8s-ci-robot requested review from HirazawaUi, dlipovetsky and neolit123 January 14, 2026 14:03

kubeadm: waiting for etcd learner member to be started before promoti…

5e334d9

…ng during 'kubeadm join'

SataQiu force-pushed the fix-20260104 branch from 1c59e0a to 5e334d9 Compare January 15, 2026 02:58

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 15, 2026

HirazawaUi reviewed Jan 15, 2026

View reviewed changes

k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 18, 2026

k8s-ci-robot merged commit 02e6747 into kubernetes:master Jan 18, 2026
13 checks passed

k8s-ci-robot added this to the v1.36 milestone Jan 18, 2026

ahrtr mentioned this pull request Feb 26, 2026

kubeadm: do not add learner member to etcd client endpoints #137251

Merged

fabriziopandini mentioned this pull request Mar 9, 2026

KCP does not remove the etcd member for a machine that failed to join the control plane kubernetes-sigs/cluster-api#13221

Closed

Conversation

SataQiu commented Jan 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What type of PR is this?

What this PR does / why we need it:

Which issue(s) this PR is related to:

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

Uh oh!

k8s-ci-robot commented Jan 4, 2026

Uh oh!

SataQiu commented Jan 4, 2026

Uh oh!

pacoxu commented Jan 4, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

k8s-ci-robot commented Jan 4, 2026

Uh oh!

neolit123 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

SataQiu Jan 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

neolit123 Jan 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

neolit123 Jan 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

SataQiu Jan 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

neolit123 Jan 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

neolit123 left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

HirazawaUi commented Jan 14, 2026

Uh oh!

neolit123 commented Jan 14, 2026

Uh oh!

k8s-ci-robot commented Jan 15, 2026

Uh oh!

neolit123 commented Jan 15, 2026

Uh oh!

HirazawaUi Jan 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

neolit123 Jan 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

SataQiu commented Jan 4, 2026 •

edited

Loading

SataQiu Jan 4, 2026 •

edited

Loading

neolit123 Jan 4, 2026 •

edited

Loading

neolit123 Jan 5, 2026 •

edited

Loading

SataQiu Jan 6, 2026 •

edited

Loading

neolit123 Jan 6, 2026 •

edited

Loading

neolit123 left a comment •

edited

Loading

HirazawaUi Jan 15, 2026 •

edited

Loading

neolit123 Jan 15, 2026 •

edited

Loading