Skip to content

fix: Added check to prevent CP from joining with same name as worker node.#2263

Merged
ethandcosta merged 13 commits into
mainfrom
KU-3296/no-dupe-node-names
Feb 11, 2026
Merged

fix: Added check to prevent CP from joining with same name as worker node.#2263
ethandcosta merged 13 commits into
mainfrom
KU-3296/no-dupe-node-names

Conversation

@ethandcosta

Copy link
Copy Markdown
Contributor

Currently, if a CP node attempts to join the cluster with the same name as an existing worker node, it will fail in Kubernetes but persist in the microcluster, creating a mismatch in data. This PR addresses this issue by adding a check in the join logic to kill a CP join if there's a worker with the same name.

@louiseschmidtgen louiseschmidtgen left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for picking this up Ethan!

Could I ask you to create an integration test as well please?
This way we can test the actual hook.

I'm thinking 1) worker, cp and 2) cp, worker (for the later I'm not sure if the worker has permissions to get all nodes so just double check this case).

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/k8s/pkg/k8sd/app/hooks_join_test.go Outdated
Comment thread src/k8s/pkg/k8sd/app/hooks_join_test.go Outdated
Comment thread src/k8s/pkg/k8sd/app/hooks_join.go Outdated
Comment thread src/k8s/pkg/k8sd/app/hooks_join.go Outdated
Comment thread src/k8s/pkg/k8sd/app/hooks_join_test.go Outdated
Comment thread src/k8s/pkg/k8sd/app/hooks_join_test.go Outdated
Comment thread tests/integration/tests/test_clustering.py Outdated
@ethandcosta

Copy link
Copy Markdown
Contributor Author

I've moved the node name duplication check into the get-join-token section instead. This won't cover the case where node A creates a join token, an identically-named node B creates another join token and joins the cluster, and then A attempts to join the cluster, as I think this will probably take some greater design reworking better suited for a follow-on task, i.e. creating an exposed endpoint for joining nodes to see who's in the cluster, and it's not a case that is likely to happen often.

@ethandcosta

Copy link
Copy Markdown
Contributor Author

edit: this PR is kind of in a limbo state because of the k8sd migration. Once this PR canonical/k8sd#4 gets merged in, then we can update the version used here and this should be good to merge.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread tests/integration/tests/test_util/util.py
Comment thread tests/integration/tests/test_clustering.py
Comment thread tests/integration/tests/test_clustering.py

@louiseschmidtgen louiseschmidtgen left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two nits, apologies I didn't catch the one earlier:

Comment on lines +538 to +540
f'a node with this name is already part of the cluster: "{shared_node_name}"'
in error_output
), f"Join error message should indicate duplicate node name. Got: {error_output}"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry that I didn't catch it earlier: can we capitalize the errors in k8sd to export them. Then we can use those here instead of string messages that may become outdated.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +565 to +583
try:
util.get_join_token(
cluster_node, joining_worker, "--worker", name=shared_node_name
)
assert False, "get-join-token should have failed due to duplicate node name"
except tenacity.RetryError as e:
LOG.info("get-join-token failed as expected")
cause = e.last_attempt.exception()
if not isinstance(cause, subprocess.CalledProcessError):
raise e
error_output = (
cause.stderr if cause.stderr else cause.stdout if cause.stdout else ""
)
if isinstance(error_output, bytes):
error_output = error_output.decode()
assert (
f'a node with this name is already part of the cluster: "{shared_node_name}"'
in error_output
), f"Join error message should indicate duplicate node name. Got: {error_output}"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's something neat we can use to clean this piece up a bit: https://docs.pytest.org/en/7.1.x/how-to/assert.html#assertions-about-expected-exceptions
pytest expected exceptions.

@ethandcosta ethandcosta merged commit bed0586 into main Feb 11, 2026
287 of 288 checks passed
@ethandcosta ethandcosta deleted the KU-3296/no-dupe-node-names branch February 11, 2026 14:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants