Skip to content

feat: Introduce kube-proxy replacement configuration and related functionality#5

Open
berkayoz wants to merge 23 commits into
mainfrom
KU-4778/kube-proxy-replacement
Open

feat: Introduce kube-proxy replacement configuration and related functionality#5
berkayoz wants to merge 23 commits into
mainfrom
KU-4778/kube-proxy-replacement

Conversation

@berkayoz

Copy link
Copy Markdown
Member

No description provided.

@berkayoz berkayoz force-pushed the KU-4778/kube-proxy-replacement branch 5 times, most recently from 1582ca4 to 459e567 Compare January 29, 2026 07:25
@ktsakalozos-canonical ktsakalozos-canonical force-pushed the KU-4778/kube-proxy-replacement branch 3 times, most recently from e66c896 to 56ccc7c Compare May 28, 2026 11:02
@ktsakalozos-canonical ktsakalozos-canonical force-pushed the KU-4778/kube-proxy-replacement branch from 982264f to 4f75852 Compare June 9, 2026 08:11
berkayoz and others added 5 commits June 9, 2026 11:12
- Added NodeConfig struct to encapsulate Kubelet and Network configurations.
- Implemented ToConfigMap and FromConfigMap methods for NodeConfig to handle serialization and deserialization.
- Updated tests to cover new NodeConfig functionality, including signing and verification.
- Modified existing Kubelet configuration handling to support the new NodeConfig structure.
- Enhanced validation logic to ensure kube-proxy-free cannot be set to false when the network is enabled.
- Updated cluster configuration defaults to include kube-proxy-free setting.
Signed-off-by: Homayoon (Hue) Alimohammadi <homayoon.alimohammadi@canonical.com>
@ktsakalozos-canonical ktsakalozos-canonical force-pushed the KU-4778/kube-proxy-replacement branch from 4f75852 to c2fa6c4 Compare June 9, 2026 08:20
@ktsakalozos-canonical ktsakalozos-canonical force-pushed the KU-4778/kube-proxy-replacement branch from c2fa6c4 to 6395706 Compare June 9, 2026 08:28
@ktsakalozos-canonical ktsakalozos-canonical changed the title feat: Introduce kube-proxy-free configuration and related functionality feat: Introduce kube-proxy replacement configuration and related functionality Jun 9, 2026
@ktsakalozos-canonical ktsakalozos-canonical marked this pull request as ready for review June 9, 2026 08:31
Copilot AI review requested due to automatic review settings June 9, 2026 08:31

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a kube-proxy replacement configuration knob and migrates service management toward a persisted per-node “local state” file, enabling controllers/hooks to start/stop (notably kube-proxy) based on reconciled configuration.

Changes:

  • Add network.kube-proxy-enabled to cluster config (defaults/merge/validate/CLI/API) and propagate it to nodes via a signed ConfigMap.
  • Introduce local-state.yaml to persist enabled/disabled snap services and use it to start/restart/stop services from hooks/controllers.
  • Update Cilium network install values to enable kube-proxy replacement mode and adjust health-check settings.

Reviewed changes

Copilot reviewed 43 out of 44 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
pkg/snap/util/services.go Removed legacy fixed service lists and wrappers.
pkg/snap/util/services_test.go Removed tests for legacy service wrappers.
pkg/snap/util/local_state.go Added local state model + read/write + start/restart/stop helpers.
pkg/snap/util/local_state_test.go Added unit tests for local state behavior.
pkg/snap/snap.go Added LocalStatePath() implementation.
pkg/snap/mock/mock.go Added mock support for LocalStatePath().
pkg/snap/interface.go Extended Snap interface with LocalStatePath().
pkg/k8sd/types/cluster_config_validate.go Added validation around kube-proxy-enabled vs network enabled.
pkg/k8sd/types/cluster_config_validate_test.go Added tests for the new validation rule.
pkg/k8sd/types/cluster_config_node.go Added ClusterConfig↔ConfigMap conversion with signing and kube-proxy-enabled support.
pkg/k8sd/types/cluster_config_node_test.go Added tests for configmap conversion/signing/round-trips.
pkg/k8sd/types/cluster_config_network.go Added KubeProxyEnabled field + accessor.
pkg/k8sd/types/cluster_config_merge.go Enabled merging of Network.KubeProxyEnabled.
pkg/k8sd/types/cluster_config_merge_test.go Updated merge tests to include kube-proxy-enabled scenarios.
pkg/k8sd/types/cluster_config_kubelet.go Removed kubelet-only configmap/signing logic (moved to node config).
pkg/k8sd/types/cluster_config_kubelet_test.go Removed tests for kubelet-only configmap/signing.
pkg/k8sd/types/cluster_config_defaults.go Added defaulting for Network.KubeProxyEnabled.
pkg/k8sd/types/cluster_config_defaults_test.go Updated defaults tests for kube-proxy-enabled default behavior.
pkg/k8sd/types/cluster_config_convert.go Wired kube-proxy-enabled through user-facing conversions.
pkg/k8sd/types/cluster_config_convert_test.go Updated conversion tests for kube-proxy-enabled.
pkg/k8sd/features/cilium/network.go Enabled kube-proxy replacement in Cilium values and triggers a rollout restart.
pkg/k8sd/controllers/update_node_configuration.go Switched to ClusterConfig→ConfigMap for node config propagation.
pkg/k8sd/controllers/update_node_configuration_test.go Updated tests for new node configmap conversion.
pkg/k8sd/controllers/service_args.go Service list selection now derived from local state.
pkg/k8sd/controllers/node_configuration.go Node controller now handles kube-proxy enable/disable + persists it to local state.
pkg/k8sd/controllers/node_configuration_test.go Added kube-proxy-enabled reconciliation tests + local state setup.
pkg/k8sd/app/hooks_start.go Start hook now prefers local state and performs legacy migration to local state.
pkg/k8sd/app/hooks_remove.go Remove hook uses new StopAllK8sServices and deletes local state.
pkg/k8sd/app/hooks_node_ready.go Triggers update-node-config reconciliation on node-ready.
pkg/k8sd/app/hooks_join.go Writes local state on join and starts services via local state.
pkg/k8sd/app/hooks_bootstrap.go Writes local state on bootstrap and starts services via local state.
pkg/k8sd/app/cluster_util.go Removed legacy startControlPlaneServices helper.
pkg/k8sd/api/worker.go Worker join info now includes kube-proxy-enabled.
pkg/k8sd/api/certificates_update.go Restart control-plane services via local state.
pkg/k8sd/api/certificates_refresh.go Restart control-plane services via local state.
go.mod Bumped github.com/canonical/k8s-snap-api/v2 to v2.1.0.
go.sum Updated sums for k8s-snap-api/v2 v2.1.0.
cmd/k8s/testdata/bootstrap-config-full.yaml Added kube-proxy-enabled to example bootstrap config.
cmd/k8s/k8s_set.go Allowed setting network.kube-proxy-enabled.
cmd/k8s/k8s_set_test.go Added mapstructure tests for network.kube-proxy-enabled.
cmd/k8s/k8s_get.go Added get output for network.kube-proxy-enabled.
cmd/k8s/k8s_enable.go Network enable now sets kube-proxy-enabled default false.
cmd/k8s/k8s_bootstrap.go Bootstrap defaults include kube-proxy-enabled false when enabling network.
cmd/k8s/k8s_bootstrap_test.go Updated bootstrap CLI tests for kube-proxy-enabled.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread pkg/snap/util/local_state.go
Comment thread pkg/k8sd/controllers/service_args.go
Comment thread pkg/k8sd/app/hooks_bootstrap.go Outdated
Comment thread pkg/k8sd/app/hooks_start.go
Comment thread pkg/snap/util/local_state_test.go
Comment thread pkg/k8sd/types/cluster_config_defaults.go
Comment on lines +65 to +71
// Network fields
v := config.Network.KubeProxyEnabled
if v == nil {
data["kube-proxy-enabled"] = fmt.Sprintf("%t", true)
} else {
data["kube-proxy-enabled"] = fmt.Sprintf("%t", *v)
}
Comment on lines +131 to +139
// Parse Network fields
if v, ok := m["kube-proxy-enabled"]; ok {
kubeProxyEnabled := v == "true"
config.Network.KubeProxyEnabled = &kubeProxyEnabled
} else {
// Default to true if not specified
kubeProxyEnabled := true
config.Network.KubeProxyEnabled = &kubeProxyEnabled
}
Comment thread pkg/k8sd/types/cluster_config_node_test.go

@berkayoz berkayoz left a comment

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pushing this over the finish line, a few comments

Comment thread cmd/k8s/k8s_get.go
Comment on lines +59 to +63
if config.Network.KubeProxyEnabled == nil {
// if kube-proxy-enabled is nil do not show the nil value.
kubeProxyEnabled := config.Network.GetKubeProxyEnabled()
config.Network.KubeProxyEnabled = &kubeProxyEnabled
}

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this actually needed, don't we already ensure config.Network.GetKubeProxyEnabled() doesn't return nil? https://github.com/canonical/k8s-snap-api/blob/777922c7fa9b77c2936f3fddea530499888da72c/api/type_cluster_config.go#L153-L155

Below on the output assignment this should be fine already no?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The config as returned in the response can have nil at the KubeProxyEnabled filed (we have removed the omit empty in the k8s-snap-api). If the user asks for the entire config with k8s get or just the network section with k8s get network the config is returned as is. We do not call the config.Network.GetKubeProxyEnabled() anywhere below so the nil value is shown to the end user. We do not want that. We want to make sure that if there is a nil value it is shown as "true".

Comment thread pkg/k8sd/app/hooks_bootstrap.go Outdated
Comment thread pkg/k8sd/app/hooks_remove.go Outdated
log.Info("Have kube-proxy start and marked as active")
if err := snap.StartServices(ctx, []string{"kube-proxy"}); err != nil {
log.Error(err, "failed to start kube-proxy service")
}

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A removed node should return to the same state as what we have after snap install right? Why would we need to re-enable kube-proxy if its not setup yet, we will be missing the files and everything no?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is this test test_skip_services_stop_on_remove in the k8s-snap that expects kube-proxy to be enabled. I removed this part of the code from here and will handle the specifics of the test in the k8s-snap.

Comment thread pkg/k8sd/controllers/service_args.go Outdated
localState, err := snaputil.ReadLocalState(c.snap)
if err != nil {
return fmt.Errorf("failed to determine node type: %w", err)
// Local state file doesn't exist, use old behavior

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we use the old behavior here?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. There is no way to fallback to the old behavior. The respective functions have been removed. I revised the code so if the local state file is not present it fails with a message essentially waiting for the state file to be created.

ktsakalozos-canonical and others added 2 commits June 9, 2026 12:18
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 44 out of 45 changed files in this pull request and generated 6 comments.

Comment thread pkg/k8sd/controllers/service_args.go
Comment thread pkg/k8sd/app/hooks_join.go
Comment thread pkg/snap/util/local_state_test.go
Comment thread pkg/k8sd/app/hooks_start.go Outdated
Comment thread pkg/k8sd/app/hooks_remove.go Outdated
Comment thread pkg/k8sd/types/cluster_config_node.go
ktsakalozos and others added 5 commits June 9, 2026 17:54
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
HomayoonAlimohammadi

This comment was marked as low quality.

@HomayoonAlimohammadi HomayoonAlimohammadi left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TL;DR: solid for fresh clusters, but upgrade/mixed-version paths land in the kube-proxy + KPR conflict state, every k8sd restart rollout-restarts Cilium cluster-wide, and the local-state not-exist check is dead code.
This review was generated by Claude 5 Fable (xhigh) after consuming around 80k tokens. The quality however was SO low in the end that I'd rather resolve everything. It had all the context it needed and the prompt was fairly comprehensive, trust me.

Comment thread pkg/k8sd/features/cilium/network.go
Comment thread pkg/k8sd/features/cilium/network.go
Comment thread pkg/k8sd/controllers/service_args.go
Comment thread pkg/k8sd/app/hooks_bootstrap.go
Comment thread pkg/k8sd/app/hooks_start.go
Comment thread pkg/k8sd/api/certificates_refresh.go
Comment thread pkg/snap/util/local_state.go
Comment thread pkg/k8sd/types/cluster_config_validate.go
Comment thread pkg/k8sd/controllers/node_configuration.go

@HomayoonAlimohammadi HomayoonAlimohammadi left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thanks a lot to both of you for this PR!
However, I'd lean towards leaving the local state out for now, until we discuss it further. I tried to explain why I think we're better off leaving it out, but TLDR: What we're storing there is not really "local" state (it's cluster wide); Compared to what this PR should entail that's a significant change (a lot of the changed lines are related to local state); I believe it needs further discussion and polishing.

If the team's decision is to keep the current local state impl, I'm all for it, but I'd personally vote against it for now. Not trying to block change behind annoying specs and buy trouble, but I think this at least deserves a short engineering discussion.


restartFn := func(ctx context.Context) error {
if err := snaputil.RestartControlPlaneServices(ctx, snap); err != nil {
localState, err := snaputil.ReadLocalState(snap)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need this?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now I understand. we want to keep track of whether or not kubeproxy is disabled.
instead of introducing this "local state" which to me looks like a significant addition, and might require a separate thread (discussion, alternative solutions), can't we just pass a kubeProxyEnabled bool param to the StartControlPlaneServices function?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see how this local state can be utilized. I see the value in it, but I'm not convinced that this is the correct way to go.
to me, what we're storing in the local state file, is not REALLY a LOCAL state. because it applies to all the nodes in the cluster basically. so we might as well store it in the cluster config and include some other stuff in that file that are actually local.
the point is that this is a thing that we need to think about and discuss a bit more. I'd lean towards leaving it out and working around it for now.
This will also make this PR leaner, more focused and prevents scope creep.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants