Skip to content

pkg/dial/resolver: Fix KVStore service resolution on initialisation#44653

Merged
giorio94 merged 2 commits intocilium:mainfrom
DataDog:alex.melhem/fix-kvstore-resolution
Mar 12, 2026
Merged

pkg/dial/resolver: Fix KVStore service resolution on initialisation#44653
giorio94 merged 2 commits intocilium:mainfrom
DataDog:alex.melhem/fix-kvstore-resolution

Conversation

@41ks
Copy link
Copy Markdown
Contributor

@41ks 41ks commented Mar 6, 2026

This fixes a bug where the agent fails to initialize as it reaches a deadlock when trying to resolve the address of the KVStore if it is behind a K8s Service. The fix adds a clear dependency in the resolver on the K8s reflectors to force correct initialisation order.

Fixes: #44527

Fixes issue where the Cilium agent fails to initialise when using KVStore identity mode with etcd behind a K8s Service

@41ks 41ks requested review from a team as code owners March 6, 2026 13:22
@41ks 41ks requested a review from joamaki March 6, 2026 13:22
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Mar 6, 2026
@github-actions github-actions bot added the kind/community-contribution This was a contribution made by a community member. label Mar 6, 2026
@41ks
Copy link
Copy Markdown
Contributor Author

41ks commented Mar 6, 2026

Hey @joamaki, I took a go at implementing your suggestion for a fix for this issue and this seems to work. Let me know if this is what you had in mind.

@HadrienPatte
Copy link
Copy Markdown
Member

/test

@joamaki
Copy link
Copy Markdown
Contributor

joamaki commented Mar 9, 2026

Bunch of privileged tests are failing:

    logger.go:256: time=2026-03-06T14:57:30.200Z level=ERROR source=/host/vendor/github.com/cilium/hive/cell/invoke.go:52 msg="Invoke failed" error="missing type:\n\t- reflectors.K8sReflectorRegistered (did you mean to Provide it?)" function="clustermesh.registerServicesInitialized (pkg/clustermesh/service_merger.go:44)"
    cell_test.go:300: 
        	Error Trace:	/host/pkg/datapath/linux/ipsec/cell_test.go:300
        	            				/host/pkg/netns/netns_linux.go:175
        	            				/host/vendor/golang.org/x/sync/errgroup/errgroup.go:93
        	            				/root/sdk/go1.26.0/src/runtime/asm_amd64.s:1771
        	Error:      	Received unexpected error:
        	            	failed to populate object graph: missing type: reflectors.K8sReflectorRegistered
        	Test:       	TestPrivileged_TestIPSecCell/IPSecDisabled
--- FAIL: TestPrivileged_TestIPSecCell/IPSecDisabled (0.01s)
--- FAIL: TestPrivileged_TestIPSecCell (0.04s)

https://github.com/cilium/cilium/actions/runs/22767890933/job/66040826515

@41ks
Copy link
Copy Markdown
Contributor Author

41ks commented Mar 9, 2026

Hey @joamaki, thank you for the review ! I think I need to tweak the privileged tests to provide the reflectors.K8sReflectorRegistered type or the cell altogether.

@41ks 41ks force-pushed the alex.melhem/fix-kvstore-resolution branch from 9c64831 to 60a919b Compare March 9, 2026 10:30
@41ks 41ks requested review from a team as code owners March 9, 2026 10:30
@41ks 41ks requested a review from rgo3 March 9, 2026 10:30
@HadrienPatte
Copy link
Copy Markdown
Member

/test

1 similar comment
@HadrienPatte
Copy link
Copy Markdown
Member

/test

This fixes a bug where the agent fails to initialize as it reaches a deadlock when trying to resolve the address of the KVStore if it is behind a K8s Service. The fix adds a clear dependency in the resolver on the K8s reflectors to force correct initialisation order.

Fixes: cilium#44527

Signed-off-by: Alex Melhem <alex.melhem@datadoghq.com>
@41ks 41ks force-pushed the alex.melhem/fix-kvstore-resolution branch 3 times, most recently from 32b2487 to 02650b1 Compare March 9, 2026 16:35
@HadrienPatte
Copy link
Copy Markdown
Member

/test

@joestringer joestringer added the release-note/minor This PR changes functionality that users may find relevant to operating Cilium. label Mar 9, 2026
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Mar 9, 2026
@41ks
Copy link
Copy Markdown
Contributor Author

41ks commented Mar 11, 2026

@joamaki @giorio94 let me know if the new changes to the resolver look good to you.

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Mar 11, 2026
Copy link
Copy Markdown
Member

@giorio94 giorio94 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! The changes look good to me, with only a couple of minor comments inline.

Adds a timeout and a fallback when waiting for the frontends table to initialize. The function falls back to quertying the k8s apiserver.

Signed-off-by: Alex Melhem <alex.melhem@datadoghq.com>
@41ks 41ks force-pushed the alex.melhem/fix-kvstore-resolution branch from 02650b1 to 34b554b Compare March 11, 2026 14:00
@HadrienPatte HadrienPatte added the needs-backport/1.19 This PR / issue needs backporting to the v1.19 branch label Mar 11, 2026
@HadrienPatte
Copy link
Copy Markdown
Member

/test

Copy link
Copy Markdown
Member

@giorio94 giorio94 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@HadrienPatte
Copy link
Copy Markdown
Member

/test

@giorio94 giorio94 added this pull request to the merge queue Mar 12, 2026
Merged via the queue into cilium:main with commit 3c44f26 Mar 12, 2026
79 checks passed
@HadrienPatte HadrienPatte deleted the alex.melhem/fix-kvstore-resolution branch March 12, 2026 17:18
@smagnani96 smagnani96 mentioned this pull request Mar 16, 2026
10 tasks
@smagnani96 smagnani96 added backport-pending/1.19 The backport for Cilium 1.19.x for this PR is in progress. and removed needs-backport/1.19 This PR / issue needs backporting to the v1.19 branch labels Mar 16, 2026
@github-actions github-actions bot added backport-done/1.19 The backport for Cilium 1.19.x for this PR is done. and removed backport-pending/1.19 The backport for Cilium 1.19.x for this PR is in progress. labels Mar 24, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport-done/1.19 The backport for Cilium 1.19.x for this PR is done. kind/community-contribution This was a contribution made by a community member. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/minor This PR changes functionality that users may find relevant to operating Cilium.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Cilium 1.19 in kvstore identity mode fails when etcd is exposed behind a service

8 participants