Skip to content

feat: syncer integ#6919

Merged
cheftako merged 3 commits into
masterfrom
acpana/integrate-syncer
Mar 24, 2026
Merged

feat: syncer integ#6919
cheftako merged 3 commits into
masterfrom
acpana/integrate-syncer

Conversation

@acpana

@acpana acpana commented Mar 5, 2026

Copy link
Copy Markdown
Collaborator

Adds syncer integration in operator and kccmanager

BRIEF Change description

Fixes #

WHY do we need this change?

Special notes for your reviewer:

Does this PR add something which needs to be 'release noted'?

`spec.experiments.MultiClusterLeaseSpec ` now supports integration with a syncer for KRM objects. This will help KCC take ownership of resources with service generated IDs.
  • Reviewer reviewed release note.

Additional documentation e.g., references, usage docs, etc.:


Intended Milestone

Please indicate the intended milestone.

  • Reviewer tagged PR with the actual milestone.

Tests you have done

  • Run make ready-pr to ensure this PR is ready for review.
  • Perform necessary E2E testing for changed resources.

@acpana acpana force-pushed the acpana/integrate-syncer branch from fe8899f to 34f198f Compare March 5, 2026 00:36
@acpana acpana marked this pull request as ready for review March 5, 2026 23:08
Comment thread pkg/controller/kccmanager/syncer_integration.go Outdated
Comment thread pkg/controller/kccmanager/syncer_integration.go Outdated
@acpana acpana force-pushed the acpana/integrate-syncer branch 2 times, most recently from 3fcd5b9 to c508008 Compare March 10, 2026 02:18
@acpana acpana added this to the 1.147 milestone Mar 10, 2026
@acpana acpana requested a review from cheftako March 10, 2026 04:12
@acpana acpana force-pushed the acpana/integrate-syncer branch 2 times, most recently from 1851051 to 6604330 Compare March 10, 2026 20:44
Comment thread cmd/manager/main.go Outdated
Comment thread pkg/controller/kccmanager/kccmanager.go
Comment thread pkg/controller/kccmanager/syncer_integration.go Outdated
@acpana acpana force-pushed the acpana/integrate-syncer branch 3 times, most recently from 9e2d975 to 34a1907 Compare March 11, 2026 23:40
@acpana acpana marked this pull request as draft March 16, 2026 20:12
@acpana acpana force-pushed the acpana/integrate-syncer branch 4 times, most recently from df99c9c to c66b9fc Compare March 17, 2026 00:02
@acpana acpana mentioned this pull request Mar 17, 2026
@xiaoweim xiaoweim modified the milestones: 1.147, 1.148 Mar 18, 2026
acpana added 2 commits March 18, 2026 20:53
Signed-off-by: Alex Pana <8968914+acpana@users.noreply.github.com>
Signed-off-by: Alex Pana <8968914+acpana@users.noreply.github.com>
@acpana acpana force-pushed the acpana/integrate-syncer branch 4 times, most recently from d7eec8a to 98c0d89 Compare March 18, 2026 23:21
@acpana acpana marked this pull request as ready for review March 18, 2026 23:21
Comment thread operator/pkg/controllers/configconnector/experiments.go Outdated
Comment thread operator/pkg/apis/core/v1beta1/configconnector_types.go Outdated
Comment thread pkg/controller/kccmanager/kccmanager.go
Comment thread pkg/controller/kccmanager/kccmanager.go Outdated
Comment thread pkg/controller/kccmanager/syncer_integration.go Outdated
Comment thread pkg/controller/kccmanager/kccmanager.go
@@ -153,7 +157,7 @@ func setUpMultiClusterLease(ctx context.Context, restConfig *rest.Config, scheme
// Create a new client for the lock to ensure it uses the correct configuration

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AGENT: If Config Connector is running in namespace mode (multiple controller instances), they will all retrieve the exact same cluster-scoped ConfigConnector object (configconnector.core.cnrm.cloud.google.com) and attempt to acquire the exact same MultiClusterLease lock (LeaseName and Namespace).

This will cause all but one namespace controller across the entire cluster to be blocked as followers, effectively breaking namespace mode. Leader election leases must be distinct per KCC manager instance (e.g., by incorporating the manager's target namespace into the lease name).

TAKO: Ideally (at lest for V1) the operator would acquire/renew the lock and the individual managers would validate the lock was held by "them" (same ID). Having said that as long as all the managers in a cluster are using the same ID they should just be updating the lease time and renewing without a problem. One thought on this is given 10K namespaces, we get 10K manager. Renewing the lease every minute, means we are sending 170 QPS writes at the KAS, just for lease renewals...

We should also make sure our testing covers the multiple manager case. We should have a test for namespace mode, 2+ namespaces configured to reconcile and MCL and sync and ensure that 1 cluster is actually active and that the other cluster is seeing syncing in both namespaces.

@acpana acpana Mar 19, 2026

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renewing the lease every minute, means we are sending 170 QPS writes at the KAS, just for lease renewals...

I'm curious if this would be something we could live with for V1? I think having each manager do this call is not ideal but keeping the call in the manager layer let's us keep the door open for namespace sharding too but YAGNI!

Another trick I was thinking we could implement was to designate on of the kccmanagers as the "lease leader" (maybe first one to wake up and announce itself to the operator).

We should also make sure our testing covers the multiple manager case. We should have a test for namespace mode, 2+ namespaces configured to reconcile and MCL and sync and ensure that 1 cluster is actually active and that the other cluster is seeing syncing in both namespaces.

I think this is covered in #7053

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious if this would be something we could live with for V1?

Probably but I would suggest you should have a plan for monitoring and for mitigating. They should probably go in your feature doc. I would say mitigate is probably change the lease period. Ten minutes rather than one gets you down to 17 QPS. Do you have a way to tell?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have a way to tell?

I think we can leverage KAS audit logs and I can add some more metrics as a follow on PR to look specifically for the syncer_integration.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Audit isn't always turned on. Also requires a fair amount of parsing code to determine. Metrics seem like a good option. Tricky part with metrics, is that from the client side you need to aggrege a lot of different processes metrics together to get the answer.

Comment thread pkg/controller/kccmanager/syncer_integration.go Outdated
Comment thread cmd/manager/main.go Outdated
Comment thread pkg/controller/kccmanager/kccmanager.go
@acpana acpana force-pushed the acpana/integrate-syncer branch from 98c0d89 to 2869583 Compare March 20, 2026 00:46
@acpana acpana requested a review from cheftako March 20, 2026 00:48

@cheftako cheftako left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make on shutdown not block exit?

@cheftako cheftako left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Alex!

@cheftako cheftako added this pull request to the merge queue Mar 24, 2026
@google-oss-prow google-oss-prow Bot added the lgtm label Mar 24, 2026
@google-oss-prow

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cheftako

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Merged via the queue into master with commit 6e483b6 Mar 24, 2026
166 checks passed
@cheftako cheftako deleted the acpana/integrate-syncer branch April 1, 2026 20:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants