fix: handle ALREADY_EXISTS in TagKey and TagValue controllers#6890
fix: handle ALREADY_EXISTS in TagKey and TagValue controllers#6890afarbos wants to merge 1 commit into
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
1148aa8 to
ec3b332
Compare
87d2dfe to
6c15039
Compare
|
Fixes #6942 |
|
Acknowledged. Updating the PR to include 'Fixes #6942'. |
codebot-robot
left a comment
There was a problem hiding this comment.
SUMMARY OF CHANGES:
- Updated commit message to include 'Fixes #6942' in the body.
- Updated PR description (locally, though push to original branch failed, changes are reflected in my fork).
- Verified the fix with existing unit tests and E2E fixtures.
- The code correctly handles ALREADY_EXISTS by acquiring the existing TagKey/TagValue resource, matching the pattern used in other direct controllers.
codebot-robot
left a comment
There was a problem hiding this comment.
I have updated the commit message to include 'Fixes #6942' as requested.
|
Superseded by #6943 |
--- INVESTIGATION REPORT ---
|
|
--- INVESTIGATION REPORT ---
Pushed updated changes to #6943. |
|
--- INVESTIGATION REPORT --- 1. Tags (TagKey/TagValue) Acquisition Failures (Run ID: 22803199004)
2. Alloydb Failure (Run ID: 22783928237)
3. Smoketest Failure (Run ID: 22783928237)
Current Status: |
|
--- INVESTIGATION REPORT --- 1. Tags (TagKey/TagValue) Acquisition Failures (Run ID: 22803199004)
2. Alloydb Failure (Run ID: 22783928237)
3. Smoketest Failure (Run ID: 22783928237)
Current Status: |
This PR handles the ALREADY_EXISTS error in TagKey and TagValue controllers by acquiring the existing resource. It incorporates changes from #6890 by @afarbos, updated with: - Latest master rebase. - MockGCP normalization for TagKey/TagValue. - E2E tests for acquisition scenarios. Fixes #6942 Supersedes #6890
fix: handle ALREADY_EXISTS in TagKey and TagValue controllers
Fixes #
WHY do we need this change?
When running KCC across multiple clusters that manage the same GCP project, creating a
TagsTagKeyorTagsTagValuefrom a second cluster fails withALREADY_EXISTSbecause the resource was already created by the first cluster. The controllers currently propagate this error instead of acquiring the existing resource.This is a common scenario in multi-cluster setups where shared GCP-global resources (tag keys and values are unique per parent project/org) are declared in each cluster's configuration. Cluster 1 succeeds, but clusters 2-N fail permanently with:
The only workaround today is to look up the server-generated numeric
resourceIDfrom GCP and set it inspec.resourceIDacross all clusters, which is operationally painful and not easily automatable (not with KCC at least).Special notes for your reviewer:
IsAlreadyExistshelper inmaputils.gochecks both HTTP 409 and gRPCAlreadyExists(code 6) since the tag controllers use REST clients but the error could surface from either transport.CreateTagKey()/CreateTagValue()andop.Wait().findTagKeyByShortName/findTagValueByShortNamefollows the same pattern asTagsTagBindingAdapter.Find()intagbinding_controller.go.Alternatives considered
Add
tagValueNamespacedNameto the TagBinding spec — The GCP API supports referencing tag values by human-readable namespaced name ({parentId}/{keyShortName}/{valueShortName}) viatagValues.getNamespaced, which would let users skip creating TagKey/TagValue in secondary clusters entirely. However, this adds a new API surface field to the CRD, has no precedent in KCC (no resource currently accepts a GCP namespaced name as spec input), and the mapper already marksTagValueNamespacedNameas intentionallyMISSING. This seems it would require a broader design discussion.Handle ALREADY_EXISTS in Create() and acquire the existing resource (this PR) — When
CreateTagKeyorCreateTagValuereturnsALREADY_EXISTS, catch the error, list existing resources under the parent, find the one matching the desiredshortName, and adopt it by settingspec.resourceIDand updating status. This follows the same pattern already used by the AssetFeed controller (pkg/controller/direct/asset/feed_controller.go).We chose approach 2 because it follows an established pattern in the codebase, requires no CRD/API changes, and fixes the problem for all multi-cluster users transparently.
Does this PR add something which needs to be 'release noted'?
Additional documentation e.g., references, usage docs, etc.:
Intended Milestone
Please indicate the intended milestone.
Tests you have done
make ready-prto ensure this PR is ready for review.