fix: handle ALREADY_EXISTS in TagKey and TagValue controllers#6943
Conversation
…le values Narrowed .etag, .createTime, and .updateTime normalization to only affect cloudresourcemanager v3 APIs (Tags, operations) and skip already stable dummy values like 'abcdef0123A='. This fixes regressions in unrelated tests (e.g. Cloud Run) that were broken by global replacement of common dummy etags.
…s to avoid global log corruption
|
--- INVESTIGATION REPORT ---
|
|
--- INVESTIGATION REPORT --- 1. Tags (TagKey/TagValue) Acquisition Failures (Run ID: 22803199004)
2. Alloydb Failure (Run ID: 22783928237)
3. Smoketest Failure (Run ID: 22783928237)
Current Status: |
cheftako
left a comment
There was a problem hiding this comment.
Couple of things to look at
| } | ||
|
|
||
| apiError := &apierror.APIError{} | ||
| return errors.As(err, &apiError) && apiError.HTTPCode() == 409 || |
There was a problem hiding this comment.
Operator-precedence trap in IsAlreadyExists —
return errors.As(err, &apiError) && apiError.HTTPCode() == 409 ||
grpcStatus.Code(err) == grpcCode.AlreadyExists
Due to && binding tighter than ||, the logic is actually:
return (errors.As(err, &apiError) && apiError.HTTPCode() == 409) ||
grpcStatus.Code(err) == grpcCode.AlreadyExists
That happens to be the intended semantics, but it's a readability trap. More
importantly, the sibling HasHTTPCode uses a guarded if errors.As(...) block,
which is the established pattern in this file. The new function is
inconsistent. Add parentheses and align the pattern with HasHTTPCode for
clarity.
Also: HasHTTPCode has a secondary fallback for HTTPCode() == -1, which
IsAlreadyExists omits without explanation.
| name: "wrapped gRPC AlreadyExists", | ||
| err: fmt.Errorf("creating resource: %w", status.Error(codes.AlreadyExists, "already exists")), | ||
| want: true, | ||
| }, |
There was a problem hiding this comment.
No test for HTTP 409 REST path —
TestIsAlreadyExists covers gRPC paths but not the REST/HTTP 409 path directly.
Most GCP REST-transport clients return an *apierror.APIError with HTTPCode()
== 409 and no gRPC code — the primary production path for many resources. This
should be covered.
| func (a *TagsTagKeyAdapter) acquireExistingTagKey(ctx context.Context, createOp *directbase.CreateOperation) error { | ||
| log := klog.FromContext(ctx) | ||
|
|
||
| existing, err := a.findTagKeyByShortName(ctx) |
There was a problem hiding this comment.
TOCTOU race in acquire logic —
The acquire sequence is:
- CreateTagKey → ALREADY_EXISTS
- findTagKeyByShortName → list + search
Between steps 1 and 2, the resource could be deleted by another process,
causing findTagKeyByShortName to return nil and leaving the KCC object in a
permanently-errored state. Per standard Kubernetes controller patterns, this
should return a transient/retryable error rather than a fatal one, so the
controller runtime will re-queue.
| if service == "" { | ||
| service = "cloudresourcemanager.googleapis.com" | ||
| } | ||
| return "//" + service + "/" + strings.Join(tokens, "/"), nil |
There was a problem hiding this comment.
normalizeParent behavioral change —
Before this PR, normalizeParent("projects/myproject") returned
"projects/123456" (bare path). After, it returns
"//cloudresourcemanager.googleapis.com/projects/123456". Existing TagBinding
objects in the mock stored with bare paths would fail the equality check in
ListTagBindings. The controller code appears to always pass //-prefixed
resource links so it may be fine in practice, but this should be documented
and the expected input format of normalizeParent should be clearly stated.
| path := parent | ||
| if strings.HasPrefix(parent, "//") { | ||
| parts := strings.SplitN(parent[2:], "/", 2) | ||
| if len(parts) == 2 { |
There was a problem hiding this comment.
normalizeParent malformed input silently falls through —
mockgcp/mockresourcemanager/tagbinding.go (Low)
if strings.HasPrefix(parent, "//") {
parts := strings.SplitN(parent[2:], "/", 2)
if len(parts) == 2 {
service = parts[0]
path = parts[1]
}
}
A malformed input like "//cloudresourcemanager.googleapis.com" (no path)
produces len(parts) == 1, so service stays "" and path stays as the original
value — leading to a silent wrong tokenization. The mock should at least log
or return an error for malformed input.
|
@codebot-robot could you address the comments? |
|
@fkc1e100 Do you know how to address the comments here? |
cheftako
left a comment
There was a problem hiding this comment.
3 outstanding comments. I think best way to proceed is in a follow up.
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: cheftako The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
05d0707
This PR handles the ALREADY_EXISTS error in TagKey and TagValue controllers by acquiring the existing resource.
It incorporates changes from #6890 by @afarbos, updated with:
Fixes #6942
Supersedes #6890