Skip to content

fix(sdk): validate labels in SetLabel to prevent retry loops#4386

Merged
markmandel merged 3 commits intoagones-dev:mainfrom
rajanarahul93:fix/sdk-setlabel-validation
Jan 8, 2026
Merged

fix(sdk): validate labels in SetLabel to prevent retry loops#4386
markmandel merged 3 commits intoagones-dev:mainfrom
rajanarahul93:fix/sdk-setlabel-validation

Conversation

@rajanarahul93
Copy link
Copy Markdown
Contributor

/kind bug

What this PR does / Why we need it:
This PR adds early validation to the SDK sidecar SetLabel RPC to prevent invalid Kubernetes labels from being enqueued and retried indefinitely.

Previously, SetLabel accepted any key/value pair and queued it for synchronization. When the label was invalid (for example, containing spaces), the Kubernetes API server rejected the update. Because the failure was deterministic, the SDK worker queue retried forever, resulting in a retry loop and blocking subsequent label updates.

This change validates label keys and values up front using Kubernetes validation rules and returns a clear InvalidArgument gRPC error to the client, avoiding unnecessary retries and improving debuggability.

Which issue(s) this PR fixes:
Closes #4385

Special notes for your reviewer:
This PR intentionally limits its scope to SetLabel, as described in the linked issue.
A similar validation pattern may also apply to SetAnnotation, but that is left out here to keep the fix minimal and focused.

@github-actions github-actions bot added kind/bug These are bugs. size/S labels Dec 13, 2025
@lacroixthomas
Copy link
Copy Markdown
Collaborator

/gcbrun

Copy link
Copy Markdown
Collaborator

@lacroixthomas lacroixthomas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @rajanarahul93,
Thanks for the contribution !

I've put a minor change, but will need confirmation first

Comment on lines +582 to +603
if kv == nil {
return nil, status.Error(codes.InvalidArgument, "label key/value cannot be nil")
}

if errs := validation.IsQualifiedName(kv.Key); len(errs) > 0 {
return nil, status.Errorf(
codes.InvalidArgument,
"invalid label key %q: %s",
kv.Key,
strings.Join(errs, ", "),
)
}

if errs := validation.IsValidLabelValue(kv.Value); len(errs) > 0 {
return nil, status.Errorf(
codes.InvalidArgument,
"invalid label value %q for key %q: %s",
kv.Value,
kv.Key,
strings.Join(errs, ", "),
)
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though that seems totally valid, I wonder if we should not use errors.Errorf instead, to be consistent with the rest of the code ? 🤔 @markmandel

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You know - I'm not against it. But it does mean this method could return a 4xx rather than a 5xx like in most (all?) other cases.

I'm almost wondering if we should create a ticket to go back and make our grpc errors proper gRPC errors. The only thing I can think it might break is manual REST implementations that expect a 500 response in some way. WDYT?

@agones-bot
Copy link
Copy Markdown
Collaborator

Build Succeeded 🥳

Build Id: ef21dadb-64dc-41b3-b0df-79a0ff209b94

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

git fetch https://github.com/googleforgames/agones.git pull/4386/head:pr_4386 && git checkout pr_4386
helm install agones ./install/helm/agones --namespace agones-system --set agones.image.registry=us-docker.pkg.dev/agones-images/ci --set agones.image.tag=1.55.0-dev-cf25183

@rajanarahul93
Copy link
Copy Markdown
Contributor Author

Good question, thanks for raising it!

I used status.Errorf(codes.InvalidArgument, ...) intentionally here so that the SDK returns a proper gRPC InvalidArgument error, which is translated by the grpc-gateway into an HTTP 400 response for REST clients.

Using errors.Errorf would result in a generic gRPC error (codes.Unknown), which would map to an HTTP 500 and wouldn’t reflect a client-side validation failure.

Happy to adjust if there’s a preferred pattern for returning gRPC status errors elsewhere in the SDK.

@rajanarahul93 rajanarahul93 force-pushed the fix/sdk-setlabel-validation branch from cf25183 to d8ede0b Compare December 24, 2025 05:17
@lacroixthomas
Copy link
Copy Markdown
Collaborator

Hey ! @rajanarahul93
I don't see any issues to return the status code from gRPC, it would indeed give more feedback for the user, would make things clearer
I was checking the sdks (./sdks/*) and they seems to be all fine as well they already returns the error as expected (from what I see, nothing to change / no breaking changes)

I don't know what you think about it @markmandel but I'm fine with it, LGTM, but if you're also good with it, I'll create some issues to ensure the other functions from the sdkServers also returns gRPC status code to be consistent (and also give more context / proper status code for the user)

/gcbrun

@markmandel
Copy link
Copy Markdown
Collaborator

Hey ! @rajanarahul93
I don't see any issues to return the status code from gRPC, it would indeed give more feedback for the user, would make things clearer
I was checking the sdks (./sdks/*) and they seems to be all fine as well they already returns the error as expected (from what I see, nothing to change / no breaking changes)

That's a fair point. I should go look, but I'm willing to bet that most SDK methods don't even return errors ever? Or if they do, it's very rare. Pretty much everything goes straight into a worker queue. So this shouldn't be a big breaking change really.

(Totally on mobile and haven't looked at the code at all 🙂)

@lacroixthomas
Copy link
Copy Markdown
Collaborator

From what I've seen, they only returns go errors (even for invalidArguments - which is why I think we can create issues to improve the other ones as well), but the SDKs either returns it directly or wrap the errors (depending of the language - most of them don't return any errors indeed), I can only see benefit from it, and in term of breaking changes, I only see improvements of the error handling, for the SetLabels from this PR, it was previously returning no errors at all

(No pressure at all, was only going through some PR reviews 😄)

@agones-bot
Copy link
Copy Markdown
Collaborator

Build Succeeded 🥳

Build Id: 47e2143e-7dd3-4282-9bca-3b8da6456159

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

git fetch https://github.com/googleforgames/agones.git pull/4386/head:pr_4386 && git checkout pr_4386
helm install agones ./install/helm/agones --namespace agones-system --set agones.image.registry=us-docker.pkg.dev/agones-images/ci --set agones.image.tag=1.55.0-dev-d8ede0b

@rajanarahul93
Copy link
Copy Markdown
Contributor Author

Thanks a lot for the review and the discussion!

Glad to hear this approach makes sense and improves the user-facing error handling.
I’m happy to keep this PR scoped to SetLabel for now, and it sounds great to track consistency improvements for other SDK server methods separately.

Let me know if you’d like me to do anything else here.

Copy link
Copy Markdown
Collaborator

@markmandel markmandel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whoops, never submitted this pending review.

Comment on lines +582 to +603
if kv == nil {
return nil, status.Error(codes.InvalidArgument, "label key/value cannot be nil")
}

if errs := validation.IsQualifiedName(kv.Key); len(errs) > 0 {
return nil, status.Errorf(
codes.InvalidArgument,
"invalid label key %q: %s",
kv.Key,
strings.Join(errs, ", "),
)
}

if errs := validation.IsValidLabelValue(kv.Value); len(errs) > 0 {
return nil, status.Errorf(
codes.InvalidArgument,
"invalid label value %q for key %q: %s",
kv.Value,
kv.Key,
strings.Join(errs, ", "),
)
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You know - I'm not against it. But it does mean this method could return a 4xx rather than a 5xx like in most (all?) other cases.

I'm almost wondering if we should create a ticket to go back and make our grpc errors proper gRPC errors. The only thing I can think it might break is manual REST implementations that expect a 500 response in some way. WDYT?

@markmandel markmandel enabled auto-merge (squash) January 8, 2026 06:00
@markmandel
Copy link
Copy Markdown
Collaborator

Let's get this merged! Shall we also do SetAnnotation() at some point? If someone could make an issue for it, that would be ace.

/gcbrun

@agones-bot
Copy link
Copy Markdown
Collaborator

Build Failed 😭

Build Id: 0fdbb16c-b47c-4589-8437-791600a3155d

Status: FAILURE

To get permission to view the Cloud Build view, join the agones-discuss Google Group.

@markmandel
Copy link
Copy Markdown
Collaborator

/gcbrun

@agones-bot
Copy link
Copy Markdown
Collaborator

Build Succeeded 🥳

Build Id: 067e4ff2-7025-491b-95bd-e749e7077f1f

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

git fetch https://github.com/googleforgames/agones.git pull/4386/head:pr_4386 && git checkout pr_4386
helm install agones ./install/helm/agones --namespace agones-system --set agones.image.registry=us-docker.pkg.dev/agones-images/ci --set agones.image.tag=1.55.0-dev-77e2729

@markmandel markmandel merged commit 3ed5ee0 into agones-dev:main Jan 8, 2026
5 checks passed
mnthe pushed a commit to mnthe/agones that referenced this pull request Mar 23, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind/bug These are bugs. size/S

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Error handling on client SDK feature "setLabel"

4 participants