Skip to content

test: standardize runtime.ParseFeatures error handling and mutex usage#4416

Merged
markmandel merged 4 commits intoagones-dev:mainfrom
markmandel:flaky/TestControllerAllocator
Jan 21, 2026
Merged

test: standardize runtime.ParseFeatures error handling and mutex usage#4416
markmandel merged 4 commits intoagones-dev:mainfrom
markmandel:flaky/TestControllerAllocator

Conversation

@markmandel
Copy link
Copy Markdown
Collaborator

What type of PR is this?

Uncomment only one /kind <> line, press enter to put that in a new line, and remove leading whitespace from that line:

/kind breaking
/kind bug

/kind cleanup

/kind documentation
/kind feature
/kind hotfix
/kind release

What this PR does / Why we need it:

Ensure all calls to runtime.ParseFeatures in unit tests follow consistent patterns:

  • Wrap all calls with require.NoError instead of assert.NoError to fail fast on parse errors

  • Add runtime.FeatureTestMutex locks with defer unlocks to all tests/subtests using ParseFeatures

  • Fix mutex scope in pkg/sdkserver/sdk_test.go by moving locks from parent test to individual subtests

This ensures thread-safe feature flag manipulation and prevents test flakiness from concurrent feature state modifications.

Which issue(s) this PR fixes:

This also includes some (probable) fixes for flaky tests, particularly closes #4407 as part of this work.

Special notes for your reviewer:

🤞🏻

@markmandel markmandel added the area/tests Unit tests, e2e tests, anything to make sure things don't break label Jan 13, 2026
@github-actions github-actions bot added kind/cleanup Refactoring code, fixing up documentation, etc size/M labels Jan 13, 2026
@agones-bot
Copy link
Copy Markdown
Collaborator

Build Succeeded 🥳

Build Id: 636ddd9a-cf4f-4b78-8625-efa54c88bf21

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

git fetch https://github.com/googleforgames/agones.git pull/4416/head:pr_4416 && git checkout pr_4416
helm install agones ./install/helm/agones --namespace agones-system --set agones.image.registry=us-docker.pkg.dev/agones-images/ci --set agones.image.tag=1.55.0-dev-f7f6c87

Comment on lines +624 to +625
defer func() {
_ = utilruntime.ParseFeatures("")
require.NoError(t, utilruntime.ParseFeatures(""))
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this needed? None of the other tests use a cleanup function for the ParseFeatures.

If we do need the cleanup function, should this be assert instead of require? The require inside of a defer cleanup function might mask a primary failure elsewhere.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed - it was there when I was refactoring. Happy to remove.

Comment on lines +63 to +65
runtime.FeatureTestMutex.Lock()
defer runtime.FeatureTestMutex.Unlock()

Copy link
Copy Markdown
Collaborator

@igooch igooch Jan 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mutex forces this test to run serially as it waits for the global lock, even though there's t.Parallel, correct?

As an alternative, should we remove t.Parallel from all tests that change the state with ParseFeatures or otherwise, and run all other tests that do not need to change the state with t.Parallel? That way all the tests that need to lock run serially, and all remaining tests are run in parallel with and only with other Parallel tests.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The benefit of doing it this way is that other non-mutex'd tests can run in parallel with this one - so at least we can get some speed benefits of those running in parallel together.

This also gives us some testing of changes across unit tests of different feature gates -- but it can lead to flaky tests -- but it can find some potential issues around the way.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be interesting to see what the speed difference would be across all e2e tests if we made the change. Not needed for this PR though :)

Comment on lines +119 to +121
if !assert.Equal(t, http.StatusCreated, rec.Code, "Unexpected HTTP Code") {
require.Failf(t, "Unexpected HTTP Code Body: %s", rec.Body.String())
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit:

Suggested change
if !assert.Equal(t, http.StatusCreated, rec.Code, "Unexpected HTTP Code") {
require.Failf(t, "Unexpected HTTP Code Body: %s", rec.Body.String())
}
require.Equal(t, http.StatusCreated, rec.Code, "Response body: %s", rec.Body.String())

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah - that makes way more sense. 🤦🏻

Ensure all calls to runtime.ParseFeatures in unit tests follow
consistent patterns:

- Wrap all calls with require.NoError instead of assert.NoError to fail
fast on parse errors

- Add runtime.FeatureTestMutex locks with defer unlocks to all
tests/subtests using ParseFeatures

- Fix mutex scope in pkg/sdkserver/sdk_test.go by moving locks from
parent test to individual subtests

This ensures thread-safe feature flag manipulation and prevents test
flakiness from concurrent feature state modifications.

This also includes some (probable) fixes for flaky tests, particularly
closes agones-dev#4407 as part of this work.
@markmandel markmandel force-pushed the flaky/TestControllerAllocator branch from f7f6c87 to e9d348b Compare January 19, 2026 05:40
@agones-bot
Copy link
Copy Markdown
Collaborator

Build Succeeded 🥳

Build Id: 78265f6b-4e3b-4479-9f46-8bfe03ffef77

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

git fetch https://github.com/googleforgames/agones.git pull/4416/head:pr_4416 && git checkout pr_4416
helm install agones ./install/helm/agones --namespace agones-system --set agones.image.registry=us-docker.pkg.dev/agones-images/ci --set agones.image.tag=1.56.0-dev-e9d348b

Copy link
Copy Markdown
Collaborator

@igooch igooch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@markmandel markmandel enabled auto-merge (squash) January 21, 2026 13:29
@agones-bot
Copy link
Copy Markdown
Collaborator

Build Failed 😭

Build Id: 04b5cae1-eba4-4175-a142-eb8e8a60f986

Status: FAILURE

To get permission to view the Cloud Build view, join the agones-discuss Google Group.

@markmandel
Copy link
Copy Markdown
Collaborator Author

On mobile atm, but there's a link that's not working anymore.

@agones-bot
Copy link
Copy Markdown
Collaborator

Build Succeeded 🥳

Build Id: b5a72048-4301-4599-a652-4ab794d2e9bf

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

git fetch https://github.com/googleforgames/agones.git pull/4416/head:pr_4416 && git checkout pr_4416
helm install agones ./install/helm/agones --namespace agones-system --set agones.image.registry=us-docker.pkg.dev/agones-images/ci --set agones.image.tag=1.56.0-dev-7fc3946

@markmandel markmandel merged commit 397e363 into agones-dev:main Jan 21, 2026
5 checks passed
@markmandel markmandel deleted the flaky/TestControllerAllocator branch January 21, 2026 17:54
mnthe pushed a commit to mnthe/agones that referenced this pull request Mar 23, 2026
agones-dev#4416)

* test: standardize runtime.ParseFeatures error handling and mutex usage

Ensure all calls to runtime.ParseFeatures in unit tests follow
consistent patterns:

- Wrap all calls with require.NoError instead of assert.NoError to fail
fast on parse errors

- Add runtime.FeatureTestMutex locks with defer unlocks to all
tests/subtests using ParseFeatures

- Fix mutex scope in pkg/sdkserver/sdk_test.go by moving locks from
parent test to individual subtests

This ensures thread-safe feature flag manipulation and prevents test
flakiness from concurrent feature state modifications.

This also includes some (probable) fixes for flaky tests, particularly
closes agones-dev#4407 as part of this work.

* Review updates.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/tests Unit tests, e2e tests, anything to make sure things don't break kind/cleanup Refactoring code, fixing up documentation, etc size/M

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Flaky: TestControllerAllocator/successful_allocation

3 participants