cleanup(e2e): Scale back autoscaler timeout. #4312

Merged
lacroixthomas merged 2 commits into agones-dev:main from markmandel:flaky/TestAllocatorAfterDeleteReplica
Oct 27, 2025

@markmandel
Collaborator

What type of PR is this?

Uncomment only one /kind <> line, press enter to put that in a new line, and remove leading whitespace from that line:

/kind breaking
/kind bug

/kind cleanup

/kind documentation
/kind feature
/kind hotfix
/kind release

What this PR does / Why we need it:

Because a unit test in Go cannot run past 10m (there seems to be no setting for this!), a 10m timeout is not useful for Autopilot clusters: anything that waits the full 10m triggers a timeout panic, which also stops the e2e test runner from rerunning the tests.

So this PR does some minor cleanup on TestAllocatorAfterDeleteReplica, since that tends to fail the most, and updates the timeout for Autopilot clusters.

Which issue(s) this PR fixes:

N/A

Special notes for your reviewer:
N/A

@markmandel markmandel added the area/tests Unit tests, e2e tests, anything to make sure things don't break label Oct 23, 2025

Copilot AI left a comment

Pull Request Overview

This PR reduces the autoscaler timeout for Autopilot clusters from 10 minutes to 8 minutes to prevent test timeouts and stack trace dumps, since Go unit tests cannot exceed 10 minutes. The change also refactors error handling in the TestAllocatorAfterDeleteReplica test to use require.NoError instead of assert.Nil.

  • Reduced WaitForState timeout for GKE Autopilot from 10 minutes to 8 minutes
  • Improved error handling in TestAllocatorAfterDeleteReplica test

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| test/e2e/framework/framework.go | Updated Autopilot cluster timeout from 10m to 8m with expanded comment explaining the constraint |
| test/e2e/allocator/pod_termination_test.go | Replaced assert.Nil with require.NoError for fleet creation error handling |

@github-actions github-actions bot added kind/cleanup Refactoring code, fixing up documentation, etc size/XS labels Oct 23, 2025
@agones-bot
Collaborator

Build Succeeded 🥳

Build Id: ecdc309c-5d91-4c24-9c96-7c07f9ebb3f9

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

git fetch https://github.com/googleforgames/agones.git pull/4312/head:pr_4312 && git checkout pr_4312
helm install agones ./install/helm/agones --namespace agones-system --set agones.image.registry=us-docker.pkg.dev/agones-images/ci --set agones.image.tag=1.53.0-dev-630097c

@lacroixthomas
Collaborator

LGTM

@lacroixthomas lacroixthomas enabled auto-merge (squash) October 24, 2025 21:47
@agones-bot
Collaborator

Build Failed 😭

Build Id: 4612b775-9717-4a04-a8bb-771a08fcb7f3

Status: FAILURE

To get permission to view the Cloud Build view, join the agones-discuss Google Group.

@markmandel
Collaborator Author

/gcbrun

@markmandel
Collaborator Author

This is flaky too, but that's a different issue.

VERBOSE: time="2025-10-24 23:38:46.532" level=info msg="Event!" fleet=simple-fleet-1.04h5gt gs=simple-fleet-1.04h5gt-ncs9v-xm5dr lastTimestamp="2025-10-24 23:06:24 +0000 UTC" message="Pod simple-fleet-1.04h5gt-ncs9v-xm5dr created" reason=Creating test="TestCounterAutoscaler/Cannot_scale_up_(MaxCapacity)" type=Normal
VERBOSE:     framework.go:349: 
VERBOSE:         	Error Trace:	/go/src/agones.dev/agones/test/e2e/framework/framework.go:349
VERBOSE:         	            				/go/src/agones.dev/agones/test/e2e/fleetautoscaler_test.go:1042
VERBOSE:         	Error:      	Received unexpected error:
VERBOSE:         	            	context deadline exceeded
VERBOSE:         	Test:       	TestCounterAutoscaler/Cannot_scale_up_(MaxCapacity)
VERBOSE:         	Messages:   	error waiting for fleet condition on fleet: simple-fleet-1.04h5gt
VERBOSE: --- FAIL: TestCounterAutoscaler/Cannot_scale_up_(MaxCapacity) (486.17s)
VERBOSE: === RUN   TestCounterAutoscaler/Cannot_scale_down_(MinCapacity)
VERBOSE:     fleetautoscaler_test.go:1040: 
VERBOSE:         	Error Trace:	/go/src/agones.dev/agones/test/e2e/fleetautoscaler_test.go:1040
VERBOSE:         	Error:      	Received unexpected error:
VERBOSE:         	            	fleetautoscalers.autoscaling.agones.dev "simple-fleet-1.04h5gt-counter-autoscaler" already exists
VERBOSE:         	Test:       	TestCounterAutoscaler/Cannot_scale_down_(MinCapacity)
VERBOSE: time="2025-10-24 23:38:46.748" level=info msg="waiting for fleet condition" flee

https://console.cloud.google.com/cloud-build/builds/52b351db-5ca7-4ad8-bbd8-be51fd0dbc77;step=2?project=agones-images

@agones-bot
Collaborator

Build Succeeded 🥳

Build Id: e3d82d65-108f-4dd1-8dfa-e5ec7f7b1fe6

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

git fetch https://github.com/googleforgames/agones.git pull/4312/head:pr_4312 && git checkout pr_4312
helm install agones ./install/helm/agones --namespace agones-system --set agones.image.registry=us-docker.pkg.dev/agones-images/ci --set agones.image.tag=1.54.0-dev-2bb13af

@lacroixthomas lacroixthomas merged commit d1040b6 into agones-dev:main Oct 27, 2025
4 checks passed
@markmandel markmandel deleted the flaky/TestAllocatorAfterDeleteReplica branch October 27, 2025 22:16
mnthe pushed a commit to mnthe/agones that referenced this pull request Mar 23, 2026
Co-authored-by: Thomas Lacroix <thomas.lacroix@ubisoft.com>

Labels

area/tests Unit tests, e2e tests, anything to make sure things don't break
kind/cleanup Refactoring code, fixing up documentation, etc
size/XS

4 participants