Skip to content

fix: broken websocket connection after upgrading github.com/grpc-ecosystem/grpc-gateway/v2#4270

Merged
markmandel merged 7 commits intoagones-dev:mainfrom
swermin:bugfix/fix-broken-websocket
Sep 19, 2025
Merged

fix: broken websocket connection after upgrading github.com/grpc-ecosystem/grpc-gateway/v2#4270
markmandel merged 7 commits intoagones-dev:mainfrom
swermin:bugfix/fix-broken-websocket

Conversation

@swermin
Copy link
Copy Markdown
Contributor

@swermin swermin commented Sep 8, 2025

What type of PR is this?

Uncomment only one /kind <> line, press enter to put that in a new line, and remove leading whitespace from that line:

/kind breaking

/kind bug

/kind cleanup
/kind documentation
/kind feature
/kind hotfix
/kind release

What this PR does / Why we need it:
This PR introduces a change to the tmc/grpc-websocket-proxy to fix a deadlock when trying to establish a websocket connection to gRPC code that was generated by github.com/grpc-ecosystem/grpc-gateway/v2 on version 2.26.2 and above.

github.com/grpc-ecosystem/grpc-gateway/v2 on version 2.26.2 introduced this change PR 5240 that causes request that has empty body specified to drain the body before proceeding with the call. This change was introduced back in April 2025 and the maintainer of the tmc/grpc-websocket-proxy has not been active for a while, making this breaking bug lingering and blocks projects that has websocket support as a requirement to be able to upgrade to newer agones versions.

As a workaround, due to critical functionality is broken, this PR introduces a fix in the vendor folder, so that we can unblock agones upgrades for projects that are depending on the websocket connection to work.

Which issue(s) this PR fixes:

Closes #4248

Special notes for your reviewer:
This is just a fix for the watch gameserver websocket connection, as it only ever sends data from the sidecar to the gameserver executable. That is fine for this current setup. If we need bi-directional support whatsoever from the websocket connection then this change will not work as there is no functionality exposed to send data through the proxy.

@markmandel
Copy link
Copy Markdown
Collaborator

Final thing, while this should be pushed upstream it seems like there doesn't seem to be any traction from the maintainer, and since a lot of people (including me) depend on the websocket watch to work I added this change into the vendor directory for now.

Maybe silly question - but the watch gameserver websocket connection continues to work, no? (because the test checks this).

I thought this was just getting the GameServer through a websocket? Or maybe I misunderstood?

@markmandel
Copy link
Copy Markdown
Collaborator

/gcbrun

@agones-bot
Copy link
Copy Markdown
Collaborator

Build Failed 😭

Build Id: 5b59fa3d-211b-4fad-aa67-dcc22a2edfd3

Status: FAILURE

To get permission to view the Cloud Build view, join the agones-discuss Google Group.

@swermin
Copy link
Copy Markdown
Contributor Author

swermin commented Sep 9, 2025

Great question! I think I failed describing the issue properly.

The websocket connection is only used to listen on changes that happens on the gameserver CRD.
What happens is that when a connection is being established to the /watch/gameserver it hangs on the gateway side due to the introduction of io.Copy(io.Discard, req.Body).
The websocket proxy sends a io.Pipe() as a body, meaning that the discard effectively hangs until the body closes.

The fix is to send an empty body. Firstly because we use aGET call, which does not expect a body at all, and secondly that will make the io.Copy(io.Discard, req.Body) succeed and continue the execution of the request.
That means that the websocket connection is fully established and the receiving side will start getting the updates on the CRD as they happen.

The ws-watch-test.go doesn’t succeed at all without this change as far as I can tell.
It does four things:

  • establish a websocket connection
  • reserve a gamesever by calling /reserve
  • Use http to get the gameserver crd and check if it is in Reserved state
  • Poll the websocket connection for the CRD changes that shows the reserved message pushed.

The last point never executes because the connection never established, but hangs instead.

@swermin
Copy link
Copy Markdown
Contributor Author

swermin commented Sep 9, 2025

Also, I don’t get why the gcburn failed 🤔

@swermin swermin force-pushed the bugfix/fix-broken-websocket branch from cb6a724 to aaaa05b Compare September 9, 2025 07:22
@swermin swermin mentioned this pull request Sep 9, 2025
58 tasks
@swermin
Copy link
Copy Markdown
Contributor Author

swermin commented Sep 9, 2025

Honestly though, the fix stared me right in the face. We don't need to fake the body, we just need to pass the request body to the proxy and that will work with both post/get calls

@markmandel
Copy link
Copy Markdown
Collaborator

Looks like e2e tests are being extra flaky again.

@swermin swermin force-pushed the bugfix/fix-broken-websocket branch 2 times, most recently from 07b11ee to c3637dd Compare September 10, 2025 16:35
@swermin swermin force-pushed the bugfix/fix-broken-websocket branch from c3637dd to 95b3b13 Compare September 11, 2025 04:58
@swermin swermin changed the title fix broken websocket connection after upgrading github.com/grpc-ecosystem/grpc-gateway/v2 fix: broken websocket connection after upgrading github.com/grpc-ecosystem/grpc-gateway/v2 Sep 11, 2025
@lacroixthomas
Copy link
Copy Markdown
Collaborator

/gcbrun

@agones-bot
Copy link
Copy Markdown
Collaborator

Build Succeeded 🥳

Build Id: 5731ec03-31bc-481e-9a4e-4a511817ccbf

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

git fetch https://github.com/googleforgames/agones.git pull/4270/head:pr_4270 && git checkout pr_4270
helm install agones ./install/helm/agones --namespace agones-system --set agones.image.registry=us-docker.pkg.dev/agones-images/ci --set agones.image.tag=1.53.0-dev-95b3b13

@github-actions github-actions bot added the kind/bug These are bugs. label Sep 16, 2025
Copy link
Copy Markdown
Collaborator

@lacroixthomas lacroixthomas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM ! Nice catch for the sdk Makefile !

I was wondering if we should add a comment on the code (as we updated the vendors directly ? 🤔)

/gcbrun

@lacroixthomas
Copy link
Copy Markdown
Collaborator

/gcbrun

@agones-bot
Copy link
Copy Markdown
Collaborator

Build Failed 😭

Build Id: 3e22d54c-b6ca-4b34-a4b8-dba8dd4e0f56

Status: FAILURE

To get permission to view the Cloud Build view, join the agones-discuss Google Group.

@markmandel
Copy link
Copy Markdown
Collaborator

markmandel commented Sep 18, 2025

Flake:

PASS test/e2e.TestFleetGSSpecValidation (9.81s)
=== RUN   TestSuperTuxKartGameServerReady
=== PAUSE TestSuperTuxKartGameServerReady
=== CONT  TestSuperTuxKartGameServerReady
    examples_test.go:83: 
        	Error Trace:	/go/src/agones.dev/agones/test/e2e/examples_test.go:83
        	Error:      	Received unexpected error:
        	            	waiting for {supertuxkart [{default  Dynamic <nil> 8080 0 UDP}] {false 60 0 30}  { 0 0} {{      0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[] map[] [] [] []} {[] [] [{supertuxkart us-docker.pkg.dev/agones-images/examples/supertuxkart-example:0.19 [] []  [] [] [{ENABLE_PLAYER_TRACKING false nil}] {map[] map[] []} [] <nil> [] [] nil nil nil nil    nil false false false}] []  <nil> <nil>  map[]   <nil>  false false false <nil> nil []   nil  [] []  <nil> nil [] <nil> <nil> <nil> map[] [] <nil> nil <nil> [] [] nil}} <nil> map[] map[] <nil>} GameServer instance readiness timed out (): waiting for GameServer 1758158444/supertuxkart-w7rfd to be Ready: GameServer reached terminal state Unhealthy
        	Test:       	TestSuperTuxKartGameServerReady
--- FAIL: TestSuperTuxKartGameServerReady (26.29s)

https://console.cloud.google.com/cloud-build/builds/9d80a8d5-68b1-4c41-85cf-64b4b2e002ac;step=1?authuser=1&project=agones-images

Blarg, this keeps failing on Autopilot for some reason. I do wonder why.

@markmandel
Copy link
Copy Markdown
Collaborator

Quick gut check before I approve this for merge -

One issue with having this fix in the vendor directory is that any run of go mod vendor at this point will overview the patch.

Does it make sense for someone (@swermin ?) to maintain a fork of github.com/tmc/grpc-websocket-proxy with the fix in it, and we change our go.mod to point at the fork (probably with a replace declaration).

@swermin
Copy link
Copy Markdown
Contributor Author

swermin commented Sep 18, 2025

I didn’t think of that, that a go mod vendor will revert the patch and then break things.
I don’t mind maintaining a fork with this change. I’ll update the PR tomorrow to reflect that change

@swermin
Copy link
Copy Markdown
Contributor Author

swermin commented Sep 18, 2025

LGTM ! Nice catch for the sdk Makefile !

I was wondering if we should add a comment on the code (as we updated the vendors directly ? 🤔)

/gcbrun

I did add a comment on the code, the part that needed a fix. Or do you mean on the other lines?

@lacroixthomas
Copy link
Copy Markdown
Collaborator

LGTM ! Nice catch for the sdk Makefile !

I was wondering if we should add a comment on the code (as we updated the vendors directly ? 🤔)

/gcbrun

I did add a comment on the code, the part that needed a fix. Or do you mean on the other lines?

Oh indeed my bad, looks all good, or maybe in the go.mod if it uses a fork or something, otherwise all good

@agones-bot
Copy link
Copy Markdown
Collaborator

Build Succeeded 🥳

Build Id: a56e6483-9ac1-4511-a221-b63ad616894f

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

git fetch https://github.com/googleforgames/agones.git pull/4270/head:pr_4270 && git checkout pr_4270
helm install agones ./install/helm/agones --namespace agones-system --set agones.image.registry=us-docker.pkg.dev/agones-images/ci --set agones.image.tag=1.53.0-dev-ac3d356

@swermin
Copy link
Copy Markdown
Contributor Author

swermin commented Sep 19, 2025

I've updated to use a forked version of the module and used replace to highlight that it is for this hotfix

Copy link
Copy Markdown
Collaborator

@markmandel markmandel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh yeah

@markmandel markmandel enabled auto-merge (squash) September 19, 2025 15:52
@markmandel
Copy link
Copy Markdown
Collaborator

/gcbrun

@swermin
Copy link
Copy Markdown
Contributor Author

swermin commented Sep 19, 2025

Woho! 🙌

@markmandel
Copy link
Copy Markdown
Collaborator

Very nice work! Lemme know if I can do anything to help you be more active here 😃

@agones-bot
Copy link
Copy Markdown
Collaborator

Build Succeeded 🥳

Build Id: 30fcdeb1-2716-452e-9592-fefda0e7cfb8

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

git fetch https://github.com/googleforgames/agones.git pull/4270/head:pr_4270 && git checkout pr_4270
helm install agones ./install/helm/agones --namespace agones-system --set agones.image.registry=us-docker.pkg.dev/agones-images/ci --set agones.image.tag=1.53.0-dev-4c06754

@markmandel markmandel merged commit 409a6fc into agones-dev:main Sep 19, 2025
4 checks passed
@swermin
Copy link
Copy Markdown
Contributor Author

swermin commented Sep 19, 2025

Very nice work! Lemme know if I can do anything to help you be more active here 😃

Oh I’m constantly lurking in here, but I will try to be more engaging going forward! 😄

@swermin swermin deleted the bugfix/fix-broken-websocket branch September 19, 2025 17:05
@markmandel
Copy link
Copy Markdown
Collaborator

@swermin I went hunting for you in Slack 😄 could't find you.

@swermin
Copy link
Copy Markdown
Contributor Author

swermin commented Sep 19, 2025

I just messaged you on slack! 🙌

@igooch igooch mentioned this pull request Sep 24, 2025
50 tasks
Sivasankaran25 pushed a commit to Sivasankaran25/agones that referenced this pull request Sep 25, 2025
…ystem/grpc-gateway/v2 (agones-dev#4270)

* fix broken websocket connection after upgrading github.com/grpc-ecosystem/grpc-gateway/v2

* add comment explaining the issue at hand

* use the request body instead of an empty one

* make sure conformance tests errors are captured

* use go mod replacement for websocket hotfix

---------

Co-authored-by: Thomas Lacroix <thomas.lacroix@ubisoft.com>
Sivasankaran25 pushed a commit to Sivasankaran25/agones that referenced this pull request Sep 25, 2025
…ystem/grpc-gateway/v2 (agones-dev#4270)

* fix broken websocket connection after upgrading github.com/grpc-ecosystem/grpc-gateway/v2

* add comment explaining the issue at hand

* use the request body instead of an empty one

* make sure conformance tests errors are captured

* use go mod replacement for websocket hotfix

---------

Co-authored-by: Thomas Lacroix <thomas.lacroix@ubisoft.com>
Sivasankaran25 pushed a commit to Sivasankaran25/agones that referenced this pull request Sep 25, 2025
…ystem/grpc-gateway/v2 (agones-dev#4270)

* fix broken websocket connection after upgrading github.com/grpc-ecosystem/grpc-gateway/v2

* add comment explaining the issue at hand

* use the request body instead of an empty one

* make sure conformance tests errors are captured

* use go mod replacement for websocket hotfix

---------

Co-authored-by: Thomas Lacroix <thomas.lacroix@ubisoft.com>
Sivasankaran25 pushed a commit to Sivasankaran25/agones that referenced this pull request Sep 26, 2025
…ystem/grpc-gateway/v2 (agones-dev#4270)

* fix broken websocket connection after upgrading github.com/grpc-ecosystem/grpc-gateway/v2

* add comment explaining the issue at hand

* use the request body instead of an empty one

* make sure conformance tests errors are captured

* use go mod replacement for websocket hotfix

---------

Co-authored-by: Thomas Lacroix <thomas.lacroix@ubisoft.com>
igooch pushed a commit that referenced this pull request Sep 26, 2025
* patch-releasev1.52.1

* fix: broken websocket connection after upgrading github.com/grpc-ecosystem/grpc-gateway/v2 (#4270)

* fix broken websocket connection after upgrading github.com/grpc-ecosystem/grpc-gateway/v2

* add comment explaining the issue at hand

* use the request body instead of an empty one

* make sure conformance tests errors are captured

* use go mod replacement for websocket hotfix

---------

Co-authored-by: Thomas Lacroix <thomas.lacroix@ubisoft.com>

* update few changes

* update some dependencies files

* update config.toml files

---------

Co-authored-by: Ermin Hrkalovic <ermin.hrkalovic@guerrilla-games.com>
Co-authored-by: Thomas Lacroix <thomas.lacroix@ubisoft.com>
@igooch igooch mentioned this pull request Oct 1, 2025
61 tasks
mnthe pushed a commit to mnthe/agones that referenced this pull request Mar 23, 2026
…ystem/grpc-gateway/v2 (agones-dev#4270)

* fix broken websocket connection after upgrading github.com/grpc-ecosystem/grpc-gateway/v2

* add comment explaining the issue at hand

* use the request body instead of an empty one

* make sure conformance tests errors are captured

* use go mod replacement for websocket hotfix

---------

Co-authored-by: Thomas Lacroix <thomas.lacroix@ubisoft.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Calling WatchGameServer via websocket stopped working in v1.50

4 participants