test/testenv/buildx.go: Use gRPC keepalive to detect dial-stdio issues#976
Conversation
Closes project-dalec#972 Signed-off-by: Mateusz Gozdek <mgozdek@microsoft.com>
There was a problem hiding this comment.
Pull request overview
This PR updates the integration test Buildx test environment to proactively detect dead docker buildx dial-stdio connections (e.g., after a Docker daemon crash/restart) by enabling gRPC keepalive on the BuildKit client connection, reducing the likelihood of tests hanging until the overall timeout.
Changes:
- Add gRPC keepalive dial parameters when creating the BuildKit client used via
buildx dial-stdio. - Document why keepalive is needed for dial-stdio reliability in crash/restart scenarios.
| Time: 10 * time.Second, | ||
| Timeout: 5 * time.Second, |
There was a problem hiding this comment.
grpc-go only sends client keepalive pings when there are active RPCs unless keepalive.ClientParameters.PermitWithoutStream is set. If the intent here is to force periodic writes even while the connection is otherwise idle, add PermitWithoutStream: true (or update the comment to match the actual behavior).
| Time: 10 * time.Second, | |
| Timeout: 5 * time.Second, | |
| Time: 10 * time.Second, | |
| Timeout: 5 * time.Second, | |
| PermitWithoutStream: true, |
There was a problem hiding this comment.
I think this is not necessary, since we should have a stream opened while the build is happening. At least the issue with daemon restarting is addressed with this parameter being false by default.
cpuguy83
left a comment
There was a problem hiding this comment.
LGTM
Tested this out with a simple client/server setup and it seems to work.
What this PR does / why we need it:
This mitigates issues of test suite hanging until it hits timeout with e.g. Docker daemon crashing when dial-stdio silently dies until next write.
Which issue(s) this PR fixes (optional, using
fixes #<issue number>(, fixes #<issue_number>, ...)format, will close the issue(s) when the PR gets merged):Closes #972
Special notes for your reviewer:
Tested with reproducer from #972, I'm not sure it's worth adding an extra, explicit test.