Skip to content

test/testenv/buildx.go: Use gRPC keepalive to detect dial-stdio issues#976

Merged
cpuguy83 merged 1 commit into
project-dalec:mainfrom
invidian:invidian/testenv-grpcping
Mar 3, 2026
Merged

test/testenv/buildx.go: Use gRPC keepalive to detect dial-stdio issues#976
cpuguy83 merged 1 commit into
project-dalec:mainfrom
invidian:invidian/testenv-grpcping

Conversation

@invidian

Copy link
Copy Markdown
Contributor

What this PR does / why we need it:
This mitigates issues of test suite hanging until it hits timeout with e.g. Docker daemon crashing when dial-stdio silently dies until next write.

Which issue(s) this PR fixes (optional, using fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when the PR gets merged):

Closes #972

Special notes for your reviewer:
Tested with reproducer from #972, I'm not sure it's worth adding an extra, explicit test.

Closes project-dalec#972

Signed-off-by: Mateusz Gozdek <mgozdek@microsoft.com>
Copilot AI review requested due to automatic review settings February 25, 2026 15:14

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the integration test Buildx test environment to proactively detect dead docker buildx dial-stdio connections (e.g., after a Docker daemon crash/restart) by enabling gRPC keepalive on the BuildKit client connection, reducing the likelihood of tests hanging until the overall timeout.

Changes:

  • Add gRPC keepalive dial parameters when creating the BuildKit client used via buildx dial-stdio.
  • Document why keepalive is needed for dial-stdio reliability in crash/restart scenarios.

Comment thread test/testenv/buildx.go
Comment on lines +102 to +103
Time: 10 * time.Second,
Timeout: 5 * time.Second,

Copilot AI Feb 25, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

grpc-go only sends client keepalive pings when there are active RPCs unless keepalive.ClientParameters.PermitWithoutStream is set. If the intent here is to force periodic writes even while the connection is otherwise idle, add PermitWithoutStream: true (or update the comment to match the actual behavior).

Suggested change
Time: 10 * time.Second,
Timeout: 5 * time.Second,
Time: 10 * time.Second,
Timeout: 5 * time.Second,
PermitWithoutStream: true,

Copilot uses AI. Check for mistakes.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is not necessary, since we should have a stream opened while the build is happening. At least the issue with daemon restarting is addressed with this parameter being false by default.

@invidian invidian requested a review from cpuguy83 February 25, 2026 15:34

@cpuguy83 cpuguy83 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
Tested this out with a simple client/server setup and it seems to work.

@cpuguy83 cpuguy83 merged commit 5197e1c into project-dalec:main Mar 3, 2026
47 of 48 checks passed
@invidian invidian deleted the invidian/testenv-grpcping branch March 3, 2026 06:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] Docker daemon crashing or restarting causes integration tests to hang and timeout

3 participants