
Conversation


@fmeum fmeum commented Oct 30, 2025

The stack traces obtained in #25232 (comment) indicate that all gRPC threads are waiting on events when the hang reported in #25232 occurs, with no other threads active except the virtual threads blocked on upload futures.

This situation is reminiscent of grpc/grpc-java#8334 (comment), and further experimentation showed that reducing the maximum number of concurrent requests per gRPC connection from 100 down to 20 resolved the hangs; reducing it to 50 made them less likely. Since it is not clear that there is a single value that avoids hangs for all backends without sacrificing performance on some, this change makes the limit configurable for further experimentation.

RELNOTES: The new `--remote_max_concurrency_per_connection` flag can be used to specify the maximum number of concurrent gRPC requests Bazel will issue on a single connection to the server. The default value of 100 matches the previous behavior.

Work towards #25232

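Bazel's actual limiting happens in its remote-execution gRPC channel machinery; the following standalone sketch (class and method names are hypothetical, not Bazel's code) only illustrates the general technique behind such a flag: a counting semaphore caps how many requests may be in flight on one connection, no matter how many threads want to issue them.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: cap in-flight requests per connection with a semaphore.
public class ConnectionLimiterSketch {
  private final Semaphore permits;
  private final AtomicInteger active = new AtomicInteger();
  private final AtomicInteger maxObserved = new AtomicInteger();

  ConnectionLimiterSketch(int maxConcurrentRequests) {
    permits = new Semaphore(maxConcurrentRequests);
  }

  // Blocks until a permit is free, then runs the (simulated) request.
  void execute(Runnable request) throws InterruptedException {
    permits.acquire();
    try {
      maxObserved.accumulateAndGet(active.incrementAndGet(), Math::max);
      request.run();
    } finally {
      active.decrementAndGet();
      permits.release();
    }
  }

  public static void main(String[] args) throws Exception {
    ConnectionLimiterSketch limiter = new ConnectionLimiterSketch(20);
    ExecutorService pool = Executors.newFixedThreadPool(100);
    for (int i = 0; i < 500; i++) {
      pool.submit(
          () -> {
            try {
              limiter.execute(
                  () -> {
                    try {
                      Thread.sleep(1); // Stand-in for a gRPC call.
                    } catch (InterruptedException e) {
                      Thread.currentThread().interrupt();
                    }
                  });
            } catch (InterruptedException e) {
              Thread.currentThread().interrupt();
            }
          });
    }
    pool.shutdown();
    pool.awaitTermination(30, TimeUnit.SECONDS);
    // Despite 100 worker threads, no more than 20 requests ever ran at once.
    System.out.println("within limit: " + (limiter.maxObserved.get() <= 20));
  }
}
```

With the flag in place, the limit could then be lowered for an affected backend, e.g. `bazel build --remote_max_concurrency_per_connection=20 //...`, matching the value that resolved the hangs in the experiments above.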
@fmeum fmeum requested a review from a team as a code owner October 30, 2025 15:57
@fmeum fmeum requested review from coeuvre and tjgq October 30, 2025 15:57
@github-actions github-actions bot added the team-Remote-Exec (Issues and PRs for the Execution (Remote) team) and awaiting-review (PR is awaiting review from an assigned reviewer) labels Oct 30, 2025

fmeum commented Oct 30, 2025

@bazel-io flag 8.5.0

@bazel-io bazel-io added the potential release blocker (Flagged by community members using "@bazel-io flag". Should be added to a release blocker milestone) label Oct 30, 2025

@tjgq tjgq left a comment

Since the discussion in #25232 doesn't clarify why this helps, can you summarize your findings in this PR's description, so we can preserve an understanding of why we added this flag?


fmeum commented Oct 30, 2025

> Since the discussion in #25232 doesn't clarify why this helps, can you summarize your findings in this PR's description, so we can preserve an understanding of why we added this flag?

Good point, done! Let me know if you would prefer more details.

@fmeum fmeum requested a review from tjgq October 30, 2025 19:05
@iancha1992 iancha1992 removed the potential release blocker (Flagged by community members using "@bazel-io flag". Should be added to a release blocker milestone) label Oct 30, 2025
@tjgq tjgq added the awaiting-PR-merge (PR has been approved by a reviewer and is ready to be merged internally) label and removed the awaiting-review (PR is awaiting review from an assigned reviewer) label Nov 4, 2025

fmeum commented Nov 4, 2025

@bazel-io fork 9.0.0

@copybara-service copybara-service bot closed this in 51a3f0e Nov 5, 2025
@github-actions github-actions bot removed the awaiting-PR-merge (PR has been approved by a reviewer and is ready to be merged internally) label Nov 5, 2025
bazel-io pushed a commit to bazel-io/bazel that referenced this pull request Nov 5, 2025
The stack traces obtained in bazelbuild#25232 (comment) indicate that all gRPC threads are waiting on events when the hang reported in bazelbuild#25232 occurs, with no other threads active except the virtual threads blocked on upload futures.

This situation is reminiscent of grpc/grpc-java#8334 (comment) and further experimentation showed that reducing the maximum number of concurrent requests per gRPC connection down to 20 (from 100) resolved the hangs. Reducing the number to 50 made them less likely. Since it is not clear that there is a single number that avoids hangs for all backends while not sacrificing performance with some, this change makes the limit configurable for further experimentation.

RELNOTES: The new `--remote_max_concurrency_per_connection` can be used to specify the maximum number of concurrent gRPC requests Bazel will issue on a single connection to the server. The default value of 100 matches the previous behavior.

Work towards bazelbuild#25232

Closes bazelbuild#27466.

PiperOrigin-RevId: 828555281
Change-Id: I901cfb13be7f4f0a4ef1d406845d96e88cecd02f
@fmeum fmeum deleted the 25232-concurrency-flag branch November 5, 2025 19:59
github-merge-queue bot pushed a commit that referenced this pull request Nov 6, 2025

Commit
51a3f0e

Co-authored-by: Fabian Meumertzheim <fabian@meumertzhe.im>
@fmeum
Copy link
Collaborator Author

fmeum commented Nov 6, 2025

@bazel-io fork 8.5.0

bazel-io pushed a commit to bazel-io/bazel that referenced this pull request Nov 6, 2025
github-merge-queue bot pushed a commit that referenced this pull request Nov 6, 2025

Commit
51a3f0e

Co-authored-by: Fabian Meumertzheim <fabian@meumertzhe.im>