Skip to content

Conversation

@bazel-io
Copy link
Member

@bazel-io bazel-io commented Nov 6, 2025

The stack traces obtained #25232 (comment) indicate that all gRPC threads are waiting on events when the hang reported in #25232 occurs, with no other threads being active except for the virtual threads blocked on upload futures.

This situation is reminiscent of grpc/grpc-java#8334 (comment) and further experimentation showed that reducing the maximum number of concurrent requests per gRPC connection down to 20 (from 100) resolved the hangs. Reducing the number to 50 made them less likely. Since it is not clear that there is a single number that avoids hangs for all backends while not sacrificing performance with some, this change makes the limit configurable for further experimentation.

RELNOTES: The new --remote_max_concurrency_per_connection can be used to specify the maximum number of concurrent gRPC requests Bazel will issue on a single connection to the server. The default value of 100 matches the previous behavior.

Work towards #25232

Closes #27466.

PiperOrigin-RevId: 828555281
Change-Id: I901cfb13be7f4f0a4ef1d406845d96e88cecd02f

Commit 51a3f0e

The stack traces obtained bazelbuild#25232 (comment) indicate that all gRPC threads are waiting on events when the hang reported in bazelbuild#25232 occurs, with no other threads being active except for the virtual threads blocked on upload futures.

This situation is reminiscent of grpc/grpc-java#8334 (comment) and further experimentation showed that reducing the maximum number of concurrent requests per gRPC connection down to 20 (from 100) resolved the hangs. Reducing the number to 50 made them less likely. Since it is not clear that there is a single number that avoids hangs for all backends while not sacrificing performance with some, this change makes the limit configurable for further experimentation.

RELNOTES: The new `--remote_max_concurrency_per_connection` can be used to specify the maximum number of concurrent gRPC requests Bazel will issue on a single connection to the server. The default value of 100 matches the previous behavior.

Work towards bazelbuild#25232

Closes bazelbuild#27466.

PiperOrigin-RevId: 828555281
Change-Id: I901cfb13be7f4f0a4ef1d406845d96e88cecd02f
@bazel-io bazel-io added the team-Remote-Exec Issues and PRs for the Execution (Remote) team label Nov 6, 2025
@bazel-io bazel-io requested a review from a team as a code owner November 6, 2025 12:17
@bazel-io bazel-io added awaiting-review PR is awaiting review from an assigned reviewer team-Remote-Exec Issues and PRs for the Execution (Remote) team labels Nov 6, 2025
@bazel-io bazel-io requested a review from tjgq November 6, 2025 12:17
@tjgq tjgq added awaiting-PR-merge PR has been approved by a reviewer and is ready to be merge internally and removed awaiting-review PR is awaiting review from an assigned reviewer labels Nov 6, 2025
@iancha1992 iancha1992 added this pull request to the merge queue Nov 6, 2025
@iancha1992 iancha1992 removed the awaiting-PR-merge PR has been approved by a reviewer and is ready to be merge internally label Nov 6, 2025
Merged via the queue into bazelbuild:release-8.5.0 with commit 37d597a Nov 6, 2025
47 checks passed
rdesgroppes added a commit to DataDog/datadog-agent that referenced this pull request Jan 13, 2026
This upgrade brings several performance improvements and bug fixes:

#### Performance improvements
- remote execution: bazelbuild/bazel#27564
- module extensions: bazelbuild/bazel#27296

#### Reliability improvements
- cache invalidation: bazelbuild/bazel#27417
- configuration: bazelbuild/bazel#27128
- Git repositories: bazelbuild/bazel#27705
- query:
  - bazelbuild/bazel#27560
  - bazelbuild/bazel#27117
- registry mirrors: bazelbuild/bazel#27531

#### Bug fixes
- remote cache: bazelbuild/bazel#27996
- repository handling: bazelbuild/bazel#27995
- repository cache: bazelbuild/bazel#28161
- local execution: bazelbuild/bazel#27994
rdesgroppes added a commit to DataDog/datadog-agent that referenced this pull request Jan 13, 2026
This upgrade brings several performance improvements and bug fixes:

## Performance improvements (8.5.0)
- Remote execution: Add --remote_max_concurrency_per_connection flag to control concurrent gRPC requests (default: 100)
  bazelbuild/bazel#27564
- Module extensions: Support storing/retrieving JSON-like Starlark objects without invalidation, reducing unnecessary rebuilds
  bazelbuild/bazel#27296

## Reliability improvements (8.5.0)
- Cache invalidation: Source directory contents now tracked for proper invalidation
  bazelbuild/bazel#27417
- Configuration: Add ctx.configuration.short_id for identifying configurations
  bazelbuild/bazel#27128
- Git repositories: git_repository now checks out default branch when unspecified
  bazelbuild/bazel#27705
- Query: Add executables() function and fix genquery for external repos
  bazelbuild/bazel#27560
  bazelbuild/bazel#27117
- Registry mirrors: --module_mirrors now supports per-registry mirror specification
  bazelbuild/bazel#27531

## Bug fixes (8.5.1)
- Remote cache: Add option to continue with local execution if remote cache is unavailable
  bazelbuild/bazel#27996
- Repository handling: Fix crash when mixing use_repo_rule and --inject_repository
  bazelbuild/bazel#27995
- Repository cache: Fix permission denied issue with --experimental_repository_cache_hardlinks
  bazelbuild/bazel#28161
- Local execution: Fix incorrect SkyframeLookupResult usage
  bazelbuild/bazel#27994

Both 8.5.0 and 8.5.1 are fully backward compatible with Bazel 8.0.

## Dependency updates
- Upgrade rules_go from 0.57.0 to 0.59.0 for Bazel 8.5+ compatibility
  bazel-contrib/rules_go#4493
- Configure sh_configure extension for rules_shell to auto-detect shell toolchain

## Platform-specific changes
- Windows: Configure hermetic shell via --repo_env=BAZEL_SH which is used by both
  sh_configure (sh_binary/sh_test) and --shell_executable (genrule/run_shell).
  This eliminates dependency on system environment variables.
- Windows: Disable code coverage collection (--nocollect_code_coverage) to avoid
  shell toolchain issues. Coverage requires sh_binary (collect_coverage) which
  needs a hermetic shell toolchain not yet available.
  bazelbuild/rules_shell#4

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
rdesgroppes added a commit to DataDog/datadog-agent that referenced this pull request Jan 13, 2026
This upgrade brings several performance improvements and bug fixes:

#### Performance improvements
- remote execution: bazelbuild/bazel#27564
- module extensions: bazelbuild/bazel#27296

#### Reliability improvements
- cache invalidation: bazelbuild/bazel#27417
- configuration: bazelbuild/bazel#27128
- Git repositories: bazelbuild/bazel#27705
- query:
  - bazelbuild/bazel#27560
  - bazelbuild/bazel#27117
- registry mirrors: bazelbuild/bazel#27531

#### Bug fixes
- remote cache: bazelbuild/bazel#27996
- repository handling: bazelbuild/bazel#27995
- repository cache: bazelbuild/bazel#28161
- local execution: bazelbuild/bazel#27994
dd-mergequeue bot pushed a commit to DataDog/datadog-agent that referenced this pull request Jan 15, 2026
### What does this PR do?
Bump `bazel` version from [8.4.2](https://github.com/bazelbuild/bazel/releases/tag/8.4.2) to [8.5.1](https://github.com/bazelbuild/bazel/releases/tag/8.5.1).

To make that happen, we also need to:
- specify which `bash` to use on Windows when evaluating **repository** rules, for compatibility with bazelbuild/bazel#26927:
  ```
  ERROR: /path/to/external/bazel_tools/tools/test/BUILD:23:10: in sh_binary rule @@bazel_tools//tools/test:collect_coverage: 
  Error in fail: No suitable shell toolchain found:
  * if you are running Bazel on Windows, set the BAZEL_SH environment variable to the path of bash.exe
  ```
  (actual job output: https://gitlab.ddbuild.io/DataDog/datadog-agent/-/jobs/1349246422#L77)
- bump `rules_go` from (implicit) [0.57.0](https://github.com/bazel-contrib/rules_go/releases/tag/v0.57.0) to (explicit) [0.59.0](https://github.com/bazel-contrib/rules_go/releases/tag/v0.59.0) for bazel-contrib/rules_go/pull/4493 to be compatible with bazelbuild/bazel/pull/27296:
  ```
  Error: 'Facts' value has no field or method 'clear'
  ```

### Motivation
1. this upgrade alone brings several improvements and bug fixes, among which:
- bazelbuild/bazel#27117
- bazelbuild/bazel#27296
- bazelbuild/bazel#27417, covers:
  - bazelbuild/bazel#25834
  - bazelbuild/bazel#25863
  - bazelbuild/bazel#25864
  - bazelbuild/bazel#25870
  - bazelbuild/bazel#26698
- bazelbuild/bazel#27531
- bazelbuild/bazel#27560
- bazelbuild/bazel#27564
- bazelbuild/bazel#27705
- bazelbuild/bazel#27995
- bazelbuild/bazel#27996

2. `bazel` 9.0 is due soon, so better off favoring incremental bumps.

### Additional Notes
Leveraging [the latter](bazelbuild/bazel#27996) might allow us to later reconsider whether we'd like to go back to the `--remote_cache` flag (instead of the `--remote_executor` flag that we had to switch to in #44962).

Co-authored-by: regis.desgroppes <regis.desgroppes@datadoghq.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

team-Remote-Exec Issues and PRs for the Execution (Remote) team

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants