Mirror CMake dependencies along with Bazel dependencies#42172
Mirror CMake dependencies along with Bazel dependencies#42172murgatroid99 wants to merge 9 commits into
Conversation
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request migrates the mirror update script from gsutil to gcloud storage and introduces mirroring for GitHub archives found in build_autogenerated.yaml. The review identifies a critical issue where the existence check using gcloud storage objects list fails to return a non-zero exit code for missing files, suggesting the use of the describe command instead. Additionally, the regex patterns used in the script should be updated to use POSIX-compliant character classes to ensure portability across different operating systems.
| done | ||
|
|
||
| # Collect the github archives to mirror from build_autogenerated.yaml | ||
| grep -o '\s*- https://github.com/.*' build_autogenerated.yaml | sed 's/^\s*- https:\/\///' | while read -r line ; do |
There was a problem hiding this comment.
The use of \s in grep and sed is not POSIX-compliant and may not be supported by all versions (e.g., it will fail on BSD/macOS). Additionally, using .* at the end of the regex is risky as it can capture trailing whitespace or comments from the YAML file, leading to invalid URLs being passed to curl.
It is safer to use [[:space:]] for portability and [^[:space:]]* to ensure only the URL is captured.
| grep -o '\s*- https://github.com/.*' build_autogenerated.yaml | sed 's/^\s*- https:\/\///' | while read -r line ; do | |
| grep -o '[[:space:]]*- https://github.com/[^[:space:]]*' build_autogenerated.yaml | sed 's|^[[:space:]]*- https://||' | while read -r line ; do |
There was a problem hiding this comment.
We don't care about full POSIX compliance. GNU works, and BSD works just fine too:
$ /usr/bin/grep -Eo '\s*- https://github.com/(google|envoyproxy|bufbuild|cncf)/.+\.tar\.gz' build_autogenerated.yaml
- https://github.com/google/cel-spec/archive/refs/tags/v0.25.1.tar.gz
- https://github.com/envoyproxy/data-plane-api/archive/6ef568cf4a67362849911d1d2a546fd9f35db2ff.tar.gz
- https://github.com/bufbuild/protoc-gen-validate/archive/refs/tags/v1.2.1.tar.gz
- https://github.com/cncf/xds/archive/ee656c7534f5d7dc23d44dd611689568f72017a6.tar.gz
$ /usr/bin/grep --version
grep (BSD grep, GNU compatible) 2.6.0-FreeBSD
There was a problem hiding this comment.
Ok, sed doesn't work with \s. Here's what works:
$ /usr/bin/grep -Eo '\s*- https://github.com/.+(.tar.gz|.zip)' build_autogenerated.yaml
| /usr/bin/sed 's@^[[:space:]]*- https://@@'| done | ||
|
|
||
| # Collect the github archives to mirror from build_autogenerated.yaml | ||
| grep -o '\s*- https://github.com/.*' build_autogenerated.yaml | sed 's/^\s*- https:\/\///' | while read -r line ; do |
There was a problem hiding this comment.
| grep -o '\s*- https://github.com/.*' build_autogenerated.yaml | sed 's/^\s*- https:\/\///' | while read -r line ; do | |
| grep -Eo '\s*- https://github.com/.+\.(gz|zip)' build_autogenerated.yaml | sed 's/^\s*- https:\/\///' | while read -r line ; do |
Since the script is not checking the exact property external_proto_libraries:, let's make it a bit safer so we don't download web pages accidentally from any other place where we may have github links in the future.
We also could restrict it even more with an allowlist like so (google|envoyproxy|bufbuild|cncf):
grep -Eo '\s*- https://github.com/(google|envoyproxy|bufbuild|cncf)/.+\.tar\.gz' build_autogenerated.yaml
and add a logic to print out those we skipped - but this is probably fine to start without an allow-list
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request updates the dependency mirroring process by switching to gcloud storage and adding support for CMake dependencies. It also introduces a fallback download mechanism for CMake archives to use a gRPC-hosted mirror. However, the implementation of the fallback logic via an external shell script introduces significant portability concerns for macOS and Windows environments due to dependencies on curl, sha256sum, and bash-specific syntax. It is recommended to refactor this logic to use native CMake file(DOWNLOAD) commands to ensure cross-platform compatibility.
|
|
||
| # Collect the github archives to mirror from build_autogenerated.yaml | ||
| echo "Updating CMake dependency mirrors" | ||
| grep -Eo '\s*- https://github.com/.+\.(tar\.gz|zip)' build_autogenerated.yaml | sed 's@^[[:space:]]*- https://@@' | while read -r line ; do |
There was a problem hiding this comment.
Confirming this works with Mac's sed and grep
$ /usr/bin/grep -Eo '\s*- https://github.com/.+\.(tar\.gz|zip)' build_autogenerated.yaml | /usr/bin/sed 's@^[[:space:]]*- https://@@'
github.com/google/cel-spec/archive/refs/tags/v0.25.1.tar.gz
github.com/envoyproxy/data-plane-api/archive/6ef568cf4a67362849911d1d2a546fd9f35db2ff.tar.gz
github.com/googleapis/googleapis/archive/2193a2bfcecb92b92aad7a4d81baa428cafd7dfd.zip
github.com/bufbuild/protoc-gen-validate/archive/refs/tags/v1.2.1.tar.gz
github.com/cncf/xds/archive/ee656c7534f5d7dc23d44dd611689568f72017a6.tar.gz|
Internal ref b/505233243 |
The only github.com links in `build_autogenerated.yaml` are the ones that correspond to packages that `CMakeLists.txt` downloads, so we mirror those links in GCS the same way we do bazel dependencies. The goal is to use this mirror as an alternative download target in CMake to minimize flakes from having our downloads throttled by GitHub. I also switched from `gsutil` to `gcloud storage` because `gsutil` wasn't working for me and the command line output recommended switching. Closes grpc#42172 COPYBARA_INTEGRATE_REVIEW=grpc#42172 from murgatroid99:cmake_deps_mirror 63468be PiperOrigin-RevId: 903662332
…2513) The `str.removeprefix` method was introduced in Python 3.9. Some CI environments still use older versions of Python, causing template rendering to fail. Replacing it with a slice-based approach for compatibility. We should revert this once we upgrade `grpc-ubuntu20-large` Kokoro base VM image. The issue was introduced in #42172 - but it's a CI tech debt issue. Normally we should be assuming that we run a modern python. Closes #42513 COPYBARA_INTEGRATE_REVIEW=#42513 from sergiitk:fix/ci/cmake-old-python-removeprefix eefdf17 PiperOrigin-RevId: 923156519
…pc#42513) The `str.removeprefix` method was introduced in Python 3.9. Some CI environments still use older versions of Python, causing template rendering to fail. Replacing it with a slice-based approach for compatibility. We should revert this once we upgrade `grpc-ubuntu20-large` Kokoro base VM image. The issue was introduced in grpc#42172 - but it's a CI tech debt issue. Normally we should be assuming that we run a modern python. Closes grpc#42513 COPYBARA_INTEGRATE_REVIEW=grpc#42513 from sergiitk:fix/ci/cmake-old-python-removeprefix eefdf17 PiperOrigin-RevId: 923156519
The only github.com links in `build_autogenerated.yaml` are the ones that correspond to packages that `CMakeLists.txt` downloads, so we mirror those links in GCS the same way we do bazel dependencies. The goal is to use this mirror as an alternative download target in CMake to minimize flakes from having our downloads throttled by GitHub. I also switched from `gsutil` to `gcloud storage` because `gsutil` wasn't working for me and the command line output recommended switching. Closes grpc#42172 COPYBARA_INTEGRATE_REVIEW=grpc#42172 from murgatroid99:cmake_deps_mirror 63468be PiperOrigin-RevId: 903662332
…pc#42513) The `str.removeprefix` method was introduced in Python 3.9. Some CI environments still use older versions of Python, causing template rendering to fail. Replacing it with a slice-based approach for compatibility. We should revert this once we upgrade `grpc-ubuntu20-large` Kokoro base VM image. The issue was introduced in grpc#42172 - but it's a CI tech debt issue. Normally we should be assuming that we run a modern python. Closes grpc#42513 COPYBARA_INTEGRATE_REVIEW=grpc#42513 from sergiitk:fix/ci/cmake-old-python-removeprefix eefdf17 PiperOrigin-RevId: 923156519
The only github.com links in
build_autogenerated.yamlare the ones that correspond to packages thatCMakeLists.txtdownloads, so we mirror those links in GCS the same way we do bazel dependencies. The goal is to use this mirror as an alternative download target in CMake to minimize flakes from having our downloads throttled by GitHub.I also switched from
gsutiltogcloud storagebecausegsutilwasn't working for me and the command line output recommended switching.