What version of rules_go are you using?
v0.31.0
What version of gazelle are you using?
v0.24.0
What version of Bazel are you using?
5.1.1
Does this issue reproduce with the latest releases of all the above?
Yes
What operating system and processor architecture are you using?
linux/amd64
Any other potentially useful information about your toolchain?
What did you do?
There was (I assume) a bad deploy that broke dl.google.com yesterday afternoon around 3:45pm (Pacific time); it was fixed about 15 minutes later. During those ~15 minutes, all builds on our CI/CD system failed with:
WARNING: Download from https://dl.google.com/go/go1.18.1.linux-amd64.tar.gz failed: class com.google.devtools.build.lib.bazel.repository.downloader.UnrecoverableHttpException GET returned 502 Bad Gateway
ERROR: An error occurred during the fetch of repository 'go_sdk':
Traceback (most recent call last):
File "/root/.cache/bazel/_bazel_root/fc07cdbdb3ccc5391e01bb1a31f63d3c/external/io_bazel_rules_go/go/private/sdk.bzl", line 100, column 16, in _go_download_sdk_impl
_remote_sdk(ctx, [url.format(filename) for url in ctx.attr.urls], ctx.attr.strip_prefix, sha256)
File "/root/.cache/bazel/_bazel_root/fc07cdbdb3ccc5391e01bb1a31f63d3c/external/io_bazel_rules_go/go/private/sdk.bzl", line 205, column 21, in _remote_sdk
ctx.download(
Error in download: java.io.IOException: Error downloading [https://dl.google.com/go/go1.18.1.linux-amd64.tar.gz] to /root/.cache/bazel/_bazel_root/fc07cdbdb3ccc5391e01bb1a31f63d3c/external/go_sdk/go_sdk.tar.gz: GET returned 502 Bad Gateway
We happened to be doing a deploy around that time and had to keep retrying it until dl.google.com was fixed before we could finally continue.
For context, we have this part related to go_sdk in our WORKSPACE file:
GO_VERSION = "1.18.1"
...
go_register_toolchains(version = GO_VERSION)
Here are some ideas/suggestions/feature requests to make dl.google.com no longer a single point of failure:
- Make the go_sdk cache-able by remote cache
We have a Bazel remote cache set up (backed by an S3 bucket) for our CI/CD system. Since we pin the Go version, the download URL for go_sdk is fixed, so if rules_go could fetch the SDK from the remote cache instead of the original source, that would fix most of the problem.
I assume rules_go might still need the index file containing the checksum, which is dynamic by nature and not cache-able, so that might make this less feasible in reality?
- Add urls arg to go_register_toolchains
During the dl.google.com outage I tried to override the download URL used by rules_go, and found that there is a urls arg for go_download_sdk but not for go_register_toolchains. If go_register_toolchains accepted a urls arg, I could set it to both https://dl.google.com/go/{} and https://go.dev/dl/{}, which might have helped during this and similar outages, or we could run an internal mirror. (I'm not 100% sure whether go.dev was affected by the same outage: by the time I tried go.dev and found that it worked, dl.google.com also recovered shortly after.)
- Better documentation for the sdks arg of go_download_sdk
As an alternative to the previous suggestion, the only way I can find to avoid using the index file for checksums is the sdks arg of go_download_sdk, but the documentation says "see description" and I don't see any example in the description showing what this string_list_dict is supposed to look like. If an example were added to the documentation, I could switch to go_download_sdk in order to pin both the mirrors and the checksums.
So our WORKSPACE would probably look like this:
GO_VERSION = "1.18.1"
GO_LINUX_AMD64_SHA256 = "..."
GO_DARWIN_AMD64_SHA256 = "..."
GO_DARWIN_ARM64_SHA256 = "..."
...
go_download_sdk(
name = "go_sdk",
version = GO_VERSION,
urls = [
"https://dl.google.com/go/{}",
"https://go.dev/dl/{}",
# internal mirror here
],
sdks = ...,
)
go_register_toolchains()
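To make the request concrete, here is a rough sketch of what the sdks value might look like. It assumes (from reading sdk.bzl, which unpacks `filename, sha256 = sdks[platform]`) that each key is a "<goos>_<goarch>" platform string and each value is a two-element list of archive filename and SHA-256. I have not confirmed this against the documentation, so treat the shape as an assumption. The checksums are left as placeholders, and the load path is the usual rules_go one.

```starlark
load(
    "@io_bazel_rules_go//go:deps.bzl",
    "go_download_sdk",
    "go_register_toolchains",
    "go_rules_dependencies",
)

GO_VERSION = "1.18.1"

# Placeholders: fill in the published SHA-256 of each archive.
GO_LINUX_AMD64_SHA256 = "..."
GO_DARWIN_AMD64_SHA256 = "..."
GO_DARWIN_ARM64_SHA256 = "..."

go_rules_dependencies()

go_download_sdk(
    name = "go_sdk",
    # Assumed shape: "<goos>_<goarch>" -> [archive filename, sha256].
    # With sdks set explicitly, the index file should not be needed
    # to look up checksums (assumption based on sdk.bzl).
    sdks = {
        "linux_amd64": [
            "go%s.linux-amd64.tar.gz" % GO_VERSION,
            GO_LINUX_AMD64_SHA256,
        ],
        "darwin_amd64": [
            "go%s.darwin-amd64.tar.gz" % GO_VERSION,
            GO_DARWIN_AMD64_SHA256,
        ],
        "darwin_arm64": [
            "go%s.darwin-arm64.tar.gz" % GO_VERSION,
            GO_DARWIN_ARM64_SHA256,
        ],
    },
    # Each "{}" is replaced with the archive filename; URLs are tried in order.
    urls = [
        "https://dl.google.com/go/{}",
        "https://go.dev/dl/{}",
        # internal mirror here
    ],
)

# No version argument: the "go_sdk" repository is already declared above.
go_register_toolchains()
```

If this shape is right, pinning the checksum here might also let Bazel's repository cache serve the archive on later fetches, which would cover part of the first suggestion as well.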
What did you expect to see?
What did you see instead?