binary_distribution: content addressable tarballs#48713
binary_distribution: content addressable tarballs#48713scottwittenburg merged 77 commits intospack:developfrom
Conversation
kwryankrattiger
left a comment
There was a problem hiding this comment.
A couple things from a first pass
fa77018 to
82fa296
Compare
1507bd7 to
8897e94
Compare
There was a problem hiding this comment.
The PR seems to blend well with #45189, so from my point of view we can merge in any order. There are still a few issues though that I'd like to fix here.
First one is that the github-actions-v0.5 bootstrap cache does not work in this branch. To reproduce, from a clean checkout:
$ spack bootstrap disable github-actions-v0.6
==> "github-actions-v0.6" is now disabled and will not be used for bootstrapping
$ spack bootstrap disable spack-install
==> "spack-install" is now disabled and will not be used for bootstrapping
$ spack -d bootstrap now
[ ... ]
RuntimeError: cannot bootstrap any of the patchelf executables from spec "patchelf@0.13.1:0.13 %gcc platform=linux target=x86_64" due to the following failures:
github-actions-v0.5 raised RuntimeError: The binary index is empty
File "/home/culpo/PycharmProjects/spack/lib/spack/spack/bootstrap/core.py", line 421, in ensure_executables_in_path_or_raise
if current_bootstrapper.try_search_path(executables, abstract_spec):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/culpo/PycharmProjects/spack/lib/spack/spack/bootstrap/core.py", line 255, in try_search_path
return self._install_and_test(abstract_spec, bincache_platform, data, test_fn)
~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/culpo/PycharmProjects/spack/lib/spack/spack/bootstrap/core.py", line 210, in _install_and_test
raise RuntimeError("The binary index is empty")we should either fix it, in a way that continues to work for previous Spack versions, or remove it from the available mirrors in this PR.
The other issue is that:
$ spack bootstrap mirror --binary-packages --dev /tmp/bootstrap-mirror
==> Adding "clingo-bootstrap@spack+python %gcc platform=linux target=x86_64" and dependencies to the mirror at /tmp/bootstrap-mirror/bootstrap_cache
==> Adding "gnupg@2.3: %gcc platform=linux target=x86_64" and dependencies to the mirror at /tmp/bootstrap-mirror/bootstrap_cache
==> Adding "patchelf@0.13.1: %gcc platform=linux target=x86_64" and dependencies to the mirror at /tmp/bootstrap-mirror/bootstrap_cache
==> Adding "gnuconfig" and dependencies to the mirror at /tmp/bootstrap-mirror/bootstrap_cache
==> Adding "py-isort@5 %gcc platform=linux target=x86_64" and dependencies to the mirror at /tmp/bootstrap-mirror/bootstrap_cache
==> Adding "py-mypy@0.900: %gcc platform=linux target=x86_64" and dependencies to the mirror at /tmp/bootstrap-mirror/bootstrap_cache
==> Adding "py-black@:24.1.0 %gcc platform=linux target=x86_64" and dependencies to the mirror at /tmp/bootstrap-mirror/bootstrap_cache
==> Adding "py-flake8@3.8.2: %gcc platform=linux target=x86_64" and dependencies to the mirror at /tmp/bootstrap-mirror/bootstrap_cache
==> Adding "py-pytest@6.2.4: %gcc platform=linux target=x86_64" and dependencies to the mirror at /tmp/bootstrap-mirror/bootstrap_cache
==> Adding binary packages from "https://github.com/spack/spack-bootstrap-mirrors/releases/download/v0.6/bootstrap-buildcache.tar.gz" to the mirror at /tmp/bootstrap-mirror/bootstrap_cache
==> Error: [Errno 2] No such file or directory: '/tmp/tmpq7gaf1jb/spack-src/v3/specs'is broken here. This is not caught in CI for lack of e2e tests, but we should try to keep it in a working state.
0328a0f to
16de322
Compare
|
@alalazo Thank you for looking through this and providing some useful feedback. Regarding your comments:
I used the index diff resultsSince older spack will only see the You also pointed out another command that was broken: However, I could not reproduce that. Instead I see: command traceMy branch is based on 8677063142, which exhibits the same behavior. Any thoughts as to what could be going on there? EDIT: Nevermind, the error above is the same as #48973. Once I installed a compiler in the clean docker image I am using for testing, I can reproduce the above issue. |
I'll double check, but I agree with you it shouldn't. Those binaries are installed by DAG hash, and that is the same in both index files. |
818ee73 to
cf78b10
Compare
Instead of iterating all mirrors to check for presence of layout.json on every spack command, only warn in the following cases: 1. Once per mirror when using buildcache create/push on the cli: checks for presence of layout.json on the single mirror, and if it is not present, warns that the mirror may be old layout. 2. When a spec is downloaded for installation from a v2 layout mirror 3. When an index is fetched from a v2 layout mirror (1) will only happen once, ever, for a mirror, assuming a spec gets pushed, as that will create the layout.json on the mirror as well. (2) and (3) happen in a memoized function, so will only happen once per mirror url, per action (install spec or fetch index), per spack invocation.
d0aa453 to
4ed4b43
Compare
4ed4b43 to
9d723ae
Compare
|
I just checked the issues in #48713 (review) and they still persist, though they get a little further.
The issue with the bootstrap mirror is that the following command: fails to generate an index for the binary caches that was downloaded. |
|
In general, are commands like: spack buildcache SUBCOMMANDstill supposed to be working on v1 buildcaches? |
|
@alalazo Here's what I see when I try the first problematic command: OutputThe docker image I'm running there came from the following Dockerfile: Dockerfile contents |
|
@scottwittenburg The first failure is indeed my fault - I tried it with Python 3.13, but we don't have that binary in the v0.5 buildcache, with less recent Python versions it works. |
I see, yes, I can reproduce that issue. At the moment support for v2 layout is read-only, so updating the mirror index of a v2 layout mirror, like the bootstrap one we're talking about above, is not expected to work. I'm not sure if it makes more sense to:
|
|
I'm also fine with:
|
The failure with spack bootstrap mirror can be addressed in multiple ways, both within this PR or in a following one. So it's not a blocker.
The failure on github-actions v.05 has been fixed.
|
@zackgalbreath @kwryankrattiger I've been testing protected pipelines and PR pipelines in my gitlab side project using this branch, here are the most recent results: |
|
@haampie I think I've addressed all of your comments/requests, do you mind taking another look when you get a chance? Thanks! |
|
One small ergonomic comment for a following PR. The which is convenient to push to buildcaches I don't necessarily want to register in this instance of Spack. The $ spack buildcache migrate -d ./bootstrap-buildcache
usage: spack buildcache migrate [-hudy] mirror
spack buildcache migrate: error: argument mirror: no mirror named "./bootstrap-buildcache"and that forces people who just want to update a buildcache to use: $ spack mirror add bootstrap-buildcache $PWD/bootstrap-buildcache
$ spack buildcache migrate -d bootstrap-buildcache
$ spack mirror remove bootstrap-buildcacheso three commands instead of a single one. |
|
Another note. A failed migration like: spack buildcache migrate -d bootstrap-buildcachecompletely deletes the existing buildcache. The behavior I'd like is to avoid deletion on non-zero exit of the command. Note that the error above happened spuriously, and I am currently unable to reproduce it. I consider this somewhat minor, given that a warning is printed at the beginning, suggesting a less disruptive workflow, but it's also something we might want to improve in a later PR. Maybe we should give the opportunity to perform out-of-place migrations? |
|
To solve the diff --git a/lib/spack/spack/cmd/bootstrap.py b/lib/spack/spack/cmd/bootstrap.py
index d895956659..d20750fa58 100644
--- a/lib/spack/spack/cmd/bootstrap.py
+++ b/lib/spack/spack/cmd/bootstrap.py
@@ -2,6 +2,7 @@
#
# SPDX-License-Identifier: (Apache-2.0 OR MIT)
import os
+import pathlib
import shutil
import sys
import tempfile
@@ -28,7 +29,7 @@
# Tarball to be downloaded if binary packages are requested in a local mirror
-BINARY_TARBALL = "https://github.com/spack/spack-bootstrap-mirrors/releases/download/v0.6/bootstrap-buildcache.tar.gz"
+BINARY_TARBALL = "https://github.com/spack/spack-bootstrap-mirrors/releases/download/v0.6/bootstrap-buildcache-v3.tar.gz"
#: Subdirectory where to create the mirror
LOCAL_MIRROR_DIR = "bootstrap_cache"
@@ -410,14 +411,9 @@ def _mirror(args):
stage.create()
stage.fetch()
stage.expand_archive()
- # TODO: Once content addressable tarballs PR (#48713) is merged, we can
- # TODO: merge a PR to spack/spack-bootstrap-mirrors replacing the previous
- # TODO: spack commit SHA, "d36452cf4e70fa1da8b9db43921850872b82ced9", with an
- # TODO: updated SHA. Once done, we can make a new release on the bootstrap
- # TODO: mirrors repo, reflect that new release in BINARY_TARBALL, above, and
- # TODO: replace "build_cache", below, with "buildcache_relative_specs_path()".
- build_cache_dir = os.path.join(stage.source_path, "build_cache")
- shutil.move(build_cache_dir, mirror_dir)
+ stage_dir = pathlib.Path(stage.source_path)
+ for entry in stage_dir.iterdir():
+ shutil.move(str(entry), mirror_dir)
llnl.util.tty.set_msg_enabled(True)
def write_metadata(subdir, metadata):
@@ -442,7 +438,6 @@ def write_metadata(subdir, metadata):
shutil.copy(spack.util.path.canonicalize_path(GNUPG_JSON), abs_directory)
shutil.copy(spack.util.path.canonicalize_path(PATCHELF_JSON), abs_directory)
instructions += cmd.format("local-binaries", rel_directory)
- instructions += " % spack buildcache update-index <final-path>/bootstrap_cache\n"
print(instructions)I migrated the bootstrap buildcache and uploaded a layout v3 tarball too, with an index included. |
|
@alalazo Thanks for creating a new bootstrap tarball, and trying out the I agree that if the migrate command experiences any errors, we should refrain from deleting the existing mirror. Instead we could print a message asking them to fix the errors and try again. It could also be nice to allow an out-of-place migration. Also, I agree the migrate command should accept the same types of mirror arguments accepted by push/create. I'd like to make these changes in a small follow-up PR, as you suggested. |
binary_distribution: content addressable url buildcache Change how binary mirrors are laid out, adopting content addressing for every piece of data spack stores in a binary mirror. Items (e.g. tarballs, specfiles, public keys, indices, etc) are now discoverable via manifest files which give the size, checksum, compression type, etc of the the stored item. The information in the manifest, in turn, is used to find the actual data, which is stored by its content address in the blobs directory. Additionally, signing is now applied to the manifest files, rather than to the spec files themselves.
Introduction
This PR changes how binary mirrors are laid out, and adopts a new approach for updating the layout over time. The new buildcache layout introduces content addressable tarballs, with the consequence that spack no longer knows the final url/path to a compressed tarball or specfile without either having just built it, or else fetched the manifest (
.spec.manifest.json) file from the mirror first.Structure and organization
Below is an illustration of the new layout:
Every piece of actual data in a binary mirror is now a content-addressed "blob", and every blob is referenced by a manifest. For example, the manifest for
gcc-runtime@12.3.0from the depiction above looks like:The actual data for the spec is comprised of a compressed spec file (the data with
mediaType: "application/vnd.spack.spec.v5+json") and a compressed tar archive (the data withmediaType: "application/vnd.spack.install.v1.tar+gzip"). The referenced blobs/files are stored as compressed files withing theblobs/directory, and named for their content hash, which is also present in the manifest.The current layout version is 3, the next number in sequence after the current version. This PR makes the layout version part of the path/url, which will keep entries of different layouts separate, as opposed to the previous approach where mirrors could contain entries of mixed layout, which could not be ascertained without fetching the individual spec files.
Any content blobs can be compressed, the compression type is given in the manifest as
compression, which can currently be either"gzip"or"none". At present, only spec files and installation archives are compressed, while public keys, and any type of index is stored uncompressed.Thanks to @haampie for helping to design the layout described above.
This PR introduces a new abstraction
URLBuildcacheEntry,an abstract base class for managing access to url-style binary cache entries. The current concrete implementation,, which contains all the logic for accessing and managing buildcache entries for layout version 3. The idea here is that as new layouts are defined, we can add new concrete classes to implement them, and have a fairly clean way to understand what layout versions the current spack supports. Deprecating and removing support for older layout version should be simpler, via deleting the associated implementations and updating the factory appropriately.URLBuildcacheEntryV3Backwards compatibility
After this is merged, spack will continue to read from old buildcache layouts as needed (preferentially installing from the new layout whenever possible), but will no longer be able to write to old buildcache layouts. Installation from old buildcaches, including processing of indices for concretization, will continue to work, but if you have old buildcache mirrors configured, spack will attempt to warn you (minimally) if you use those old-layout mirrors. The cases where spack will warn are as follows:
spack buildcache push/create ...to a mirror that is either empty or only contains v2 layout contentsIn cases (1) and (2) spack will only warn once per mirror, per action (install spec or fetch index), per spack invocation. In case (3) a successful push to the mirror will create the
v3/layout.jsonfile spack is looking for, so after that you should not be warned when pushing to that mirror again.To do before merging
spack buildcache migratecommand to perform in-place migration offile://ors3://mirrorsdevelopmirrorsbuildcache pruning(deferring until after merge)more tests to exercise all new functionalityspackbot performs PR binary graduation, which requires changes to work with new layout(deferring until after merge)To do after merging
3on the cli of the publish cron job hereUpdate bootstrap package code as described hereThis was already taken care of in this PRFuture work
If we add a layout version element to spack mirror configs, it would allow newer versions of spack to use older mirror layouts as long as those are supported. This would allow newer spack to continue using old buildcache layouts until those layout versions are no longer supported by spack. Whether or not newer spacks should be allow to create buildcache entries in old formats is still unclear to me, and hopefully the community will have input.