Un-relocatable package in DAG prevents binary cache creation for other packages in DAG #11620

@mpbelhorn

In the OLCF's software management workflow, we would like to generate binary caches for packages built in a staging/testing/preview environment that can be used to rapidly deploy packages in another "production" environment.

In our fork of spack (from v0.12.1), spack generates build caches for each package in the DAG sequentially. It generates a build cache for each dependency until it encounters the first package that is not relocatable. At that point, spack buildcache create exits and the remaining packages in the DAG are not processed. This leaves any relocatable but unprocessed packages without a build cache unless one is created explicitly for each of them.

We would like spack to generate as many build caches for dependencies as possible, so that other spack instances can later reuse them to satisfy dependencies and only need to build the non-relocatable packages from source.

In the reproducer that follows, the first dependency spack processes in the DAG is non-relocatable, so no binary caches are produced at all. However, we've seen cases where the first two or three dependencies processed are relocatable: spack successfully produces binary caches for them, then encounters a non-relocatable dependency and exits before processing the rest. This leads me to believe that not every dependency in a DAG needs to be cacheable for the relocatable ones to still be usefully cached. Is this belief correct? And if so, can we have spack generate as many caches as possible for a given input spec?

Steps to reproduce the issue

$ spack spec -lINt netcdf@4.6.1%gcc@6.4.0     
Input spec
--------------------------------
 -   [    ]  netcdf@4.6.1%gcc@6.4.0

Concretized
--------------------------------
[+]  gzpquhc  [    ]  builtin.netcdf@4.6.1%gcc@6.4.0~dap~hdf4 maxdims=1024 maxvars=8192 +mpi+parallel-netcdf+pic+shared arch=linux-rhel7-ppc64le 
[+]  6rzaif5  [bl  ]      ^builtin.hdf5@1.10.3%gcc@6.4.0+cxx~debug+fortran+hl+mpi+pic+shared~szip~threadsafe arch=linux-rhel7-ppc64le 
[+]  acsucgh  [bl  ]          ^builtin.numactl@2.0.11%gcc@6.4.0 patches=592f30f7f5f757dfc239ad0ffd39a9a048487ad803c26b419e0f96b8cda08c1a arch=linux-rhel7-ppc64le 
[+]  4um5hjo  [bl  ]          ^olcf.spectrum-mpi@10.3.0.0-20190419%gcc@6.4.0 arch=linux-rhel7-ppc64le 
[+]  fvgnqf6  [bl  ]          ^builtin.zlib@1.2.11%gcc@6.4.0+optimize+pic+shared arch=linux-rhel7-ppc64le 
[+]  sbessrn  [b   ]      ^builtin.m4@1.4.18%gcc@6.4.0 patches=3877ab548f88597ab2327a2230ee048d2d07ace1062efe81fc92e91b7f39cd00,c0a408fbffb7255fcc75e26bd8edab116fc81d216bfd18b473668b7739a4158e,fc9b61654a3ba1a8d6cd78ce087e7c96366c290bc8d2c299f09828d793b853c8 +sigsegv arch=linux-rhel7-ppc64le 
[+]  hdr43hr  [bl  ]      ^builtin.parallel-netcdf@1.8.1%gcc@6.4.0+cxx+fortran+pic arch=linux-rhel7-ppc64le 
$ spack buildcache create \
    -d ${build_cache_dir} \
    -k "${signing_key}" \
    /gzpquhc
==> Found at least one matching spec
==> examining match netcdf@4.6.1%gcc@6.4.0~dap~hdf4 maxdims=1024 maxvars=8192 +mpi+parallel-netcdf+pic+shared arch=linux-rhel7-ppc64le 
==> adding matching spec netcdf@4.6.1%gcc@6.4.0~dap~hdf4 maxdims=1024 maxvars=8192 +mpi+parallel-netcdf+pic+shared arch=linux-rhel7-ppc64le 
==> recursing dependencies
==> skipping external or virtual dependency numactl@2.0.11%gcc@6.4.0 patches=592f30f7f5f757dfc239ad0ffd39a9a048487ad803c26b419e0f96b8cda08c1a arch=linux-rhel7-ppc64le 
==> adding dependency spectrum-mpi@10.3.0.0-20190419%gcc@6.4.0 arch=linux-rhel7-ppc64le 
==> adding dependency zlib@1.2.11%gcc@6.4.0+optimize+pic+shared arch=linux-rhel7-ppc64le 
==> adding dependency hdf5@1.10.3%gcc@6.4.0+cxx~debug+fortran+hl+mpi+pic+shared~szip~threadsafe arch=linux-rhel7-ppc64le 
==> adding dependency parallel-netcdf@1.8.1%gcc@6.4.0+cxx+fortran+pic arch=linux-rhel7-ppc64le 
==> adding dependency netcdf@4.6.1%gcc@6.4.0~dap~hdf4 maxdims=1024 maxvars=8192 +mpi+parallel-netcdf+pic+shared arch=linux-rhel7-ppc64le 
==> writing tarballs to ./build_cache
==> creating binary cache file for package parallel-netcdf@1.8.1%gcc@6.4.0+cxx+fortran+pic arch=linux-rhel7-ppc64le  
==> Error: 
 /tmp/tmpAtrpKQ/parallel-netcdf-1.8.1-hdr43hrl4opcz2yagwqk4k5zjxmg2bep/bin/pnetcdf_version 
contains string
 /autofs/nccs-svm1_sw/.b2/.swci/1-compute/opt/spack/20180914 
after replacing it in rpaths.
Package should not be relocated.
 Use -a to override.
$ ls -l "${build_cache_dir}"
total 4
drwxrwsr-x 3 <REDACTED> <REDACTED> 4096 Jun  4 12:35 linux-rhel7-ppc64le

No binary caches are produced for any of the packages, even the ones that are relocatable.

Expected result

The hdf5 and parallel-netcdf dependencies are the only packages that are not relocatable, and numactl here is an external package, so we expect binary caches to be produced for everything else. What follows is the output of a run using the patch proposed further below.

$ spack buildcache create /gzpquhc
==> Found at least one matching spec
==> examining match netcdf@4.6.1%gcc@6.4.0~dap~hdf4 maxdims=1024 maxvars=8192 +mpi+parallel-netcdf+pic+shared arch=linux-rhel7-ppc64le 
==> adding matching spec netcdf@4.6.1%gcc@6.4.0~dap~hdf4 maxdims=1024 maxvars=8192 +mpi+parallel-netcdf+pic+shared arch=linux-rhel7-ppc64le 
==> recursing dependencies
==> skipping external or virtual dependency numactl@2.0.11%gcc@6.4.0 patches=592f30f7f5f757dfc239ad0ffd39a9a048487ad803c26b419e0f96b8cda08c1a arch=linux-rhel7-ppc64le 
==> adding dependency spectrum-mpi@10.3.0.0-20190419%gcc@6.4.0 arch=linux-rhel7-ppc64le 
==> adding dependency zlib@1.2.11%gcc@6.4.0+optimize+pic+shared arch=linux-rhel7-ppc64le 
==> adding dependency hdf5@1.10.3%gcc@6.4.0+cxx~debug+fortran+hl+mpi+pic+shared~szip~threadsafe arch=linux-rhel7-ppc64le 
==> adding dependency parallel-netcdf@1.8.1%gcc@6.4.0+cxx+fortran+pic arch=linux-rhel7-ppc64le 
==> adding dependency netcdf@4.6.1%gcc@6.4.0~dap~hdf4 maxdims=1024 maxvars=8192 +mpi+parallel-netcdf+pic+shared arch=linux-rhel7-ppc64le 
==> writing tarballs to ./build_cache
==> creating binary cache file for package parallel-netcdf@1.8.1%gcc@6.4.0+cxx+fortran+pic arch=linux-rhel7-ppc64le  
==> Warning: 
 /tmp/tmpd6qqWI/parallel-netcdf-1.8.1-hdr43hrl4opcz2yagwqk4k5zjxmg2bep/bin/pnetcdf_version 
contains string
 /autofs/nccs-svm1_sw/.b2/.swci/1-compute/opt/spack/20180914 
after replacing it in rpaths.
Package should not be relocated.
 Use -a to override.
==> creating binary cache file for package hdf5@1.10.3%gcc@6.4.0+cxx~debug+fortran+hl+mpi+pic+shared~szip~threadsafe arch=linux-rhel7-ppc64le  
==> Warning: 
 /tmp/tmpYecsFk/hdf5-1.10.3-6rzaif5azberzazrue4ryftlk5g4vcp4/lib/libhdf5.so.103.0.0 
contains string
 /autofs/nccs-svm1_sw/.b2/.swci/1-compute/opt/spack/20180914 
after replacing it in rpaths.
Package should not be relocated.
 Use -a to override.
==> creating binary cache file for package netcdf@4.6.1%gcc@6.4.0~dap~hdf4 maxdims=1024 maxvars=8192 +mpi+parallel-netcdf+pic+shared arch=linux-rhel7-ppc64le  
gpg: using "<REDACTED>" as default secret key for signing
==> creating binary cache file for package spectrum-mpi@10.3.0.0-20190419%gcc@6.4.0 arch=linux-rhel7-ppc64le  
gpg: using "<REDACTED>" as default secret key for signing
==> creating binary cache file for package zlib@1.2.11%gcc@6.4.0+optimize+pic+shared arch=linux-rhel7-ppc64le  
gpg: using "<REDACTED>" as default secret key for signing
$ ls -l "${build_cache_dir}"
total 20K
drwxrwsr-x 3 <REDACTED> <REDACTED> 4.0K Jun  4 12:35 linux-rhel7-ppc64le/
-rw-rw-r-- 1 <REDACTED> <REDACTED>  754 Jun  4 12:44 index.html
-rw-rw-r-- 1 <REDACTED> <REDACTED> 3.8K Jun  4 12:44 linux-rhel7-ppc64le-gcc-6.4.0-netcdf-4.6.1-gzpquhcgd7zvrohl4f7l4c5dg7ysgrlq.spec.yaml
-rw-rw-r-- 1 <REDACTED> <REDACTED>  636 Jun  4 12:44 linux-rhel7-ppc64le-gcc-6.4.0-spectrum-mpi-10.3.0.0-20190419-4um5hjogm3tepg4xe23hrptlrs2y7ez6.spec.yaml
-rw-rw-r-- 1 <REDACTED> <REDACTED>  657 Jun  4 12:44 linux-rhel7-ppc64le-gcc-6.4.0-zlib-1.2.11-fvgnqf6k3ffhltldndu7pmntzvoyfsk4.spec.yaml

Error Message

The unexpected behavior appears to be in spack.binary_distribution::build_tarball as called by spack.cmd.buildcache::createtarball:

$ spack --stacktrace buildcache create /gzpquhc
/autofs/nccs-svm1_sw/.b2/.swci/1-compute/lib/spack/spack/cmd/buildcache.py:299 ==> Found at least one matching spec
/autofs/nccs-svm1_sw/.b2/.swci/1-compute/lib/spack/spack/cmd/buildcache.py:302 ==> examining match netcdf@4.6.1%gcc@6.4.0~dap~hdf4 maxdims=1024 maxvars=8192 +mpi+parallel-netcdf+pic+shared arch=linux-rhel7-ppc64le 
/autofs/nccs-svm1_sw/.b2/.swci/1-compute/lib/spack/spack/cmd/buildcache.py:307 ==> adding matching spec netcdf@4.6.1%gcc@6.4.0~dap~hdf4 maxdims=1024 maxvars=8192 +mpi+parallel-netcdf+pic+shared arch=linux-rhel7-ppc64le 
/autofs/nccs-svm1_sw/.b2/.swci/1-compute/lib/spack/spack/cmd/buildcache.py:309 ==> recursing dependencies
/autofs/nccs-svm1_sw/.b2/.swci/1-compute/lib/spack/spack/cmd/buildcache.py:315 ==> skipping external or virtual dependency numactl@2.0.11%gcc@6.4.0 patches=592f30f7f5f757dfc239ad0ffd39a9a048487ad803c26b419e0f96b8cda08c1a arch=linux-rhel7-ppc64le 
/autofs/nccs-svm1_sw/.b2/.swci/1-compute/lib/spack/spack/cmd/buildcache.py:317 ==> adding dependency spectrum-mpi@10.3.0.0-20190419%gcc@6.4.0 arch=linux-rhel7-ppc64le 
/autofs/nccs-svm1_sw/.b2/.swci/1-compute/lib/spack/spack/cmd/buildcache.py:317 ==> adding dependency zlib@1.2.11%gcc@6.4.0+optimize+pic+shared arch=linux-rhel7-ppc64le 
/autofs/nccs-svm1_sw/.b2/.swci/1-compute/lib/spack/spack/cmd/buildcache.py:317 ==> adding dependency hdf5@1.10.3%gcc@6.4.0+cxx~debug+fortran+hl+mpi+pic+shared~szip~threadsafe arch=linux-rhel7-ppc64le 
/autofs/nccs-svm1_sw/.b2/.swci/1-compute/lib/spack/spack/cmd/buildcache.py:317 ==> adding dependency parallel-netcdf@1.8.1%gcc@6.4.0+cxx+fortran+pic arch=linux-rhel7-ppc64le 
/autofs/nccs-svm1_sw/.b2/.swci/1-compute/lib/spack/spack/cmd/buildcache.py:317 ==> adding dependency netcdf@4.6.1%gcc@6.4.0~dap~hdf4 maxdims=1024 maxvars=8192 +mpi+parallel-netcdf+pic+shared arch=linux-rhel7-ppc64le 
/autofs/nccs-svm1_sw/.b2/.swci/1-compute/lib/spack/spack/cmd/buildcache.py:320 ==> writing tarballs to ./build_cache
/autofs/nccs-svm1_sw/.b2/.swci/1-compute/lib/spack/spack/cmd/buildcache.py:323 ==> creating binary cache file for package parallel-netcdf@1.8.1%gcc@6.4.0+cxx+fortran+pic arch=linux-rhel7-ppc64le  
/autofs/nccs-svm1_sw/.b2/.swci/1-compute/lib/spack/spack/binary_distribution.py:337 ==> Error: 
 /tmp/tmpOLsaPk/parallel-netcdf-1.8.1-hdr43hrl4opcz2yagwqk4k5zjxmg2bep/bin/pnetcdf_version 
contains string
 /autofs/nccs-svm1_sw/.b2/.swci/1-compute/opt/spack/20180914 
after replacing it in rpaths.
Package should not be relocated.
 Use -a to override.

In particular, the try/except block in build_tarball that wraps make_package_relative and make_package_placeholder calls tty.die on any exception, rather than raising an exception that could be caught in the cmd.buildcache::createtarball function's loop over dependency specs.
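
A minimal sketch of why the loop stops, written outside of spack. It assumes that tty.die ends by calling sys.exit, which raises SystemExit; the names die, build_tarball_stub, and create_all are hypothetical stand-ins, not spack's API. Because SystemExit does not derive from Exception, an `except Exception` handler in the caller cannot intercept it, so nothing after the first failing package runs.

```python
import sys


def die(message):
    """Hypothetical stand-in for tty.die: report the error, then exit."""
    sys.stderr.write("==> Error: %s\n" % message)
    sys.exit(1)  # raises SystemExit, which is not a subclass of Exception


def build_tarball_stub(name, relocatable):
    """Hypothetical stand-in for build_tarball; fails by calling die()."""
    if not relocatable:
        die("%s should not be relocated" % name)
    return name


def create_all(packages):
    """Loop over packages the way createtarball loops over specs."""
    created = []
    for name, relocatable in packages:
        try:
            created.append(build_tarball_stub(name, relocatable))
        except Exception:
            # Never reached for die(): SystemExit bypasses this handler.
            continue
    return created


if __name__ == "__main__":
    # Processing order taken from the log above: parallel-netcdf comes first.
    packages = [("parallel-netcdf", False), ("hdf5", False),
                ("netcdf", True), ("spectrum-mpi", True), ("zlib", True)]
    try:
        create_all(packages)
    except SystemExit as exit_request:
        print("aborted with exit code %s before any cache was written"
              % exit_request.code)
```

Even a caller that tries to be tolerant with `except Exception` is bypassed here, which is why the failure has to surface as an ordinary exception before the loop can skip a package.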

Proposed solution

Allow spack.binary_distribution::build_tarball to log any errors and raise the exceptions. Any NoOverwriteException thrown earlier can simply be ignored by cmd.buildcache::createtarball so that it continues processing dependency specs. Any InstallRootStringException thrown for a non-relocatable package can likewise be ignored to continue processing dependency specs.
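
The control flow proposed here can be sketched in isolation as follows. This is an illustration under stated assumptions, not the patch itself: the two exception classes are local stand-ins that only share names with spack's, and build_tarball_stub, create_all, and the relocatable flag are hypothetical. The builder removes its scratch directory and re-raises; the caller downgrades the two expected exceptions to warnings and keeps going.

```python
import shutil
import tempfile


class NoOverwriteException(Exception):
    """Local stand-in: a cache file for this package already exists."""


class InstallRootStringException(Exception):
    """Local stand-in: the install root is still embedded in a binary."""


def build_tarball_stub(name, relocatable):
    """Hypothetical builder: clean up on failure, then re-raise."""
    workdir = tempfile.mkdtemp()
    try:
        if not relocatable:
            raise InstallRootStringException(
                "%s should not be relocated" % name)
    except Exception:
        shutil.rmtree(workdir)
        raise  # propagate to the caller instead of exiting the process
    shutil.rmtree(workdir)
    return name


def create_all(packages):
    """Process every package; warn and continue on expected failures."""
    created, skipped = [], []
    for name, relocatable in packages:
        try:
            created.append(build_tarball_stub(name, relocatable))
        except (NoOverwriteException, InstallRootStringException) as e:
            print("==> Warning: %s" % e)
            skipped.append(name)
    return created, skipped


if __name__ == "__main__":
    # Same processing order as in the logs of this report.
    packages = [("parallel-netcdf", False), ("hdf5", False),
                ("netcdf", True), ("spectrum-mpi", True), ("zlib", True)]
    created, skipped = create_all(packages)
    print("created:", created)
    print("skipped:", skipped)
```

Only the two exception types named above are treated as non-fatal in this sketch; anything else still propagates out of the loop.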

diff --git a/lib/spack/spack/binary_distribution.py b/lib/spack/spack/binary_distribution.py
index 46ac7790e..09967b180 100644
--- a/lib/spack/spack/binary_distribution.py
+++ b/lib/spack/spack/binary_distribution.py
@@ -321,20 +321,15 @@ def build_tarball(spec, outdir, force=False, rel=False, unsigned=False,

     # optinally make the paths in the binaries relative to each other
     # in the spack install tree before creating tarball
-    if rel:
-        try:
+    try:
+        if rel:
             make_package_relative(workdir, spec.prefix, allow_root)
-        except Exception as e:
-            shutil.rmtree(workdir)
-            shutil.rmtree(tarfile_dir)
-            tty.die(str(e))
-    else:
-        try:
+        else:
             make_package_placeholder(workdir, spec.prefix, allow_root)
-        except Exception as e:
-            shutil.rmtree(workdir)
-            shutil.rmtree(tarfile_dir)
-            tty.die(str(e))
+    except Exception as e:
+        shutil.rmtree(workdir)
+        shutil.rmtree(tarfile_dir)
+        raise e
     # create compressed tarball of the install prefix
     with closing(tarfile.open(tarfile_path, 'w:gz')) as tar:
         tar.add(name='%s' % workdir,
diff --git a/lib/spack/spack/cmd/buildcache.py b/lib/spack/spack/cmd/buildcache.py
index fe91312c4..5c35fc9f2 100644
--- a/lib/spack/spack/cmd/buildcache.py
+++ b/lib/spack/spack/cmd/buildcache.py
@@ -20,6 +20,8 @@ from spack.spec import Spec, save_dependency_spec_yamls
 from spack.spec_set import CombinatorialSpecSet

 import spack.binary_distribution as bindist
+from spack.binary_distribution import NoOverwriteException
+from spack.relocate import InstallRootStringException
 import spack.cmd.common.arguments as arguments
 from spack.cmd import display_specs

@@ -321,9 +323,14 @@ def createtarball(args):

     for spec in specs:
         tty.msg('creating binary cache file for package %s ' % spec.format())
-        bindist.build_tarball(spec, outdir, args.force, args.rel,
-                              args.unsigned, args.allow_root, signkey,
-                              not args.no_rebuild_index)
+        try:
+            bindist.build_tarball(spec, outdir, args.force, args.rel,
+                                  args.unsigned, args.allow_root, signkey,
+                                  not args.no_rebuild_index)
+        except (NoOverwriteException, InstallRootStringException) as e:
+            tty.warn(str(e))
+        except Exception as e:
+            tty.die(str(e))


 def installtarball(args):

Can someone more familiar with the caveats and gotchas regarding binary distribution caches weigh in on whether this is a bad idea?

Labels

bug (Something isn't working), triage (The issue needs to be prioritized)
