Only load root specs in env (de)activate#25755
Conversation
d99b4b4 to
a62af99
Compare
Partially undoes spack#23485 The current logic of `spack env activate` is as follows: 1. collect root specs 2. add root specs + transient run type deps of root specs to a list 3. call `environment_modifications_for_spec` for every spec of this list However, `environment_modifications_for_spec` also processes run type dependencies of the spec, so we're doing a lot of redundant work, resulting in O(n^2) calls to `modifications_from_dependencies` if we have a chain of deps `a` <- `b` <- ... <- `z` of deptype `run`. This PR drops step 2, so that we call `environment_modifications_for_spec` only on the root specs, and this function will process the run type for us anyways. Given an environment like this: ```yaml spack: specs: - py-flake8 - py-matplotlib - py-isort - py-sphinx - py-six - py-ipython ``` This is the time spent on `activate` before and after: **Before**: ``` In [1]: from spack.main import SpackCommand In [2]: env = SpackCommand('env') In [3]: %timeit env('activate', '--sh', '.') 2.56 s ± 11.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) ``` **After**: ``` In [1]: from spack.main import SpackCommand In [2]: env = SpackCommand('env') In [3]: %timeit env('activate', '--sh', '.') 761 ms ± 4.36 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) ``` which is 70% less.
a62af99 to
627a79c
Compare
| dpkg.setup_dependent_build_environment(env, spec) | ||
| else: | ||
| dpkg.setup_dependent_run_environment(env, spec) | ||
| if dep in exe_deps: |
There was a problem hiding this comment.
@scheibelp can you comment on this? I generally struggle to follow the logic in modifications_from_dependencies, but this does the job.
My issue though is that one line up we're calling setup_dependent_run_environment on all deps in custom_mod_deps which is run_and_supporting_deps, which includes link deps too, which looks like a mistake? Hence I added if dep in exe_deps. Generally though this function is much more complicated then it needs to be, but I don't want to modify more than necessary.
There was a problem hiding this comment.
I do worry a little here about link deps whose behavior may be modified by environment variables (like libraries that use environment variables to find their config files).
| "this package has already been added".format( | ||
| spec.format("{name}/{hash:7}") | ||
| ) | ||
| if root_spec.name in visited: |
There was a problem hiding this comment.
Should I just also undo the visited part @scheibelp? It now only applies to root specs anyways.
There was a problem hiding this comment.
This is going to cause problems -- what if you have hdf5^mpich and openmpi in your environment, and openmpi is loaded first. Then the hdf5 dependency mpich will be loaded later than openmpi, and will shadow its environment variables.
I think we may be able to get around that problem by sorting the root specs by size -- if we always load bigger specs first, then they can't be "sneakily" pulling in something that will shadow a dependency (and then I think we could get rid of the visited list entirely.
Before$ time spack env activate .
real 0m18.318s
user 0m15.929s
sys 0m2.228s
$ time spack env deactivate
real 0m24.808s
user 0m20.695s
sys 0m3.949sAfter$ time spack env activate .
real 0m11.325s
user 0m9.979s
sys 0m1.241s
$ time spack env deactivate
real 0m14.652s
user 0m12.286s
sys 0m2.188s |
|
Still two orders of magnitude too slow :p but an improvement nonetheless... |
|
We should just store the environment in a bash script and source that instead. I assume that's how conda does things in less than 0.1 sec. |
|
Yeah, that'd be a useful shortcut, we can hash spack.lock and verify in the shell script if it was derived from the same concrete env. |
|
@scheibelp this shouldn't really require the upcoming proposal about loading behavior, considering #23368 quoted todd So hope you can review :) |
| dpkg.setup_dependent_build_environment(env, spec) | ||
| else: | ||
| dpkg.setup_dependent_run_environment(env, spec) | ||
| if dep in exe_deps: |
There was a problem hiding this comment.
I do worry a little here about link deps whose behavior may be modified by environment variables (like libraries that use environment variables to find their config files).
| "this package has already been added".format( | ||
| spec.format("{name}/{hash:7}") | ||
| ) | ||
| if root_spec.name in visited: |
There was a problem hiding this comment.
This is going to cause problems -- what if you have hdf5^mpich and openmpi in your environment, and openmpi is loaded first. Then the hdf5 dependency mpich will be loaded later than openmpi, and will shadow its environment variables.
I think we may be able to get around that problem by sorting the root specs by size -- if we always load bigger specs first, then they can't be "sneakily" pulling in something that will shadow a dependency (and then I think we could get rid of the visited list entirely.
This PR partially reverts #23485 as it introduced a performance regression
in
spack env activate. The test from that PR is still there and passing.The current logic of
spack env activateis as follows:environment_modifications_for_specfor every spec of this listHowever,
environment_modifications_for_specalso processes run typedependencies of the spec, so we're doing a lot of redundant work,
resulting in O(n^2) processed specs given
nof them ina chain like
a<-b<- ... <-zThis PR drops step 2, so that we call
environment_modifications_for_speconly on the root specs, and thisfunction will process the run type deps for us anyways.
Given an environment like this:
This is the time spent on
activatebefore and after:Before:
After:
which is 68% less.
The diff of
spack env activate --sh .is the same except for the order of paths inPYTHONPATH, but note that the logic introduced in #23485 didn't walk over the deps in post order:root_spec.traverse(deptype='run', root=True)which seems to be a mistake? (@scheibelp). Anyways, that's gone with this pr.