Skip to content

Only load root specs in env (de)activate#25755

Closed
haampie wants to merge 1 commit intospack:developfrom
haampie:fix/speedup-spack-env-(de)-activation
Closed

Only load root specs in env (de)activate#25755
haampie wants to merge 1 commit intospack:developfrom
haampie:fix/speedup-spack-env-(de)-activation

Conversation

@haampie
Copy link
Copy Markdown
Member

@haampie haampie commented Sep 2, 2021

This PR partially reverts #23485 as it introduced a performance regression
in spack env activate. The test from that PR is still there and passing.

The current logic of spack env activate is as follows:

  1. collect root specs
  2. add root specs + transitive run type deps of root specs to a list
  3. call environment_modifications_for_spec for every spec of this list

However, environment_modifications_for_spec also processes run type
dependencies of the spec, so we're doing a lot of redundant work,
resulting in O(n^2) processed specs given n of them in
a chain like a <- b <- ... <- z

This PR drops step 2, so that we call
environment_modifications_for_spec only on the root specs, and this
function will process the run type deps for us anyways.

Given an environment like this:

spack:
  specs:
  - py-flake8
  - py-matplotlib
  - py-isort
  - py-sphinx
  - py-six
  - py-ipython

This is the time spent on activate before and after:

Before:

In [1]: from spack.main import SpackCommand
In [2]: env = SpackCommand('env')
In [3]: %timeit env('activate', '--sh', '.')
2.56 s ± 11.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

After:

In [1]: from spack.main import SpackCommand
In [2]: env = SpackCommand('env')
In [3]: %timeit env('activate', '--sh', '.')
820 ms ± 6.22 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

which is 68% less.

The diff of spack env activate --sh . is the same except for the order of paths in PYTHONPATH, but note that the logic introduced in #23485 didn't walk over the deps in post order: root_spec.traverse(deptype='run', root=True) which seems to be a mistake? (@scheibelp). Anyways, that's gone with this pr.

@haampie haampie force-pushed the fix/speedup-spack-env-(de)-activation branch from d99b4b4 to a62af99 Compare September 2, 2021 13:18
Partially undoes spack#23485

The current logic of `spack env activate` is as follows:

1. collect root specs
2. add root specs + transient run type deps of root specs to a list
3. call `environment_modifications_for_spec` for every spec of this list

However, `environment_modifications_for_spec` also processes run type
dependencies of the spec, so we're doing a lot of redundant work,
resulting in O(n^2) calls to `modifications_from_dependencies` if we
have a chain of deps `a` <- `b` <- ... <- `z` of deptype `run`.

This PR drops step 2, so that we call
`environment_modifications_for_spec` only on the root specs, and this
function will process the run type for us anyways.

Given an environment like this:

```yaml
spack:
  specs:
  - py-flake8
  - py-matplotlib
  - py-isort
  - py-sphinx
  - py-six
  - py-ipython
```

This is the time spent on `activate` before and after:

**Before**:

```
In [1]: from spack.main import SpackCommand
In [2]: env = SpackCommand('env')
In [3]: %timeit env('activate', '--sh', '.')
2.56 s ± 11.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

**After**:

```
In [1]: from spack.main import SpackCommand
In [2]: env = SpackCommand('env')
In [3]: %timeit env('activate', '--sh', '.')
761 ms ± 4.36 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

which is 70% less.
@haampie haampie force-pushed the fix/speedup-spack-env-(de)-activation branch from a62af99 to 627a79c Compare September 2, 2021 14:03
dpkg.setup_dependent_build_environment(env, spec)
else:
dpkg.setup_dependent_run_environment(env, spec)
if dep in exe_deps:
Copy link
Copy Markdown
Member Author

@haampie haampie Sep 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@scheibelp can you comment on this? I generally struggle to follow the logic in modifications_from_dependencies, but this does the job.

My issue though is that one line up we're calling setup_dependent_run_environment on all deps in custom_mod_deps which is run_and_supporting_deps, which includes link deps too, which looks like a mistake? Hence I added if dep in exe_deps. Generally though this function is much more complicated then it needs to be, but I don't want to modify more than necessary.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do worry a little here about link deps whose behavior may be modified by environment variables (like libraries that use environment variables to find their config files).

"this package has already been added".format(
spec.format("{name}/{hash:7}")
)
if root_spec.name in visited:
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I just also undo the visited part @scheibelp? It now only applies to root specs anyways.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to cause problems -- what if you have hdf5^mpich and openmpi in your environment, and openmpi is loaded first. Then the hdf5 dependency mpich will be loaded later than openmpi, and will shadow its environment variables.

I think we may be able to get around that problem by sorting the root specs by size -- if we always load bigger specs first, then they can't be "sneakily" pulling in something that will shadow a dependency (and then I think we could get rid of the visited list entirely.

@adamjstewart
Copy link
Copy Markdown
Member

Before

$ time spack env activate .

real	0m18.318s
user	0m15.929s
sys	0m2.228s
$ time spack env deactivate

real	0m24.808s
user	0m20.695s
sys	0m3.949s

After

$ time spack env activate .

real	0m11.325s
user	0m9.979s
sys	0m1.241s
$ time spack env deactivate

real	0m14.652s
user	0m12.286s
sys	0m2.188s

@haampie
Copy link
Copy Markdown
Member Author

haampie commented Sep 3, 2021

Still two orders of magnitude too slow :p but an improvement nonetheless...

@adamjstewart
Copy link
Copy Markdown
Member

We should just store the environment in a bash script and source that instead. I assume that's how conda does things in less than 0.1 sec.

@haampie
Copy link
Copy Markdown
Member Author

haampie commented Sep 3, 2021

Yeah, that'd be a useful shortcut, we can hash spack.lock and verify in the shell script if it was derived from the same concrete env. 0.1s is less than the python startup time, so it's likely ;)

@haampie
Copy link
Copy Markdown
Member Author

haampie commented Sep 26, 2021

@scheibelp this shouldn't really require the upcoming proposal about loading behavior, considering #23368 quoted todd _env_modifications_for_default_view doesn’t currently traverse run deps, but it really should and this PR does the same but is more efficient.

So hope you can review :)

dpkg.setup_dependent_build_environment(env, spec)
else:
dpkg.setup_dependent_run_environment(env, spec)
if dep in exe_deps:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do worry a little here about link deps whose behavior may be modified by environment variables (like libraries that use environment variables to find their config files).

"this package has already been added".format(
spec.format("{name}/{hash:7}")
)
if root_spec.name in visited:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to cause problems -- what if you have hdf5^mpich and openmpi in your environment, and openmpi is loaded first. Then the hdf5 dependency mpich will be loaded later than openmpi, and will shadow its environment variables.

I think we may be able to get around that problem by sorting the root specs by size -- if we always load bigger specs first, then they can't be "sneakily" pulling in something that will shadow a dependency (and then I think we could get rid of the visited list entirely.

@adamjstewart adamjstewart removed their assignment May 6, 2022
@tgamblin tgamblin added this to the v0.20.0 milestone Nov 7, 2022
@alalazo alalazo modified the milestones: v0.20.0, v0.21.0 May 10, 2023
@alalazo alalazo removed this from the v0.21.0 milestone Jun 7, 2023
@haampie haampie closed this Jan 5, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants