Skip to content

WIP, ENH: add spack load runtimeonly#25723

Closed
tylerjereddy wants to merge 1 commit intospack:developfrom
tylerjereddy:treddy_spack_runtime_load
Closed

WIP, ENH: add spack load runtimeonly#25723
tylerjereddy wants to merge 1 commit intospack:developfrom
tylerjereddy:treddy_spack_runtime_load

Conversation

@tylerjereddy
Copy link
Copy Markdown
Contributor

@tylerjereddy tylerjereddy commented Aug 31, 2021

Fixes #25669
Related to discussion in #25306

cc @haampie @adamjstewart @junghans @cferenba @clellsolomon

  • add a --runtimeonly option to spack load so that
    we can quickly load dependencies for builds outside of
    spack that depend on the specs installed by spack

  • internal testing shows a 20-fold speedup with:
    spack load -r --runtimeonly collected_deps_package

  • does not contain regression tests/docs yet, pending review
    of whether spack team would actually want this, etc.

  • note that for our internal use case we also have
    to adjust modules.yaml to generate CPATH for our
    CMake build per changes in:
    https://github.com/spack/spack/pull/14749/files
    https://github.com/spack/spack/pull/21699/files

* add a `--runtimeonly` option to `spack load` so that
we can quickly load dependencies for builds outside of
`spack` that depend on the `specs` installed by `spack`

* internal testing shows a 40-fold speedup with:
`spack load -r --runtimeonly collected_deps_package`

* does not contain regression tests/docs yet, pending review
of whether `spack` team would actually want this, etc.

* note that for our internal use case we also have
to adjust `modules.yaml` to generate `CPATH` for our
CMake build per changes in:
https://github.com/spack/spack/pull/14749/files
https://github.com/spack/spack/pull/21699/files
@tylerjereddy
Copy link
Copy Markdown
Contributor Author

CI failure is: AssertionError: It looks like Spack's command-line interface has been modified. Please update Spack's shell tab completion scripts by running:. Probably better to discuss this before going even farther with the code changes though.

@haampie haampie marked this pull request as draft September 1, 2021 13:14
@haampie
Copy link
Copy Markdown
Member

haampie commented Sep 1, 2021

I don't think this is the way to go:

  • This doesn't solve the actual bug with repeated entries in PATH etc without the introduced flag
  • The new flag is in many cases useless since it doesn't load required runtime deps for the package. Sorry: I see you're supposed to combine -r and --runtimeonly* -- it's weird to allow --runtime-only with --only=package is what I meant to say.

What has to be addressed is this:

a) spack load x loops over each spec, and then calls environment_modifications_for_spec which recursively loads things; this is where you get worst case quadratic complexity and repeated entries in PATH.

And potentially this:

b) spack load x defaults to loading all dependencies, why don't we change the default to the package (+ its run type deps, which is implied)

* As a side-note, it's weird that we have duplicate options for spack load 😕 -r has no effect:

  -r, --dependencies    recursively traverse spec dependencies
  --only {package,dependencies}

@tylerjereddy
Copy link
Copy Markdown
Contributor Author

This doesn't solve the actual bug with repeated entries in PATH etc without the introduced flag

Yeah, I wasn't really trying to fix that bug, which is why I don't claim to fix the second connected issue above. If you think fixing those other issues is really going to be cleaner and more performant for the specific use case of a fast path to recursive loading I could take a look. I think my intuition was just to avoid doing surgery on currently tested code and just provide a fast path to what I need.

it's weird to allow --runtime-only with --only=package is what I meant to say.

Maybe stricter "orthogonality" rules might be needed for the load options--I was just trying to demonstrate a fast path that works for us.

In the original patch I provided I did selectively disable some of the things you mention, but it sounded like we didn't want to adjust that based on feedback at: #25306 (comment)

So, you do want to start hacking around inside of environment_modifications_for_spec() then? I guess I can't see which is your preference from comments here vs. there.

@haampie
Copy link
Copy Markdown
Member

haampie commented Sep 1, 2021

In the original patch I provided I did selectively disable some of the things you mention, but it sounded like we didn't want to adjust that based on feedback at

Sorry for the confusion, and you're right, at that point I thought environment_modifications_for_spec was called from the build environment too because it calls set_module_variables_for_package which makes most sense in builds; but apparently it's the other way around, environment_modifications_for_spec is just calling into build_env functions. So it doesn't hurt to modify it.

I have the impression that wherever environment_modifications_for_spec() is called, the call site doesn't realize that it recursively loads run dependencies too. There's 4 places where we call it (load, unload, spack environment activation, and bootstrap), and in all cases it is called from loops over specs and there's a potential quadratic complexity cost of doing so.

Notably in spack.environment._env_modifications_for_default_view it loops over runtime dependencies of root specs for spec in root_spec.traverse(deptype='run', root=True):, then it calls uenv.environment_modifications_for_spec(spec, ...) on that, and later in add_default_view_to_shell it explicitly removes duplicate paths in all env variables...

        for env_var in env_mod.group_by_name():
            env_mod.prune_duplicate_paths(env_var)

this also explains @adamjstewart's comment #25306 (comment). Also, the only reason spack env activate is faster is because it defaults to loading run type deps of root specs only, whereas spack load defaults to loading all transient dependencies of any type -- but both have this quadratic complexity problem of processing the same specs over and over.

So, I think what we can do is drop the recursion to dependencies in environment_modifications_for_spec, and we need some utility that takes a list of concrete specs, and expands that to a list of unique specs with transient run deps in a proper order, and then we could call environment_modifications_for_spec on each in order. This should speed up things not only in spack load but also in spack env activate.

Edit: this is a huge rabbit hole unfortunately :( modifications_from_dependencies is still necessary to call e.g. setup_dependent_run_environment.

@haampie
Copy link
Copy Markdown
Member

haampie commented Sep 3, 2021

Coming back to this PR after going through the Spack internals and git history:

I have the impression that wherever environment_modifications_for_spec() is called, the call site doesn't realize that it recursively loads run dependencies too.

This turned out to be only partially true. It recursively calls setup_dependent_run_environment, but didn't call setup_run_environment on dependencies. There is some revived discussion about dropping setup_dependent_run_environment, which I'd be in favor of.

Before that discussion is settled, I've submitted a PR to make environment_modifications_for_spec call setup_run_environment of dependencies too: #25755.

Then my preferred way of speeding up spack load is to make spack load x only load x by default, which means x + its required runtime deps -- right now it also loads build deps and their dependencies, that doesn't make sense at all. I've submitted a PR for that here: #25732, but it requires #25755 too.

Can you maybe verify that 25732 on top of 25755 gives you a reasonable speedup for spack load @tylerjereddy?

Finally, w.r.t. my previous comment

I think what we can do is drop the recursion to dependencies in environment_modifications_for_spec

that was one way of doing things, but it's awkward to call setup_dependent_run_environment then, so it seems easiest to keep environment_modifications_for_spec loading run type deps recursively.

@tylerjereddy
Copy link
Copy Markdown
Contributor Author

@haampie Ok, I'll try to take a look at the performance on our internal use case today. I'm guessing spack doesn't currently benchmark functions with i.e., asv to guard against regressions?

@haampie
Copy link
Copy Markdown
Member

haampie commented Sep 3, 2021

No, we don't have that, but that looks great!

As a side note: I think spack load hasn't received a whole lot of attention since environments became a thing, maybe you could use that for your day to day use as a replacement

@tylerjereddy
Copy link
Copy Markdown
Contributor Author

FWIW, I can simplify to not add a new load option, just fast-pathing the recursive option, and it retains the performance improvement along with passing spack unit-test with the diff below. I wonder if the regression tests need a bit of work to reflect some of the concerns. For now, perhaps we'll have to vendor this locally pending the "proper fix."

--- a/lib/spack/spack/cmd/load.py
+++ b/lib/spack/spack/cmd/load.py
@@ -75,9 +75,22 @@ def load(parser, args):
                      spec.traverse(root=include_roots, order='post')]
 
         env_mod = spack.util.environment.EnvironmentModifications()
-        for spec in specs:
-            env_mod.extend(uenv.environment_modifications_for_spec(spec))
-            env_mod.prepend_path(uenv.spack_loaded_hashes_var, spec.dag_hash())
+
+        if args.recurse_dependencies:
+            # fast path for `spack load -r`
+            for spec in specs:
+                env = spack.util.environment.inspect_path(
+                    spec.prefix,
+                    uenv.prefix_inspections(spec.platform),
+                    exclude=spack.util.environment.is_system_path
+                )
+                spec.package.setup_run_environment(env)
+                env_mod.extend(env)
+        else:
+            for spec in specs:
+                env_mod.extend(uenv.environment_modifications_for_spec(spec))
+                env_mod.prepend_path(uenv.spack_loaded_hashes_var, spec.dag_hash())
+
         cmds = env_mod.shell_modifications(args.shell)
 
         sys.stdout.write(cmds)

@haampie
Copy link
Copy Markdown
Member

haampie commented Sep 8, 2021

Note that there are two bugs/issues in your patch:

  1. recurse_dependencies has no influence whatsoever on specs, it's a redundant flag and probably an oversight (Remove unused --dependencies flag #25731). If you want to keep this patch around, make it conditional on args.things_to_load == 'package,dependencies' instead of args.recurse_dependencies.

  2. Your fast path is not equivalent to calling environment_modifications_for_spec because it does not call setup_dependent_run_environment in packages. This is used for instance in the Python package to set the PYTHONPATH to make it locate packages.

So yes, this has to be fixed properly and the above fast path can't be merged

@tylerjereddy
Copy link
Copy Markdown
Contributor Author

Thanks, sounds good--I'll close this--it does pass our local testing just fine. Perhaps when the code paths are better covered by regression tests it will be easier for us to help out here, but if we can't depend on the tests to find issues when iterating it seems like this is best left to the core team to patch for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Performance: recursive spack load

2 participants