Performance: recursive spack load #25669

@tylerjereddy

Description

While I realize that "environments are the future," we continue to rely on spack load -r to load a fairly sizeable set of dependencies internally, and migrating would likely be complex. Even after recent performance improvements in develop such as #23661 (which cut our load time in half), we still take about 1.5 minutes to load our Spack-installed dependencies each time we run spack load -r ...

I checked a few things. First, the diff below highlights the slowest part of the control flow here: commenting that line out drops the run time of spack load -r ... from 90 seconds to 2 seconds (though it then does almost no work as a result; this just highlights the bottleneck):

diff --git a/lib/spack/spack/cmd/load.py b/lib/spack/spack/cmd/load.py
index 8bfeac7e69..13c959f333 100644
--- a/lib/spack/spack/cmd/load.py
+++ b/lib/spack/spack/cmd/load.py
@@ -76,7 +76,8 @@ def load(parser, args):
 
         env_mod = spack.util.environment.EnvironmentModifications()
         for spec in specs:
-            env_mod.extend(uenv.environment_modifications_for_spec(spec))
+            # THIS IS THE SLOW PART (the environment_modifications_for_spec() call, not the extend()):
+            # env_mod.extend(uenv.environment_modifications_for_spec(spec))
             env_mod.prepend_path(uenv.spack_loaded_hashes_var, spec.dag_hash())
         cmds = env_mod.shell_modifications(args.shell)

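To pin down where the time actually goes inside that call, one option is to run the loop under cProfile. This is just a sketch, not Spack code: `profile_loop` is a hypothetical helper, and `specs`/`modifications_for_spec` are stand-ins for the objects in the loop above.

```python
import cProfile
import io
import pstats


def profile_loop(specs, modifications_for_spec, top=10):
    """Run modifications_for_spec over each spec under cProfile.

    Returns (results, report), where report lists the functions with the
    highest cumulative time, which should reveal the hot path inside
    something like environment_modifications_for_spec().
    """
    profiler = cProfile.Profile()
    profiler.enable()
    results = [modifications_for_spec(s) for s in specs]
    profiler.disable()

    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(top)
    return results, out.getvalue()
```

Printing the report after a single spack load -r run would show whether the time is dominated by, say, filesystem stats, module-file parsing, or spec traversal.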
I also note that if I serialize cmds into a pickle file as a sort of "caching" mechanism, on the second run I can load in about 1 second, so the shell commands themselves are obviously fast in isolation.
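The caching experiment can be sketched roughly as below. Everything here is hypothetical and outside Spack: `cached_shell_modifications` and the cache file location are made up for illustration, and a real cache would also need invalidation when the set of specs or the installations change.

```python
import os
import pickle


def cached_shell_modifications(build_cmds, cache_path):
    """Return cached shell commands if a cache file exists, else build and cache them.

    build_cmds is a zero-argument callable that does the expensive work
    (the per-spec environment-modification loop plus shell_modifications()).
    """
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)

    cmds = build_cmds()
    os.makedirs(os.path.dirname(cache_path), exist_ok=True)
    with open(cache_path, "wb") as f:
        pickle.dump(cmds, f)
    return cmds
```

On the second call the expensive callable is skipped entirely, matching the ~1 second reload observed with the pickled cmds.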

A few questions might be:

  1. am I likely to have a hard time speeding up this control flow? (is it genuinely more than a minute of data-processing work, on a dedicated supercomputer node, to construct the shell strings needed to load the environment?)
  2. do you have a sense for how sensitive the operations in that spec for loop are to "ordering"? For example, could they be dispatched in parallel with, e.g., futures or similar? Our loop is approaching 100 iterations at the moment. My intuition is that going parallel is probably just me trying to avoid understanding why it is so slow in serial first, which I probably could/should do by digging deeper into uenv.environment_modifications_for_spec().
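On the ordering question in point 2: even if the per-spec computation is order-independent, the extend() calls may not be, so any parallel dispatch should gather results back in the original spec order. A minimal sketch with concurrent.futures, assuming the per-spec function is pure (the helper name is hypothetical, and threads only help if the work is I/O-bound, e.g. filesystem stats, rather than GIL-bound Python):

```python
from concurrent.futures import ThreadPoolExecutor


def modifications_in_order(specs, modifications_for_spec, max_workers=8):
    """Compute per-spec modifications in parallel, returned in input order.

    ThreadPoolExecutor.map preserves the order of its inputs in its
    results, so a subsequent order-sensitive extend() over the returned
    list behaves the same as the serial loop.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(modifications_for_spec, specs))
```

The serial loop would then become a loop over the pre-computed list, calling env_mod.extend() and env_mod.prepend_path() in the original order.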
