Add cached dependency manager #3106
63451ef to
3fab2b8
Compare
config/galaxy.ini.sample
Outdated
| # if it cannot find a local copy and conda_exec is not configured. | ||
| #conda_auto_init = False | ||
|
|
||
| # Certain dependency resolvers (namely conda) take a considerable amount of |
config/galaxy.ini.sample
Outdated
|
|
||
| # Certain dependency resolvers (namely conda) take a considerable amount of | ||
| # time to build an isolated job environment in the job_working_directory. Set | ||
| # the following option to true to cache the dependencies in a folder. This |
config/galaxy.ini.sample
Outdated
| # the following option to true to cache the dependencies in a folder. This | ||
| # option is beta and should only be used if you experience long waiting times | ||
| # before a job is actually submitted to your cluster. If you activate this | ||
| # option, and you install new dependencies you may need to clear out old cached |
There was a problem hiding this comment.
Move comma after dependencies.
config/galaxy.ini.sample
Outdated
|
|
||
| # By default the tool_dependency_cache_dir is the _cache directory | ||
| # of the tool dependency directory | ||
| #tool_dependency_cache_dir = tool_dependency_dir/_cache |
There was a problem hiding this comment.
s/tool_dependency_dir/<tool_dependency_dir>/
There was a problem hiding this comment.
hmm, so
tool_dependency_cache_dir = _cache
?
I would like to indicate that by default we store the cache in tool_dependency_dir/_cache.
Maybe
tool_dependency_cache_dir = <tool_dependency_dir>/_conda
would be better?
There was a problem hiding this comment.
I guess you saw my comment before I edited it to add quotes, GitHub hides everything between the angle brackets.
Anyway, I suggested to use
<tool_dependency_dir>/_cache
There was a problem hiding this comment.
sorry, forgot about that GH quirk :)
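For readers following the thread, the default discussed above (`<tool_dependency_dir>/_cache`) could be resolved roughly as follows. This is only a sketch: `resolve_cache_dir` and the dict-style config are hypothetical and are not Galaxy's actual configuration code.

```python
import os


def resolve_cache_dir(config):
    """Return the dependency cache directory for a dict-like config.

    Hypothetical helper: falls back to <tool_dependency_dir>/_cache when
    tool_dependency_cache_dir is unset or empty.
    """
    explicit = config.get("tool_dependency_cache_dir")
    if explicit:
        return explicit
    return os.path.join(config["tool_dependency_dir"], "_cache")
```

With only `tool_dependency_dir` set, the helper returns the `_cache` subdirectory; an explicit `tool_dependency_cache_dir` always wins.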
config/galaxy.ini.sample
Outdated
|
|
||
| # Certain dependency resolvers (namely Conda) take a considerable amount of | ||
| # time to build an isolated job environment in the job_working_directory if the | ||
| # job working diretory is on a network share. Set the following option to True |
lib/galaxy/tools/deps/__init__.py
Outdated
| # while other resolvers may not create the hashed_requirements_dir | ||
| os.mkdir(hashed_requirements_dir) | ||
| with open(os.path.join(hashed_requirements_dir, 'dep_commands.sh'), 'w') as cmds_f: | ||
| [cmds_f.write("%s\n" % line) for line in commands] |
There was a problem hiding this comment.
This looks "perlish":
    for line in commands:
        cmds_f.write("%s\n" % line)
| # used if you experience long waiting times before a job is actually submitted | ||
| # to your cluster. If you activate this option and install or remove dependencies, | ||
| # you may need to clear out old cached environments | ||
| #use_cached_dependency_manager = False |
There was a problem hiding this comment.
Should we name this something like conda?
People could be confused if they are using Docker.
There was a problem hiding this comment.
In principle this works for all non-container resolver types, except that you get the biggest benefits with conda. But maybe I should find another angle and do the caching only for conda ... I kind of like inheriting from DependencyManager though.
Let's see if I can come up with an elegant way to do this for conda only. That would reduce the number of environments that would be cached.
|
In my testing this is working pretty well: |
|
Sorry I didn't respond sooner - I was at a conference last week. I appreciate the effort and I know this is solving a real problem, but it isn't exactly how I would have gone about it - kind of close though. For instance, this doesn't handle locking well, right? There could be a race condition there because the bash stuff is pretty basic. I'd also hope the cached conda dependencies would sit beside conda somewhere by default to solve @natefoo's linking problems (this PR does that, I think), and that cached environments would be created during TS installs (to solve @natefoo's RO problems). What do you think about this modification to the approach? The TS piece doesn't need to be a part of this - but I'm a bit worried about the race condition. |
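To make the race-condition concern concrete, one way to serialize cache creation per hash is an exclusive lock file. The sketch below is not what this PR does; `cache_lock`, `ensure_cached_env` and the `build` callable are hypothetical names, and the code assumes Python 3.

```python
import contextlib
import os
import time


@contextlib.contextmanager
def cache_lock(cache_dir, env_hash, poll_interval=0.1, timeout=600.0):
    """Hold an exclusive lock for one cached environment.

    Hypothetical helper: relies on O_CREAT | O_EXCL failing when the lock
    file already exists, so only one process builds <cache_dir>/<env_hash>.
    """
    os.makedirs(cache_dir, exist_ok=True)
    lock_path = os.path.join(cache_dir, env_hash + ".lock")
    deadline = time.monotonic() + timeout
    while True:
        try:
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() > deadline:
                raise TimeoutError("could not acquire %s" % lock_path)
            time.sleep(poll_interval)
    try:
        yield lock_path
    finally:
        os.close(fd)
        os.remove(lock_path)


def ensure_cached_env(cache_dir, env_hash, build):
    """Build the environment at most once; `build` takes the target path."""
    env_path = os.path.join(cache_dir, env_hash)
    with cache_lock(cache_dir, env_hash):
        # Re-check under the lock: another job may have finished the build.
        if not os.path.isdir(env_path):
            build(env_path)
    return env_path
```

Two caveats with this sketch: a crashed process leaves a stale lock file behind, and exclusive-create semantics may not be reliable on every network filesystem, which matters if the cache lives on a shared mount.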
|
I think conda does have a mechanism to prevent race conditions -- I have I can definitely add the cache creation during TS install, that should be On Oct 31, 2016 2:19 PM, "John Chilton" notifications@github.com wrote:
|
|
Hmm, conda is actually locking, both for the creation of the environment and when calling conda install on existing environments. That means only 1 job at a time is being prepared for submission :(. |
Do we actually still need the |
Good question, I'm not sure. Maybe not; they live within the
Do you know more about this, @bgruening ? |
|
@jmchilton I've rebased the PR and updated the PR description. |
|
I believe this is exactly what I need to run these directly from CVMFS. I can give it a test if I can remember one of the tools that had the problems. |
|
Conflicting again? grrrr |
Similar to galaxyproject#2986, implement a mechanism that allows tool dependencies to be cached. If the `use_cached_dependency_manager` option is set to True in galaxy.ini, we build a hash of the combination of a tool's requirements, and store the resulting environment in a directory specified by the `tool_dependency_cache_dir` option in galaxy.ini.
to updates or changes in dependencies, folder structure and resolver configuration. Instead of hashing name, type and version of a dependency, hash the json representation of the dependencies returned by the dependency resolver, which include the path to the environment and the dependency type. This is only applied to resolvers whose cacheable attribute is set to True (conda-only, currently).
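As an illustration of hashing the JSON representation rather than name, type and version, here is a minimal sketch. The function name and the dict keys are assumptions for the example, not the code in this PR; making the digest independent of requirement order is a choice of the sketch.

```python
import hashlib
import json


def hash_resolved_dependencies(resolved):
    """Return a stable hex digest for a list of resolved dependencies.

    `resolved` is assumed to be a list of JSON-serializable dicts, e.g. with
    the (hypothetical) keys name, version, dependency_type, environment_path.
    """
    # Sort keys so dict ordering does not matter, and sort the entries so the
    # order in which requirements are listed does not matter either.
    canonical = sorted(json.dumps(dep, sort_keys=True) for dep in resolved)
    payload = json.dumps(canonical).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()
```

Because the environment path is part of the hashed payload, moving the dependency directory or changing the resolver configuration yields a new hash and therefore a fresh cache entry.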
and only activate cached environments if they exist.
and override __eq__ for ToolRequirement, to simplify checking if ToolRequirements are already installed/cached.
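The `__eq__` override mentioned above can be pictured like this. `Requirement` is a hypothetical stand-in, not Galaxy's `ToolRequirement`; the attribute names are assumptions, and the sketch targets Python 3.

```python
class Requirement:
    """Hypothetical stand-in for ToolRequirement with value-based equality."""

    def __init__(self, name, version=None, type="package"):
        self.name = name
        self.version = version
        self.type = type

    def _key(self):
        # The fields that identify a requirement in this sketch.
        return (self.name, self.version, self.type)

    def __eq__(self, other):
        if not isinstance(other, Requirement):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        # Defining __eq__ disables the default hash, so restore one that
        # agrees with equality.
        return hash(self._key())
```

With value-based equality, checking whether a requirement is already installed or cached reduces to a membership test such as `Requirement("samtools", "1.3") in installed`.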
c02666c to
df93db4
Compare
|
OK, rebased it again. |
|
Thanks for the PR - this was really needed! |
|
So, most of my conda installed tools are failing with the error |
|
Perhaps we could get this into |
|
It's working well for me, but for convenience we should also backport #3222. Otherwise you would need to re-install these tools. |
|
Hmmm, what are you doing to avoid the BTW, thanks so much for this PR! Can't wait to be able to use conda! |
|
Also, I didn't think |
|
@lparsons 16.10 is not out, but it's feature frozen. Whether this is a new feature or a bug fix, or whether an exception should be granted, is up for discussion. |
|
@nsoranzo Well, since it wouldn't be enabled by default, I think it's relatively safe. And it's certainly attempting to resolve a showstopper of a bug for me, so in some sense it is a "bug fix". ;-) |
Sorry, I wasn't clear, I meant that I'm using this in production, so I would say this is (probably) not going to break stuff all over the place. I opened PR #3227 for the backport. |
|
@mvdbeek You rock! Thanks. |
Similar to #2986, implements a mechanism that allows tool dependencies to be cached.
If the `use_cached_dependency_manager` option is set to True in `galaxy.ini`, we build a hash of the combination of a tool's requirements, and store the resulting environment in a directory specified by the `tool_dependency_cache_dir` option in `galaxy.ini`.
The approach here is only caching (combinations of) conda dependencies when installing new tools through the ToolShed.
When a dependency combination is not yet cached, `conda create` will be invoked with the `tool_dependency_cache_dir/<hash>` folder, and otherwise `conda install` with `tool_dependency_cache_dir/<hash>`.
If a `__name@version` dependency exists, but no cached environment exists at job runtime, the `__name@version` environment will be used. So to benefit from cached environments for already installed tools, those cached environments need to be created. Currently this can only be done by (re-)installing tools.
ping @abretaud @jmchilton
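The create-or-install decision and the runtime fallback described in this PR description can be sketched as follows. Both function names are hypothetical, and the conda arguments shown (`--yes`, `--prefix`) are ordinary conda options chosen for illustration, not necessarily what Galaxy passes.

```python
import os


def plan_conda_command(cache_dir, env_hash, packages):
    """Return the conda argv that would populate <cache_dir>/<env_hash>.

    Sketch only: `conda create` for a new environment, `conda install`
    when the hashed directory already exists.
    """
    env_path = os.path.join(cache_dir, env_hash)
    action = "install" if os.path.isdir(env_path) else "create"
    return ["conda", action, "--yes", "--prefix", env_path] + list(packages)


def pick_environment(cache_dir, env_hash, fallback_env):
    """At job runtime, prefer the cached environment, else the per-package one."""
    env_path = os.path.join(cache_dir, env_hash)
    return env_path if os.path.isdir(env_path) else fallback_env
```

Here `fallback_env` stands for the path of an existing `__name@version` environment, so jobs for tools installed before caching was enabled keep working without a cached environment.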