Skip to content

Cache depencencies on the fly when first used#3348

Merged
bgruening merged 4 commits intogalaxyproject:devfrom
abretaud:fill_cache
Jan 1, 2017
Merged

Cache depencencies on the fly when first used#3348
bgruening merged 4 commits intogalaxyproject:devfrom
abretaud:fill_cache

Conversation

@abretaud
Copy link
Contributor

Hi,
A tiny patch to allow caching (conda) dependencies on the fly when a tool is executed the first time (well, anytime the cache is not found in fact).

There is some discussion about this in galaxyproject/planemo#612

Just tested it with planemo, it seems to work fine

ping @mvdbeek @jmchilton

🎅 🎅 🎅

cacheable_dependencies = [dep for dep in resolved_dependencies.values() if dep.cacheable]
hashed_dependencies_dir = self.get_hashed_dependencies_path(cacheable_dependencies)
if os.path.exists(hashed_dependencies_dir):
if not os.path.exists(hashed_dependencies_dir):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a check for conda_auto_install here? I am very much afraid of triggering this at random (whenever these dependencies are being resolved for the first time...) timepoints on production instances.

if not os.path.exists(hashed_dependencies_dir):
# Cache not present, try to create it
self.build_cache(requirements, **kwds)
if os.path.exists(hashed_dependencies_dir): # Check caching was successfull
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll need one more space before the comment here.

@mvdbeek
Copy link
Member

mvdbeek commented Dec 21, 2016

If you could add the check for conda_auto_install you have my 👍.
Though ideally I would prefer to see a separate setting for this behavior, i.e. build_cache_on_first_run or sth. like that, given that the cached environments are -- in principle -- not bound to conda, so it's a bit counter-intuitive to couple this to conda_auto_install. We could still make build_cache_on_first_run default to True if conda_auto_install is set to true.

@abretaud
Copy link
Contributor Author

Ok, I added the build_cache_on_first_run option (hope it won't make too many options for conda related features!)
I'm hesitant to make it default to True if conda_auto_install is True. It seems a cool behavior, but I fear it would bring some confusion (ie I set it to False in galaxy.ini but it is magically set to True internally)

For planemo, I guess we could set build_cache_on_first_run to True when --use_cached_dependency_manager?

@mvdbeek
Copy link
Member

mvdbeek commented Dec 21, 2016

That looks very useful, thanks for looking into this 👍

@mvdbeek
Copy link
Member

mvdbeek commented Dec 21, 2016

I'm hesitant to make it default to True if conda_auto_install is True. It seems a cool behavior, but I fear it would bring some confusion (ie I set it to False in galaxy.ini but it is magically set to True internally)

For planemo, I guess we could set build_cache_on_first_run to True when --use_cached_dependency_manager?

You're right, this might be a bit surprising. I think that for planemo we could make use_cached_dependency_manager and build_cache_on_first_run the default if conda_auto_install is specified.

resolved_dependencies = self.requirements_to_dependencies(requirements, **kwds)
cacheable_dependencies = [dep for dep in resolved_dependencies.values() if dep.cacheable]
hashed_dependencies_dir = self.get_hashed_dependencies_path(cacheable_dependencies)
if not os.path.exists(hashed_dependencies_dir) and self.extra_config['build_cache_on_first_run'] == True:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Travis is complaining here: comparison to True should be 'if cond is True:' or 'if cond:'

@jmchilton
Copy link
Member

I think I like the direction we are heading here - and I certainly think y'all are correct that planemo should be used to setup these defaults correctly if --conda_auto_install.

I do have two changes I'd like to at least throw out there.

  • I feel like precache_dependencies would be a better name.
  • I also feel like the default should be True, though maybe I'm wrong? If you are using a cached dependency manager you want the dependencies cached right?

@nsoranzo
Copy link
Member

I think we already have many options for Conda and this one is not needed.

I am very much afraid of triggering this at random (whenever these dependencies are being resolved for the first time...) timepoints on production instances.

@mvdbeek Building the cached env probably takes the same time of creating the Conda env for the job (which is later destroyed), no?

@jmchilton
Copy link
Member

@nsoranzo I disagree a bit - if it defaults to True no one needs to worry about it by default. There are scenarios where you want to disable it - for instance if you want to have your dependencies mounted read-only (very reasonable if your trying to lock things down for security reasons).

@nsoranzo
Copy link
Member

@nsoranzo I disagree a bit - if it defaults to True no one needs to worry about it by default. There are scenarios where you want to disable it - for instance if you want to have your dependencies mounted read-only (very reasonable if your trying to lock things down for security reasons).

Fair enough, +1 if it defaults to True.

@mvdbeek
Copy link
Member

mvdbeek commented Dec 21, 2016

Building the cached env probably takes the same time of creating the Conda env for the job (which is later destroyed), no?

That's true ... I guess it's not that big a deal from this angle.
I was just thinking about bioperl, which takes a good 15 minutes to build, but it's true that that would also happen if it's set to false.

@bgruening
Copy link
Member

@abretaud great addition!

@bgruening bgruening merged commit 408254f into galaxyproject:dev Jan 1, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants