
Implement in_store method to check if a result is already in store#730

Closed
nwilming wants to merge 4 commits into joblib:master from nwilming:master

Conversation

@nwilming

In some cases it is nice to be able to check whether the result of a function call is already available in the backend store. For example, when dispatching long computations to a cluster, this allows checking if and which computations have already run. Being able to check for this makes it particularly easy to collect all available results without triggering computation of those that are still running. At least two issues have mentioned this (#407, a pull request that no longer merges cleanly with master, and #668).

This is an attempt to provide this functionality by introducing an in_store method that can be called with the same signature as the original function. It returns True when a result is in store for the given input arguments, and False otherwise.
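The idea can be sketched with a toy in-memory cache. This is an illustrative mock, not joblib's implementation: the MemoizedFunc class, its dict store, and the pickle-based argument digest are all invented here for the example.

```python
import hashlib
import pickle


class MemoizedFunc:
    """Toy cache wrapper illustrating the proposed in_store check."""

    def __init__(self, func):
        self.func = func
        self._store = {}

    def _args_id(self, *args, **kwargs):
        # Hash the call signature, loosely analogous to joblib's
        # output identifiers.
        payload = pickle.dumps((args, sorted(kwargs.items())))
        return hashlib.sha256(payload).hexdigest()

    def __call__(self, *args, **kwargs):
        key = self._args_id(*args, **kwargs)
        if key not in self._store:
            self._store[key] = self.func(*args, **kwargs)
        return self._store[key]

    def in_store(self, *args, **kwargs):
        # True if a result for these arguments is already cached;
        # never triggers a computation.
        return self._args_id(*args, **kwargs) in self._store


g = MemoizedFunc(lambda a, b: a + b)
assert not g.in_store(1, 2)   # nothing computed yet
g(1, 2)                       # compute and cache
assert g.in_store(1, 2)       # now present
assert not g.in_store(3, 4)   # other arguments still absent
```

The key property is that `in_store` only hashes the arguments and probes the store; it never calls the wrapped function.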

Contributor

@aabadie aabadie left a comment


Thanks @nwilming for working on this. I have a few minor comments, see below.

There's also this comment, raised in #407; I'm not sure whether it's still valid or whether it should be addressed here.

joblib/memory.py Outdated
return (self.__class__, (self.func, self.store_backend, self.ignore,
self.mmap_mode, self.compress, self._verbose))

def in_store(self, *args, **kwargs):
Contributor


I would prefer a name that better reflects what Memory is about: caching results. Why not call this function cached?
The concept of a store is more related to the underlying mechanism used to handle the cache.

Author


Sounds good to me, I'll rename the function to cached.

joblib/memory.py Outdated
load from backend, else False.
"""
func_id, args_id = self._get_output_identifiers(*args, **kwargs)
if (self._check_previous_func_code(stacklevel=4) and
Contributor


This could be done in a one-liner:

return (self._check_previous_func_code(stacklevel=4) and self.store_backend.contains_item([func_id, args_id]))

(You will just have to split the line to be PEP 8 compliant.)

args = (1, 2, 3, 4)
kwargs = {'e': 5, 'f': 6}
g(*args, **kwargs)
assert(g.in_store(*args, **kwargs))
Contributor


No need for parentheses with assert. Use the following instead (and be consistent with other uses of assert):

assert g.in_store(*args, **kwargs)

joblib/memory.py Outdated

Returns
-------
cached: bool
Contributor


No need for this line

@codecov

codecov bot commented Aug 10, 2018

Codecov Report

Merging #730 into master will decrease coverage by 0.01%.
The diff coverage is 94.11%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #730      +/-   ##
==========================================
- Coverage   95.31%   95.29%   -0.02%     
==========================================
  Files          42       42              
  Lines        6128     6145      +17     
==========================================
+ Hits         5841     5856      +15     
- Misses        287      289       +2
Impacted Files Coverage Δ
joblib/test/test_memory.py 97.54% <100%> (+0.05%) ⬆️
joblib/memory.py 95.15% <80%> (-0.22%) ⬇️
joblib/backports.py 93.75% <0%> (-2.09%) ⬇️
joblib/_parallel_backends.py 96.8% <0%> (-1.21%) ⬇️
joblib/_store_backends.py 90.47% <0%> (-0.53%) ⬇️
joblib/disk.py 88.33% <0%> (+6.66%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 35901b8...aaf5abf. Read the comment docs.

@nwilming
Author

I've rebased my code on upstream/master and made the changes you requested. Let me know if something is missing or needs additional changes.

Regarding the comment in #407 that cached can return false positives: my use case is that I distribute function calls onto a Torque cluster (across args) and, once everything is done, collect the results by calling the function with all relevant args. In this context, being able to check for the presence of results in the backend is useful on its own, and I've never hit a false positive yet.

But I guess these will eventually occur, and then the current cached function would not suffice to allow explicit rescheduling of the affected function calls. That would require a complementary load function that loads results and raises an exception when a result cannot be loaded. I've added this function as well (essentially just a refactoring of _cached_call) in commit aaf5abf.

Let me know what you think.
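The cached/load distinction described above can be sketched with a toy in-memory store. ResultNotAvailable, load, and collect are illustrative names invented for this example, not joblib's API: the point is that load never computes, so callers can reschedule exactly the missing calls.

```python
class ResultNotAvailable(KeyError):
    """Raised when a result is not (or no longer) in the store."""


store = {}  # toy backend: maps an args tuple to a computed result


def load(args):
    # Load a previously computed result; never triggers computation.
    try:
        return store[args]
    except KeyError:
        raise ResultNotAvailable(args) from None


def collect(all_args):
    """Gather finished results; report the rest for explicit rescheduling."""
    done, missing = {}, []
    for args in all_args:
        try:
            done[args] = load(args)
        except ResultNotAvailable:
            missing.append(args)  # e.g. resubmit these to the cluster
    return done, missing


store[(1,)] = 1
store[(3,)] = 9
done, missing = collect([(1,), (2,), (3,)])
assert done == {(1,): 1, (3,): 9}
assert missing == [(2,)]
```

Because load raises instead of recomputing, a stale cached answer (a false positive) surfaces as an exception at collection time rather than silently triggering a computation.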

@AlJohri

AlJohri commented Oct 24, 2018

I would love to see this integrated @aabadie. Thanks for doing this @nwilming!

Contributor

@tomMoral tomMoral left a comment


Hello,

Sorry for the rather long reviewing process. Thank you very much for your work, I think this is indeed a nice feature that has been missing for quite some time.

The PR is a bit outdated, and it seems that introducing load might degrade performance a bit. Since there is a concurrent PR in #820, I am in favor of getting that one merged, but we need to integrate some of the changes you proposed (I had not realized in the other PR that the no-op was not updated).

Let me know if you think I missed a point somewhere.

_, name = get_func_name(self.func)
msg = '%s cache loaded - %s' % (name, format_time(t))
print(max(0, (80 - len(msg))) * '_' + msg)
out = self.load(*args, **kwargs)
Contributor


This might hurt performance here.
With this solution, we need to call _get_output_identifiers twice and thus hash the arguments multiple times. This can be slow when large arguments are passed.
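To make the cost concrete, here is a toy sketch that counts hash computations; it is not joblib's code, and args_id merely stands in for _get_output_identifiers. A check-then-load pattern built from two independent entry points hashes the arguments twice, while computing the identifier once and reusing it avoids the second hash.

```python
import hashlib
import pickle

hash_calls = 0
store = {}


def args_id(*args, **kwargs):
    # Stand-in for joblib's output identifiers: hash the call signature.
    global hash_calls
    hash_calls += 1
    payload = pickle.dumps((args, sorted(kwargs.items())))
    return hashlib.sha256(payload).hexdigest()


def cached(*args, **kwargs):
    return args_id(*args, **kwargs) in store


def load(*args, **kwargs):
    return store[args_id(*args, **kwargs)]


store[args_id(10)] = "result"

# Check-then-load through separate entry points: two hashes per lookup.
hash_calls = 0
if cached(10):
    out = load(10)
assert hash_calls == 2

# Hash once and reuse the identifier for both the check and the load.
hash_calls = 0
key = args_id(10)
out = store[key] if key in store else None
assert hash_calls == 1
```

For small arguments the difference is negligible, but hashing is linear in the pickled size of the arguments, so large arrays pay the full cost on every extra pass.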

@tomMoral
Contributor

This feature has been added in #820 so I am closing this PR.

@tomMoral tomMoral closed this Feb 23, 2022