Implement in_store method to check if a result is already in store#730
Implement in_store method to check if a result is already in store#730nwilming wants to merge 4 commits intojoblib:masterfrom
Conversation
aabadie
left a comment
There was a problem hiding this comment.
Thanks @nwilming for working on this. I have a few minor comments, see below.
There's also this comment, raised in #407, that I'm not sure if it's still valid and if it should be addressed here.
joblib/memory.py
Outdated
| return (self.__class__, (self.func, self.store_backend, self.ignore, | ||
| self.mmap_mode, self.compress, self._verbose)) | ||
|
|
||
| def in_store(self, *args, **kwargs): |
There was a problem hiding this comment.
I would prefer a name that better reflects what Memory is about: caching results. Why not calling this function: cached ?
The concept of store is more related to the underlying way used to handle the cache.
There was a problem hiding this comment.
Sounds good to me, I'll rename the function to cahced.
joblib/memory.py
Outdated
| load from backend, else False. | ||
| """ | ||
| func_id, args_id = self._get_output_identifiers(*args, **kwargs) | ||
| if (self._check_previous_func_code(stacklevel=4) and |
There was a problem hiding this comment.
This could be done in a one-liner:
return (self._check_previous_func_code(stacklevel=4) and self.store_backend.contains_item([func_id, args_id]))(You will just have to split the line to be pep8 compliant)
joblib/test/test_memory.py
Outdated
| args = (1, 2, 3, 4) | ||
| kwargs = {'e': 5, 'f': 6} | ||
| g(*args, **kwargs) | ||
| assert(g.in_store(*args, **kwargs)) |
There was a problem hiding this comment.
no need for parenthesis with assert. Use the following instead (and be consistent with other uses of assert):
assert g.in_store(*args, **kwargs)
joblib/memory.py
Outdated
|
|
||
| Returns | ||
| ------- | ||
| cached: bool |
Codecov Report
@@ Coverage Diff @@
## master #730 +/- ##
==========================================
- Coverage 95.31% 95.29% -0.02%
==========================================
Files 42 42
Lines 6128 6145 +17
==========================================
+ Hits 5841 5856 +15
- Misses 287 289 +2
Continue to review full report at Codecov.
|
|
I've reabased my code on upstream/master and made the changes you requested. Let me know if something is missing or needs additional changes. Regarding the comment in #407 that But I guess eventually these will occur and then the current Let me know what you think. |
tomMoral
left a comment
There was a problem hiding this comment.
Hello,
Sorry for the rather long reviewing process. Thank you very much for your work, I think this is indeed a nice feature that has been missing for quite some time.
The PR is a bit outdated and it seems that introducing load might drop a bit the performances. As there is a concurrent PR in #820 , I am in favor of getting the other one merged but we need to integrate some of the changes you proposed (did not realized in the other PR that the no-op was not updated).
Let me know if you think I missed a point somewhere.
| _, name = get_func_name(self.func) | ||
| msg = '%s cache loaded - %s' % (name, format_time(t)) | ||
| print(max(0, (80 - len(msg))) * '_' + msg) | ||
| out = self.load(*args, **kwargs) |
There was a problem hiding this comment.
This might hurt the performance here.
With this solution, we need to call _get_output_identifier twice and thus hash the argument multiple time. This can be slow in case where large argument are passed.
|
This feature has been added in #820 so I am closing this PR. |
In some cases it is nice to be able to check if the result of a function call is already available in the backend store. For example, when dispatching long computations to a cluster this allows to check if and which computations have already run. Being able to check for this makes it particularly easy to collect all available results without triggering computation of those that are still running. At least two issues have mentioned this ( #407 - a pull request which now does not merge with master and #668).
This is an attemt to provide this functionality by introducing an 'in_store' method that can be called with the same signature as the original function. When a result is in store for some input arguments it returns True, else False.