Expose storing dumping of objects to disk #593
It is often useful to dump objects to disk for manual out-of-core programming:
- Notion of a "Shelf" (as a follow-up to [MRG] Custom store backends API #397):
  - Simplest use case, returning a drop-in replacement where X is a memmapped version of X (and thus does not address dask needs):
      X = shelve_mmap(X)
  - For X to be a future (here we implement a subset of the public interface of distributed futures):
      X = shelve(X)
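To make the future-like variant concrete, here is a minimal sketch using only the standard library. The `ShelvedFuture` class, its `result()`/`done()` methods, and this `shelve` body are assumptions for illustration (loosely mirroring a small subset of what a distributed future offers), not an existing joblib API.

```python
import os
import pickle
import tempfile


class ShelvedFuture:
    """Hypothetical handle to an object that was pickled to disk."""

    def __init__(self, path):
        self._path = path

    def result(self):
        # Reload the object from disk on demand.
        with open(self._path, "rb") as f:
            return pickle.load(f)

    def done(self):
        # The data is fully written before the handle is returned.
        return True


def shelve(obj, folder=None):
    """Dump obj to disk and return a future-like handle (sketch only)."""
    folder = folder or tempfile.mkdtemp(prefix="shelf_sketch_")
    fd, path = tempfile.mkstemp(suffix=".pkl", dir=folder)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(obj, f)
    return ShelvedFuture(path)


X = shelve({"a": [1, 2, 3]})
print(X.result())
```

The caller keeps only the small handle in memory and pays the loading cost when `result()` is called.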
- Contract: shelves are transient: objects are deleted when their ref count goes to zero or when the program exits. Difficulty: with multiple processes on the same box, the notion of refcount is broken.
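Within a single process, the transient contract could be sketched with `weakref.finalize`, which runs a callback when the handle is garbage collected and, by default, also at interpreter exit. The `TransientShelfItem` class and `_remove_file` helper are hypothetical names; this does nothing about the multi-process difficulty, since each process only sees its own references.

```python
import os
import pickle
import tempfile
import weakref


def _remove_file(path):
    # Cleanup callback: must not hold a reference to the handle itself.
    if os.path.exists(path):
        os.remove(path)


class TransientShelfItem:
    """Hypothetical handle whose backing file is removed with it."""

    def __init__(self, obj):
        fd, self.path = tempfile.mkstemp(suffix=".pkl")
        with os.fdopen(fd, "wb") as f:
            pickle.dump(obj, f)
        # Runs when this handle is collected, or at interpreter exit.
        self._finalizer = weakref.finalize(self, _remove_file, self.path)

    def result(self):
        with open(self.path, "rb") as f:
            return pickle.load(f)
```

With CPython's reference counting, the file goes away as soon as the last reference to the handle is dropped.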
- Implementation:
  - A module joblib._shelf
  - A function joblib.shelve that calls a "get_shelf" function to retrieve a shelf object
  - Two shelf objects, JoblibShelf and DistributedShelf. The first one uses a StoreBackend and makes it transient by implementing garbage collection on top of it. This is not needed for DistributedShelf, as the distributed store is already transient.
  - Should the PID of the parent process be encoded in the directory of the shelf?
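One way the pieces could fit together, as a rough sketch only: `get_shelf` lazily creates a shelf object and `shelve` delegates to it. `JoblibShelfSketch`, its `put`/`get` methods, and the plain-pickle storage are stand-ins assumed for illustration (the proposal would go through a StoreBackend), and encoding the PID in the folder name is just one possible answer to the open question.

```python
import os
import pickle
import tempfile
import uuid


class JoblibShelfSketch:
    """Stand-in for the proposed JoblibShelf (hypothetical interface)."""

    def __init__(self, root=None):
        root = root or tempfile.gettempdir()
        # Assumption: one folder per process, with the PID in its name.
        self.folder = os.path.join(root, "shelf_sketch_%d" % os.getpid())
        os.makedirs(self.folder, exist_ok=True)

    def put(self, obj):
        # Store the object under a unique file name and return its path.
        path = os.path.join(self.folder, uuid.uuid4().hex + ".pkl")
        with open(path, "wb") as f:
            pickle.dump(obj, f)
        return path

    def get(self, path):
        with open(path, "rb") as f:
            return pickle.load(f)


_shelf = None


def get_shelf():
    """Return the process-wide shelf, creating it on first use."""
    global _shelf
    if _shelf is None:
        _shelf = JoblibShelfSketch()
    return _shelf


def shelve(obj):
    """Store obj on the current shelf and return its key (a file path here)."""
    return get_shelf().put(obj)
```

Keeping the lookup behind `get_shelf` is what would let a distributed-backed shelf be swapped in without changing callers of `shelve`.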
- By default, the joblib transient store lives in a folder determined by the same logic as joblib.pool._get_temp_dir
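For reference, a simplified sketch of that folder-selection logic as I understand it: an explicit argument wins, then the `JOBLIB_TEMP_FOLDER` environment variable, then a shared-memory filesystem such as `/dev/shm` when usable, and finally the system temp folder. The function name `pick_shelf_root` is hypothetical, and the real helper also checks free space on the shared-memory filesystem, which is omitted here.

```python
import os
import tempfile

SHARED_MEM_FS = "/dev/shm"


def pick_shelf_root(temp_folder=None):
    """Choose a base folder for the transient store (simplified sketch)."""
    if temp_folder is None:
        # User override through the environment.
        temp_folder = os.environ.get("JOBLIB_TEMP_FOLDER")
    if (temp_folder is None and os.path.isdir(SHARED_MEM_FS)
            and os.access(SHARED_MEM_FS, os.W_OK)):
        # Prefer a RAM-backed filesystem when one is writable.
        temp_folder = SHARED_MEM_FS
    if temp_folder is None:
        temp_folder = tempfile.gettempdir()
    return os.path.abspath(os.path.expanduser(temp_folder))
```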