
Expose storing/dumping of objects to disk #593

@GaelVaroquaux


It is often useful to dump objects to disk for manual out-of-core programming:

  • Notion of a “Shelf” (as a follow-up of [MRG] Custom store backends API #397):

    • Simplest use case, returning a drop-in replacement (X becomes a memory-mapped version of X; this does not address dask's needs):
      X = shelve_mmap(X)
    • For X to become a future (here we implement a subset of the public interface of distributed futures):
      X = shelve(X)
  • Contract: shelves are transient: objects are deleted when their reference count drops to zero or when the program exits. Difficulty: across multiple processes on the same machine, the notion of a reference count breaks down.

  • Implementation:

    • A module joblib._shelf
    • A function joblib.shelve that calls a get_shelf function to retrieve a shelf object
    • Two shelf classes, JoblibShelf and DistributedShelf. The first uses a StoreBackend and makes it transient by implementing garbage collection on top of it. This is not needed for DistributedShelf, as the distributed store is already transient.
    • Should the PID of the parent process be encoded in the shelf's directory name?
  • By default, the joblib transient store lives in a folder determined by the same logic as joblib.pool._get_temp_dir.
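The shelve_mmap drop-in replacement from the Shelf bullet can be sketched in a few lines for the plain NumPy-array case. This is a minimal illustration, not the proposed joblib API: shelve_mmap here is a hypothetical helper, and it uses numpy.save / numpy.load(..., mmap_mode="r") instead of joblib's own joblib.dump / joblib.load(filename, mmap_mode="r") only to keep the sketch small. File cleanup (the transient contract) is deliberately left out.

```python
import os
import tempfile

import numpy as np


def shelve_mmap(X, folder=None):
    """Hypothetical sketch: dump array X to disk, return a read-only memmap of it."""
    fd, path = tempfile.mkstemp(suffix=".npy", dir=folder)
    os.close(fd)  # np.save reopens the path itself
    # The path already ends in .npy, so np.save does not append another suffix.
    np.save(path, X)
    # mmap_mode="r" returns a numpy.memmap backed by the file, not an in-memory copy.
    return np.load(path, mmap_mode="r")
```

The returned object behaves like the original array for reading (X = shelve_mmap(X)), while its data lives on disk. Object-dtype arrays cannot be memory-mapped this way, which is one reason a general shelf would go through joblib's pickler rather than .npy files.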
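The future-like shelve(X) variant, the transient contract, and the shelve → get_shelf → shelf-object layering can be sketched together with the standard library alone. This is a hypothetical sketch under stated assumptions: ShelvedObject, LocalShelf (a stand-in for JoblibShelf that writes pickle files instead of using a StoreBackend) and the result() method name (borrowed from the futures interface) are illustrative, not existing joblib code. Cleanup ties each file to its handle with weakref.finalize, which fires when the handle's reference count reaches zero and, by default, at interpreter exit.

```python
import contextlib
import os
import pickle
import tempfile
import weakref


def _remove_quietly(path):
    # Tolerate files that are already gone (e.g. removed together with their folder).
    with contextlib.suppress(FileNotFoundError):
        os.remove(path)


class ShelvedObject:
    """Hypothetical handle to an object pickled on disk, with a future-like .result().

    The backing file is deleted as soon as the handle is garbage-collected,
    or at interpreter exit at the latest (weakref.finalize runs pending
    callbacks at exit by default).
    """

    def __init__(self, obj, folder):
        fd, self.path = tempfile.mkstemp(suffix=".pkl", dir=folder)
        with os.fdopen(fd, "wb") as f:
            pickle.dump(obj, f)
        # The callback captures only the path, never self, so it does not
        # keep the handle alive.
        self._finalizer = weakref.finalize(self, _remove_quietly, self.path)

    def result(self):
        """Load the object back from disk."""
        with open(self.path, "rb") as f:
            return pickle.load(f)


class LocalShelf:
    """Hypothetical stand-in for JoblibShelf: one pickle file per object."""

    def __init__(self, folder=None):
        self.folder = folder if folder is not None else tempfile.mkdtemp(prefix="shelf_")

    def put(self, obj):
        return ShelvedObject(obj, self.folder)


_shelf = None


def get_shelf():
    """Return the process-wide shelf, creating it on first use."""
    global _shelf
    if _shelf is None:
        _shelf = LocalShelf()
    return _shelf


def shelve(obj):
    """Store obj on the current shelf and return a future-like handle."""
    return get_shelf().put(obj)
```

This only tracks references inside one process, which is exactly the difficulty raised in the Contract bullet: a handle pickled and sent to a worker process does not share the parent's reference count, so the parent may delete the file while the worker still needs it. Removing the (now empty) shelf folder itself is also left out of the sketch.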
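One possible answer to the open PID question, combined with a default folder choice, is sketched below. The helper get_shelf_folder and the "joblib_shelf_<pid>" naming scheme are hypothetical. The lookup order (explicit folder, the JOBLIB_TEMP_FOLDER environment variable, /dev/shm when present, then the system temporary directory) only roughly follows joblib.pool._get_temp_dir; as far as we know the real helper also checks available space on /dev/shm, which is not reproduced here.

```python
import os
import tempfile


def get_shelf_folder(temp_folder=None):
    """Hypothetical helper: pick (and create) this process's shelf folder."""
    if temp_folder is None:
        temp_folder = os.environ.get("JOBLIB_TEMP_FOLDER")
    if temp_folder is None and os.path.isdir("/dev/shm"):
        # RAM-backed on Linux; no free-space check in this sketch.
        temp_folder = "/dev/shm"
    if temp_folder is None:
        temp_folder = tempfile.gettempdir()
    # Encode the PID of the owning process so that unrelated programs on the
    # same machine never share, or clean up, each other's shelves.
    folder = os.path.join(temp_folder, "joblib_shelf_%d" % os.getpid())
    os.makedirs(folder, exist_ok=True)
    return folder
```

Because os.getpid() differs in every worker, workers would have to be handed the parent's folder rather than call this helper themselves; otherwise each would end up with its own shelf.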
