
[Callbacks] ENH Don't create a new manager for each callback #33325

@jeremiedbb


see #28760 (comment)
related to #27676

Since callbacks are expected to be process-aware and, in particular, to aggregate information across processes, nearly all of them need to hold a data structure (list, dict, queue, ...) managed by a multiprocessing.Manager(). Each manager starts its own server process, so we don't want to accumulate them.
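A minimal sketch of the pattern to avoid, assuming each callback creates its own manager. The class PerInstanceManagerCallback and the helper count_live_children are hypothetical; only the standard library is used. multiprocessing.active_children() shows one extra live server process per callback instance.

```python
import multiprocessing


class PerInstanceManagerCallback:
    """Hypothetical callback that owns a private Manager (the pattern to avoid)."""

    def __init__(self):
        # Each Manager() call starts a new server process.
        self._manager = multiprocessing.Manager()
        self.records = self._manager.list()

    def shutdown(self):
        # Stops this callback's own server process.
        self._manager.shutdown()


def count_live_children():
    """Return the number of live child processes of the current process."""
    return len(multiprocessing.active_children())


if __name__ == "__main__":
    before = count_live_children()
    callbacks = [PerInstanceManagerCallback() for _ in range(3)]
    # Three callbacks, three extra server processes.
    print(count_live_children() - before)
    for cb in callbacks:
        cb.shutdown()
```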

Some callbacks can create this data structure in on_fit_begin, but others need to create it at initialization. Since this must also work in custom user code, the most robust solution would be a global sklearn manager. It would be created lazily, i.e. the first time a callback requests a manager, and could then be shared by all callbacks.
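A minimal sketch of a lazily created global manager, assuming it is stored in a module-level variable guarded by a lock. The names get_shared_manager and RecordingCallback are hypothetical and not an existing scikit-learn API; only multiprocessing.Manager() and the proxies it returns are standard library.

```python
import multiprocessing
import threading

# Hypothetical module-level state; not an existing scikit-learn API.
_shared_manager = None
_shared_manager_lock = threading.Lock()


def get_shared_manager():
    """Return the process-wide SyncManager, starting it on first request."""
    global _shared_manager
    # The lock prevents two threads from starting two managers at once.
    with _shared_manager_lock:
        if _shared_manager is None:
            # Lazy start: no server process exists until a callback asks for one.
            _shared_manager = multiprocessing.Manager()
    return _shared_manager


class RecordingCallback:
    """Hypothetical callback that creates its shared list at initialization."""

    def __init__(self):
        # Each callback gets its own list, but all lists live in the
        # same manager server process.
        self.records = get_shared_manager().list()


if __name__ == "__main__":
    before = len(multiprocessing.active_children())
    cb1, cb2 = RecordingCallback(), RecordingCallback()
    # Only one server process was started for both callbacks.
    print(len(multiprocessing.active_children()) - before)
```

The manager is never shut down explicitly here: multiprocessing shuts down started managers at interpreter exit, which matches a process living for the whole program session.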

This implies a manager process that lives for the whole program session. That may be acceptable.

To prevent it from accumulating data without bound, we should make sure that a shared data structure is automatically destroyed once it is no longer needed (i.e. when the callback that owns it is garbage collected). If the garbage collector turns out not to handle this properly on its own, we could rely on weakref finalizers.
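A minimal sketch of the weakref finalizer fallback, assuming the shared structure is a manager dict. Manager proxies already release their server-side reference when they are garbage collected; the finalizer here makes the cleanup explicit by clearing the dict as soon as the owning callback is collected (or closed). CallbackWithSharedState and close() are hypothetical names.

```python
import multiprocessing
import weakref


class CallbackWithSharedState:
    """Hypothetical callback owning a manager-backed dict."""

    def __init__(self, manager):
        self.data = manager.dict()
        # The finalizer holds only the proxy's bound ``clear`` method, never
        # ``self``; referencing ``self`` would keep the callback alive forever.
        self._finalizer = weakref.finalize(self, self.data.clear)
        # Skip it at interpreter exit: the manager may already be shut down.
        self._finalizer.atexit = False

    def close(self):
        """Release the shared state now; later calls and GC become no-ops."""
        self._finalizer()


if __name__ == "__main__":
    with multiprocessing.Manager() as manager:
        cb = CallbackWithSharedState(manager)
        cb.data["n_iter"] = 3
        shared = cb.data  # keep a reference only to observe the effect
        del cb  # the finalizer runs here and clears the shared dict
        print(len(shared))
        del shared
```

Once the finalizer has run, it drops its reference to the proxy, so the manager can also free the underlying object when no other proxy points to it.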

ping @FrancoisPgm
