[Callbacks] ENH Don't create a new manager for each callback

see https://github.com/scikit-learn/scikit-learn/pull/28760#discussion_r2822294959
related to https://github.com/scikit-learn/scikit-learn/issues/27676

Since callbacks are expected to be process-aware and, in particular, to aggregate information across processes, nearly all of them need to hold a data structure (list, dict, queue, ...) that is managed by a ``multiprocessing.Manager()``. Each manager starts its own manager process, so we don't want to accumulate them.

Some callbacks can create this data structure in ``on_fit_begin``, but others need to create it at initialization. Since this should also work in custom user code, the most robust solution would be a sklearn-wide global manager. It would be created lazily, i.e. the first time a callback requests a manager, and could then be accessed by all callbacks.

This implies that, once created, the manager process would live for the rest of the program session. Maybe that's okay.

To keep the manager from accumulating data without bound, we should make sure that a shared data structure is automatically destroyed when it is no longer needed (i.e. when its callback gets garbage collected). If we find that the garbage collector is not able to do this properly on its own, we could rely on weakref finalizers.
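
A rough sketch of the weakref finalizer fallback. ``SharedStructureRegistry`` and ``DummyCallback`` are hypothetical names, and a plain dict stands in for whatever bookkeeping would keep manager-held structures alive; in real code the ``factory`` argument would be something like the shared manager's ``list`` method.

```python
import gc
import itertools
import weakref


class SharedStructureRegistry:
    """Hypothetical registry that drops a structure when its owner dies."""

    def __init__(self):
        self._structures = {}
        self._ids = itertools.count()

    def create(self, owner, factory):
        key = next(self._ids)
        structure = factory()
        self._structures[key] = structure
        # The finalizer only references the dict and the key, never the
        # owner itself, so it does not keep the owner alive.
        weakref.finalize(owner, self._structures.pop, key, None)
        return structure

    def __len__(self):
        return len(self._structures)


class DummyCallback:
    """Hypothetical stand-in for a callback object."""


registry = SharedStructureRegistry()
callback = DummyCallback()
callback.records = registry.create(callback, list)

del callback  # last strong reference to the callback goes away
gc.collect()  # make collection explicit rather than rely on refcounting
remaining = len(registry)  # the entry has been removed by the finalizer
```

The finalizer runs when the callback is collected, or at interpreter exit at the latest, so the shared structure does not outlive its owner.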

ping @FrancoisPgm 

[Callbacks] ENH Don't create a new manager for each callback #33325
