Skip to content

[Callbacks] FEA Add callbacks when cloning an estimator#33490

Closed
StefanieSenger wants to merge 3 commits intoscikit-learn:callbacksfrom
StefanieSenger:callbacks_clone
Closed

[Callbacks] FEA Add callbacks when cloning an estimator#33490
StefanieSenger wants to merge 3 commits intoscikit-learn:callbacksfrom
StefanieSenger:callbacks_clone

Conversation

@StefanieSenger
Copy link
Copy Markdown
Member

Reference Issues/PRs

Towards #33323
Alternative to #33340 and #33421.

What does this implement/fix? Explain your changes.

Fixes clone() if an estimator comes with a callback.

Opening this PR for completeness, because we had forgotten to discuss this simple alternative.

AI usage disclosure

I used AI assistance for:

  • Code generation (e.g., when writing an implementation or fixing a bug)
  • Test/benchmark generation
  • Documentation (including examples)
  • Research and understanding

@StefanieSenger
Copy link
Copy Markdown
Member Author

StefanieSenger commented Mar 9, 2026

Closing, because I have only now seen that we had already done this change

-       new_object._skl_callbacks = clone(estimator._skl_callbacks, safe=False)
+       new_object._skl_callbacks = estimator._skl_callbacks

in https://github.com/scikit-learn/scikit-learn/pull/33394/changes#r2845852984

(Obviously I have worked on an older version of the callback branch for a few days without realising that it had updated. 🫤

Still, the question is: why not simply leave it be like it is.
@jeremiedbb @FrancoisPgm

@StefanieSenger StefanieSenger deleted the callbacks_clone branch March 9, 2026 15:42
@jeremiedbb
Copy link
Copy Markdown
Member

Still, the question is: why not simply leave it be like it is.

The current behavior of clone is temporary. In the mean time we decided that clone needs to not keep the callbacks to prevent crashes when an estimator with a callback is passed to a meta-estimator (or function) that doesn't support callbacks.

@StefanieSenger
Copy link
Copy Markdown
Member Author

StefanieSenger commented Mar 9, 2026

Thanks, @jeremiedbb.

clone needs to not keep the callbacks to prevent crashes when an estimator with a callback is passed to a meta-estimator (or function) that doesn't support callbacks.

I'm sorry but I don't know what you mean.

To my understanding, the goal of changing the cloning behaviour is preventing TypeError: cannot pickle '_thread.lock' object in the code below while at the same time keeping the Manager().Queue() accessible (not excluding it from the pickling) from different clones of an estimator that may exist in parallel; and your fix in #33394 does just that.

from sklearn.datasets import make_classification
from sklearn.base import clone
from sklearn.callback import ProgressBar
from sklearn.callback.tests._utils import MaxIterEstimator
X, y = make_classification(n_samples=1000, random_state=42)
est = MaxIterEstimator().set_callbacks(ProgressBar()).fit(X, y)
clone(est)

Also, #33421 does allow the behavior of your fix in #33394 by passing keep_callbacks=True. That this could fail if the function does not support callbacks is not tested and if I test this myself by modifying your test_function_without_callback_support like this:

estimator = MaxIterEstimator().set_callbacks(TestingCallback())

def fit_one(estimator):
    estimator = clone(estimator, keep_callbacks=True)
    estimator.fit()

def my_function(estimator, n_fits=4):
    """A function cloning and fitting an estimator in parallel."""
    Parallel(n_jobs=1)(delayed(fit_one)(estimator) for _ in range(n_fits))

my_function(estimator)

Everything works fine and does not crash, even though an estimator with a callback is passed to function that doesn't support callbacks.

This is why I believe

-       new_object._skl_callbacks = clone(estimator._skl_callbacks, safe=False)
+       new_object._skl_callbacks = estimator._skl_callbacks

and maybe adding some documentation and testing is all we need.

@jeremiedbb
Copy link
Copy Markdown
Member

Everything works fine and does not crash, even though an estimator with a callback is passed to function that doesn't support callbacks.

maybe it doesn't crash for TestingCallback, but it doesn't mean that it worked (and it may crash with a different callback).
If you look at the record attr of TestingCallback after the function call, I strongly suspect that the counts of the different hooks won't match the expectations.

For instance, if you use a progressbar instead, I expect that (if it doesn't crash) n_fits threads are spawned which is not what we want. Indeed if we implement callback support in my_function, only 1 thread is spawned.

@StefanieSenger
Copy link
Copy Markdown
Member Author

StefanieSenger commented Mar 10, 2026

Thank you, @jeremiedbb. I appreciate your answer.

maybe it doesn't crash for TestingCallback, but it doesn't mean that it worked (and it may crash with a different callback).

It also doesn't crash for ProgressBar.

If you look at the record attr of TestingCallback after the function call, I strongly suspect that the counts of the different hooks won't match the expectations.

Sorry, but TestingCallback doesn't have a record attribute? I also didn't find any used in another context in the callback module. So, I don't know what you may mean. (Edit: Oh sorry, it has. Strange, I had gotten an AttributeError. Will look into this later.)

For instance, if you use a progressbar instead, I expect that (if it doesn't crash) n_fits threads are spawned which is not what we want. Indeed if we implement callback support in my_function, only 1 thread is spawned.

I need to think more about this. Without callback support in my_function we of cause expect one RichProgressMonitor thread per sub-estimator with ProgressBar (n_fits here). What is unclear is why we would expect it using only one RichProgressMonitor thread in total if my_function would have callback support while inside we use cloned sub-estimators in parallel. To my current understanding having all of the clones run their richtask update in only one thread could not work.

No rush, but we should know what exactly would crash / not work as expected before merging a solution, because we need to add this as a test. It doesn't need to be before the vote on the SLEP or even before the release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants