Skip to content

FEA Callbacks base infrastructure + progress bars#27663

Closed
jeremiedbb wants to merge 57 commits intoscikit-learn:callbacksfrom
jeremiedbb:base
Closed

FEA Callbacks base infrastructure + progress bars#27663
jeremiedbb wants to merge 57 commits intoscikit-learn:callbacksfrom
jeremiedbb:base

Conversation

@jeremiedbb
Copy link
Copy Markdown
Member

@jeremiedbb jeremiedbb commented Oct 25, 2023

Extracted from #22000

This PR implements a smaller portion of #22000, with only the base infrastructure for callbacks and the implementation of a single callback (progress bars). It targets the callbacks branch and the goal is to have the full callback implementation done in several smaller PRs merged in this branch before we merge it in main.

This PR proposes significant changes compared to #22000 that should imo improve a lot its change of getting merged 😄
The main improvement is that it no longer requires writing on the disk, but instead relies on multiprocessing.Managers and queues. It simplifies the code a lot.

In #22000 I adapted some estimators to work with the callbacks which I did not include here to keep the PR as light as possible. You can however experiment the callbacks on estimators that I wrote for testing purpose:

from sklearn.callback import ProgressBar
from sklearn.callback.tests._utils import Estimator, MetaEstimator
est = Estimator()
meta_est = MetaEstimator(est, n_jobs=2)
meta_est._set_callbacks(ProgressBar())
meta_est.fit(None, None)

You add sleep calls in these testing estimators to simulate how it changes when the computations take longer.

The plan is to then have several PRs to implement the other callbacks, adapt a few estimators to work with the callbacks, add some documentation and examples, add more tests...

Copy link
Copy Markdown
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A very shallow review.

sklearn/base.py Outdated
params_set = new_object.get_params(deep=False)

# attach callbacks to the new estimator
if hasattr(estimator, "_callbacks"):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can also break quite easily if the callback object keeps references to attributes of the old estimator. Why aren't we creating a copy here?

Comment on lines +1341 to +1344
try:
return fit_method(estimator, *args, **kwargs)
finally:
estimator._eval_callbacks_on_fit_end()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is becoming larger than just a validation wrapper. We can simplify debugging and magic by having a BaseEstimator.fit which calls self._fit(...) and does all the common stuff before and after. That seems a lot better to understand and debug.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The motivation when I introduced _fit_context was not only for validation but to have a generic context manager to handle everything we need to do before and after fit. That's why I gave it this generic name.

Although having a consistent framework where we'd have a BaseEstimator.fit and every child estimator implements _fit is appealing, I think it goes far beyond the scope of this PR and requires rewriting a lot of estimators.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw do you why BaseEstimator does not implement fit in the first place ?

Also, note that _fit_context also handles partial_fit, but I don't think we want BaseEstimator to implement partial_fit

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BaseEstimator doesn't implement fit cause we don't generally have methods which raise NotImplementedError. They're simply not there. But now that we have all this work, we can certainly have it in BaseEstimator, and children only implement a __sklearn_fit__ kind of method instead.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think it's outside the scope of this PR. Using the existing context manager is just 1 line addition whereas implementing __sklearn_fit__ means countless PRs :)

@rth
Copy link
Copy Markdown
Member

rth commented Mar 5, 2024

+1 to merge this. I personally find that it would be more valuable to mark it as experimental (it's private anyway so far) let the users use this for a version or two, aggregate their feedback and iterate in future details if needed.

Overall the API sounds reasonable to me. Rather than to approve a SLEP on this, only then to realize that users would have preferred something else, or that there are some weird edge cases for some estimator that need special handling.

@jeremiedbb
Copy link
Copy Markdown
Member Author

Note that this PRis targeting the callbacks branch, not main so merging this is not a big commitment anyway 😄
And it would allow me to implement the rest that I did not include in this PR to keep it as small as possible.

I personally find that it would be more valuable to mark it as experimental

I agree and we already discussed that with @glemaitre. I plan to do that in a follow up PR.

@jeremiedbb
Copy link
Copy Markdown
Member Author

jeremiedbb commented Mar 5, 2024

I still need to figure out the issue with the CI though
EDIT: good now

@Scot-Survivor
Copy link
Copy Markdown

Any news on this being merged?

@glemaitre
Copy link
Copy Markdown
Member

Any news on this being merged?

We currently working on the design and need an agreement among core developers.

@ignaceHelsen
Copy link
Copy Markdown

Any news on this? It would be amazing would this be added.

@GaelVaroquaux
Copy link
Copy Markdown
Member

I think that the relevant PR is now #28760

@ogrisel
Copy link
Copy Markdown
Member

ogrisel commented Jan 21, 2026

@jeremiedbb @FrancoisPgm shall we close this?

@jeremiedbb
Copy link
Copy Markdown
Member Author

I was just keeping it open to compare 2 alternatives but I guess this one is completely outdated now. Let's close

@jeremiedbb jeremiedbb closed this Jan 21, 2026
@github-project-automation github-project-automation bot moved this from Being dropped to Done in Callbacks Jan 21, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

9 participants