[NOMRG] new warmstart API for GBDTs#15105

Closed
NicolasHug wants to merge 12 commits into scikit-learn:master from NicolasHug:warmstart_gbdt_gridsearch

Conversation

@NicolasHug
Member

@NicolasHug NicolasHug commented Sep 27, 2019

Ping @adrinjalali is this what was decided during the sprint?

(CI will break because I'm not catching the deprecation warnings)

@NicolasHug NicolasHug changed the title [NOMRG] new warmstart API + grid search for GBDTs [NOMRG] new warmstart API for GBDTs Oct 3, 2019
Member

@adrinjalali adrinjalali left a comment


Left a few notes as I was reading. Yeah this is what we had in mind IIRC (from the estimator's perspective).

The implementation in the pipeline would be trickier.


# For backward compat (for now)
if self.warm_start:
    warnings.warn("warm_start parameter is deprecated", DeprecationWarning)
Member


Also mention how the user can fix it, i.e. what they should do instead.
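A minimal sketch of what such an actionable message could look like. The `warm_start_with` fit parameter is the name proposed later in this thread; `ToyEstimator` and the exact wording are invented for illustration, this is not the actual estimator code:

```python
import warnings


class ToyEstimator:
    """Minimal stand-in for a GBDT estimator; only illustrates the warning."""

    def __init__(self, warm_start=False):
        self.warm_start = warm_start

    def fit(self, X=None, y=None):
        # For backward compat (for now): state the deprecation *and* tell
        # the user what to do instead.
        if self.warm_start:
            warnings.warn(
                "The 'warm_start' parameter is deprecated; pass "
                "warm_start_with={'max_iter': ...} to fit() instead.",
                DeprecationWarning,
            )
        return self
```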

Member Author


Sure, this is just a proof of concept.

In the specific case of the hist GBDT, we don't need to deprecate anything since the warm-start hasn't even been released yet :)

@NicolasHug
Member Author

The implementation in the pipeline would be trickier.

Are you sure? For the fit_param, we could just do e.g. pipe.fit(X, y, warm_start_with={'hist_gbdt__max_iter': 100})

and the _warmstartable_parameters attribute could just be a property that we would set depending on the last step? (This is assuming that only the last step of the pipeline can be warm-started, which I guess is reasonable.)
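A minimal sketch of that property, assuming each estimator advertises its warm-startable parameters in a `_warmstartable_parameters` class attribute (as in this PR); `ToyPipeline` and `ToyGBDT` are invented stand-ins, and the `+` prefix seen later in the diff is omitted for brevity:

```python
class ToyPipeline:
    """Toy pipeline: its warm-startable params are those of the last step,
    prefixed with that step's name (hypothetical, not the real Pipeline)."""

    def __init__(self, steps):
        self.steps = steps  # list of (name, estimator) pairs

    @property
    def _warmstartable_parameters(self):
        # Assume only the last step (the predictor) can be warm-started.
        name, last = self.steps[-1]
        params = getattr(last, "_warmstartable_parameters", [])
        return ["%s__%s" % (name, p) for p in params]


class ToyGBDT:
    _warmstartable_parameters = ["max_iter"]
```

With `ToyPipeline([("scale", object()), ("hist_gbdt", ToyGBDT())])`, the property would yield `["hist_gbdt__max_iter"]`, matching the `step_name__param` convention used by `fit` params.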

@adrinjalali
Member

I was thinking of allowing warm start by refitting the pipeline after changing a parameter anywhere, and only refitting the steps after the changed one. So in a sense, the pipeline can warm start with almost any of its parameters.
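As an illustration of that idea (purely a sketch with invented names; the real Pipeline keeps no such state), changing a parameter of step i would invalidate steps i onward, and the next fit would refit only those:

```python
class PartialRefitSketch:
    """Sketch of a pipeline that refits only the steps after a changed one."""

    def __init__(self, step_names):
        self.step_names = step_names
        self._fitted = [False] * len(step_names)

    def change_param(self, step_index):
        # Changing step i invalidates step i and every later step,
        # because their inputs may have changed.
        for i in range(step_index, len(self.step_names)):
            self._fitted[i] = False

    def fit(self):
        refitted = []
        for i, name in enumerate(self.step_names):
            if not self._fitted[i]:
                refitted.append(name)  # a real pipeline would call step.fit here
                self._fitted[i] = True
        return refitted
```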

@NicolasHug
Member Author

You mean, when doing pipe.set_params({'second_step__something': 12}), then not fitting the first step again when calling pipe.fit()?

I feel like this is yet another kind of warm-starting, mostly orthogonal to the kind of warm-starting that this API is concerned with?

@adrinjalali
Member

So for now, we forget about warm start on the pipeline for this PR, and we should test meta-estimators such as Pipeline and the Voting estimators.

@NicolasHug
Member Author

Are you happy with how it looks so far @adrinjalali ? (test failure is unrelated)

I think we should write a SLEP for this.

Member

@adrinjalali adrinjalali left a comment


Once we decide on the API, I think we should put the warm-startable params higher in the meta-estimator hierarchy, but otherwise looks good.

And yeah, we should write a SLEP for the new API.

def _warmstartable_parameters(self):
    # This property exposes the _warmstartable_parameters attribute, e.g.
    # ['+last_step_name__param']
    # We consider that only the last step can be warm-started. The first
Member


Do we need to make this design choice? I may have asked this question before, but I don't remember your logic behind it.

Member Author


A pipeline is either:

  • a sequence of transformers
  • a sequence of transformers + a predictor as the last step

It's safe to assume that transformers are not warm-startable. Maybe in the future one of them will be? We can worry about that when it happens.

Member


You could have a word embedding transformer as a step, which very often is warm started. We may not have that inside sklearn, but the pipeline's API should support it, I think.

Member Author


I agree we should leave the possibility open, and make the code future-proof. This is one of the reasons why _warmstartable_parameters is a list.

But I don't think we should implement support for that, we have no use-case ATM.

Concretely, supporting warm-start for transformers right now would mean writing code that isn't used (it would require updating the _fit method that fits all the transformers).

Member


I see. Yeah fair.

@adrinjalali
Member

@jnothman I'd appreciate your thoughts on this one.

@NicolasHug
Member Author

The benefits of this approach aren't clear to me, and we have enough SLEPs to deal with ATM. Closing; we might re-open it one day.

