FEAT add routing to AdaBoost's fit#24026
FEAT add routing to AdaBoost's fit#24026adrinjalali wants to merge 5 commits intoscikit-learn:sample-propsfrom
Conversation
| # we then remove sample_weight from the routed parameters since we will | ||
| # pass it explicitly to the fit method of the base_estimator. It was | ||
| # included above only to have `process_routing` to do the validation. | ||
| if "sample_weight" in routed_params.base_estimator.fit: | ||
| del routed_params.base_estimator.fit["sample_weight"] |
There was a problem hiding this comment.
I'm not sure which one's best here?
- we pass sample_weight to routing as is implemented here, and it would raise if the user hasn't requested sample weight for their base_estimator, even if they haven't provided sample_weight. While working with @glemaitre we thought this makes sense since AdaBoostClassifier always passes sample_weight to the base_estimator, which means the user should always request it explicitly.
- we could only pass sample_weight to routing if it's explicitly provided by the user. This would make users' code not change in most cases since they mostly pass no explicit sample_weight, and they pass a simple consumer estimator as the base_estimator. Their code would still break if they pass a router as a base_estimator and the sub-estimators of that router are not requesting sample_weight.
Which one do you think makes more sense?
There was a problem hiding this comment.
If we want to be extra lenient, I say the AdaBoost will set the request of the base_estimator if the sample_weight request is not set yet. Assuming base_estimator is not requesting sample_weight, we can be consistent behavior to main:
- If
base_estimatoris a classifier,AdaBoostClassifersets the metadata request to True. - If
base_estimatoris a regressor,AdaBoostClasifersets the meatadata request to False.
Edit: This suggestion does not work
There was a problem hiding this comment.
We can't do that since routers don't expose a method to set requests. So that would only work if the user passes a simple consumer estimator.
| if is_regressor(self): | ||
| routed_params = process_routing( | ||
| obj=self, method="fit", other_params=fit_params | ||
| ) |
There was a problem hiding this comment.
right now AdaBoostRegressor doesn't pass sample_weight to base_estimator even if the user explicitly passes them. Should this behavior be changed?
There was a problem hiding this comment.
If base_estimator is configured to require sample_weight, I think we pass it in. If it is not configured, we do not pass it in, which is consistent with the current behavior.
There was a problem hiding this comment.
Thinking it over, I think want to go with a more lenient approach: #24026 (comment)
There was a problem hiding this comment.
But currently AdaBoostRegressor never passes sample_weight to the base_estimator_.
There was a problem hiding this comment.
AdaBoostRegressor currently does not pass in sample_weight, but if base_estimator requested it I think sample_weight should be passed in.
If AdaBoostRegressor does not pass in sample_weight to base_estimator when requested, then the API seems inconsistent. A router can ignore metadata requested by the base_estimator.
| X_ = _safe_indexing(X, bootstrap_idx) | ||
| y_ = _safe_indexing(y, bootstrap_idx) | ||
| estimator.fit(X_, y_) | ||
| estimator.fit(X_, y_, **fit_params) |
There was a problem hiding this comment.
The current behavior is that the regressor never passes sample_weights to the base_estimator.
| if is_regressor(self): | ||
| routed_params = process_routing( | ||
| obj=self, method="fit", other_params=fit_params | ||
| ) |
There was a problem hiding this comment.
But currently AdaBoostRegressor never passes sample_weight to the base_estimator_.
| # we then remove sample_weight from the routed parameters since we will | ||
| # pass it explicitly to the fit method of the base_estimator. It was | ||
| # included above only to have `process_routing` to do the validation. | ||
| if "sample_weight" in routed_params.base_estimator.fit: | ||
| del routed_params.base_estimator.fit["sample_weight"] |
There was a problem hiding this comment.
We can't do that since routers don't expose a method to set requests. So that would only work if the user passes a simple consumer estimator.
|
What it is doing with relation to its base estimator and |
AdaBoost*ended up being a non-trivial one after-all.The classifier passes
sample_weightto thebase_estimator, and always needs to do that, so the requirement in this PR is that the user should set that request for thebase_estimatorif they're passing it.But the regressor doesn't pass weights and only uses it internally. So it's a consumer and never passes
sample_weightto thebase_estimatoreven if it's requested.This PR doesn't introduce
**paramstodecision_funcsion,predict, ... since that would really complicate things. I tried, and decided to leave that for a future release/PR.TODO: