[MRG+2] TransformedTargetRegressor #9041
Conversation
@jnothman @amueller @dengemann @agramfort Can you have a look before I write some narrative doc accordingly?
jnothman left a comment
Despite the multitude of comments, overall I think this is what we want. Good work
sklearn/preprocessing/target.py (outdated)

    class TransformedTargetRegressor(BaseEstimator, RegressorMixin):
        """Meta-estimator to apply a transformation to the target before fitting.
"regress on a transformed target" might be simpler than "apply a ..."
sklearn/preprocessing/target.py (outdated)

    Parameters
    ----------
    estimator : object, (default=LinearRegression())
maybe should call this regressor, because transformer is also an estimator.
sklearn/preprocessing/target.py (outdated)

    Parameters
    ----------
    estimator : object, (default=LinearRegression())
        Estimator object derived from ``RegressorMixin``.
we don't usually require inheritance, as long as appropriate methods are available. You could say "such as derived from" ...
Perhaps mention that it will be cloned.
sklearn/preprocessing/target.py (outdated)

        Estimator object derived from ``RegressorMixin``.

    transformer : object, (default=None)
        Estimator object derived from ``TransformerMixin``. Cannot be set at
We don't usually require inheritance.
Perhaps mention that it will be cloned.
sklearn/preprocessing/target.py (outdated)

        ``func`` and ``inverse_func`` are ``None`` as well, the transformer
        will be an identity transformer.

    func : function, (default=None)
I'd prefer "optional" to "default=None" which has no clear semantics.
Though in this case the semantics of "optional" are not obvious either; the reader needs to look 2 lines below anyway.
I hardly see that as a problem, given the context.
Just sayin', if this says "optional" instead of "default=None", the docstring below should say "if not passed" instead of "If None". (or the reader needs to scroll and check that the default value is None)
sklearn/preprocessing/target.py (outdated)

        self._validate_transformer(y_2d)
        self.estimator_ = clone(self.estimator)
        self.estimator_.fit(X, self.transformer_.transform(y_2d),
                            sample_weight=sample_weight)
I think the current convention is to pass sample_weight only when sample_weight is not None
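A minimal sketch of that convention, assuming a hypothetical wrapper (the class names `WeightedFitWrapper` and `RecordingRegressor` are made up for illustration, not part of the PR): forward `sample_weight` to the wrapped estimator only when the caller actually provided it, so estimators that do not accept the argument still work.

```python
import numpy as np


class RecordingRegressor:
    """Toy estimator that records the keyword arguments fit() receives."""

    def fit(self, X, y, **kwargs):
        self.fit_kwargs = kwargs
        return self


class WeightedFitWrapper:
    """Hypothetical meta-estimator illustrating the sample_weight convention."""

    def __init__(self, regressor):
        self.regressor = regressor

    def fit(self, X, y, sample_weight=None):
        if sample_weight is None:
            # let the wrapped estimator apply its own default behaviour
            self.regressor.fit(X, y)
        else:
            self.regressor.fit(X, y, sample_weight=sample_weight)
        return self


X = np.ones((3, 2))
y = np.ones(3)
inner = RecordingRegressor()
WeightedFitWrapper(inner).fit(X, y)
unweighted_kwargs = dict(inner.fit_kwargs)
WeightedFitWrapper(inner).fit(X, y, sample_weight=np.ones(3))
weighted_kwargs = dict(inner.fit_kwargs)
```

With this pattern, `unweighted_kwargs` contains no `sample_weight` key at all, rather than an explicit `sample_weight=None`.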
sklearn/preprocessing/target.py (outdated)

        return pred

    def score(self, X, y, sample_weight=None):
        """Returns the coefficient of determination R^2 of the prediction.
Should state here that scoring is performed in the original, not the transformed, space.
    from sklearn import datasets

    iris = datasets.load_iris()
    friedman = datasets.make_friedman1(random_state=0)
    assert_array_almost_equal((y - y_mean) / y_std, y_tran)
    assert_array_almost_equal(y, np.ravel(clf.transformer_.inverse_transform(
        y_tran.reshape(-1, 1))))
    assert_equal(y.shape, pred.shape)
You've failed to test that clf.estimator_ was passed the transformed y.
A better test would just check the equivalence between clf.estimator_.coef_ and LinearRegression().fit(X, StandardScaler().fit_transform(y[:, None])[:, 0]).coef_.
You've also not tested the handling of sample_weight.
yeah this would be a good test, too.
I think testing coef_ and also testing pred would be good.
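A sketch of the suggested equivalence check: fit a plain `LinearRegression` on a hand-scaled target, then compare its coefficients against the fitted inner estimator of the meta-estimator (`clf.estimator_` in the code at this point, later renamed `regressor_`). Only the reference half is shown here; the comparison against the meta-estimator is indicated in a comment.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
X = rng.randn(50, 3)
y = 10 * rng.randn(50) + 100

# scale the target by hand, exactly as the meta-estimator would internally
y_scaled = StandardScaler().fit_transform(y[:, None])[:, 0]
lr = LinearRegression().fit(X, y_scaled)

# the test would then assert, e.g.:
#     assert_allclose(clf.estimator_.coef_, lr.coef_)
```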
sklearn/preprocessing/target.py (outdated)

    # memorize if y should be a multi-output
    self.y_ndim_ = y.ndim
    if y.ndim == 1:
        y_2d = y.reshape(-1, 1)
I suspect we don't want to do this when func and inverse_func are provided?
I come back on this point. I am not sure this is a great idea since it changes the behaviour depending on whether you pass a function or a transformer. We could still convert to 2d and build the FunctionTransformer with validate=True, and the behaviour will always be the same.
I don't see a case in which the user would define a function that works on a 1D array but would fail on a 2D array.
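To illustrate the point: with `validate=True`, `FunctionTransformer` converts its input to a 2-d array before applying `func`, so a function written with 1-d input in mind sees the same shape a regular transformer would. An elementwise function like `np.log` behaves identically either way.

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer

# validate=True runs the input through array validation (2-d required),
# so func always receives a 2-d array, just like a transformer's y_2d
ft = FunctionTransformer(func=np.log, validate=True)
y_2d = np.array([1.0, np.e, np.e ** 2]).reshape(-1, 1)
out = ft.fit_transform(y_2d)
```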
As long as the value of having inverse_func is clear in the docs, I don't mind.
On 8 Jun 2017, Guillaume Lemaitre commented on this pull request, in sklearn/preprocessing/target.py:
    ----------
    estimator : object, (default=LinearRegression())
        Estimator object derived from ``RegressorMixin``.

    transformer : object, (default=None)
        Estimator object derived from ``TransformerMixin``. Cannot be set at
        the same time as ``func`` and ``inverse_func``. If ``None`` and
        ``func`` and ``inverse_func`` are ``None`` as well, the transformer
        will be an identity transformer.

    func : function, (default=None)
        Function to apply to ``y`` before passing to ``fit``. Cannot be set at
        the same time as ``transformer``. If ``None`` and ``transformer`` is
        ``None`` as well, the function used will be the identity function.

    inverse_func : function, (default=None)
Since None will lead to the identity function, and we don't enforce func and inverse_func to actually be the inverse of each other, I am not sure that inverse_func should be required.
@jnothman I missed the what's new but I added some doc and addressed almost all comments. I am just unsure about fit + transform vs fit_transform and their implications.
sklearn/preprocessing/target.py (outdated)

        self._fit_transformer(y_2d, sample_weight)
        self.regressor_ = clone(self.regressor)
        if sample_weight is not None:
            self.regressor_.fit(X, self.transformer_.transform(y_2d),
I mean that we should really be using fit_transform to produce the downstream y here
fit_transform is mostly used for efficiency
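For context on the fit + transform vs fit_transform question: for a transformer like `StandardScaler`, the two spellings produce identical output on the training data; `fit_transform` is a potentially faster shortcut rather than a behavioural change. A small sketch:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

y_2d = np.arange(10, dtype=float).reshape(-1, 1)

# the two spellings under discussion
via_fit_then_transform = StandardScaler().fit(y_2d).transform(y_2d)
via_fit_transform = StandardScaler().fit_transform(y_2d)
```

Transformers may override `fit_transform` for efficiency (e.g. to avoid a second pass over the data), which is why jnothman suggests using it to produce the downstream `y`.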
doc/whats_new.rst (outdated)

    - Added the :class:`sklearn.preprocessing.TransformedTargetRegressor` which
      is a meta-estimator to regress on a modified ``y``. :issue:`9041` by
Just to make this more relatable, describe a small use case, maybe "for example, to perform regression in log-space" or something?
How about "andydoesntcryhimselftosleepatnightanymorebecausehefinallyhasthetoolshewants"? Or too generic?
Then this PR will be invaded by a pink unicorn :)
If glue is too generic, how about sklearn.adhesive? haha...
You nailed my problem with "glue". Glue is not something that I use to connect tubes. How about "tubes", "connectors"?
jnothman left a comment
A couple of small things to test. And the naming/placement questions stand. Apart from which LGTM.
    >>> def inverse_func(x):
    ...     return x
    >>> regr = TransformedTargetRegressor(regressor=regressor,
    ...                                   func=func,
    # non-negative and (ii) applying an exponential function to obtain
    # non-linear targets which cannot be fitted using a simple linear model.
    #
    # Therefore, logarithmic and exponential functions will be used to
    regr_trans = TransformedTargetRegressor(
        regressor=RidgeCV(),
        transformer=QuantileTransformer(output_distribution='normal'))
sklearn/preprocessing/target.py (outdated)

    >>> from sklearn.linear_model import LinearRegression
    >>> from sklearn.preprocessing import TransformedTargetRegressor
    >>> tt = TransformedTargetRegressor(regressor=LinearRegression(),
    ...                                 func=np.log, inverse_func=np.exp)
sklearn/preprocessing/target.py (outdated)

    -----
    Internally, the target ``y`` is always converted into a 2-dimensional array
    to be used by scikit-learn transformers. At the time of prediction, the
    output will be reshaped to have the same number of dimensions as ``y``.
    regr = TransformedTargetRegressor(regressor=LinearRegression(),
                                      func=np.sqrt, inverse_func=np.log,
                                      check_inverse=False)
    # the transformer/functions are not checked to be invertible during fitting
I'd drop this comment, but would make it clearer by replacing the previous statement with regr.set_params(check_inverse=False)
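The suggested spelling in isolation (note the class ultimately landed in `sklearn.compose`, not `sklearn.preprocessing`, in released versions): construct the estimator once, then toggle the check via `set_params`.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor  # final merged location
from sklearn.linear_model import LinearRegression

# func and inverse_func are deliberately NOT inverses of each other;
# check_inverse=False suppresses the invertibility check
regr = TransformedTargetRegressor(regressor=LinearRegression(),
                                  func=np.sqrt, inverse_func=np.log)
regr.set_params(check_inverse=False)
```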
    y_tran = regr.transformer_.transform(y)
    assert_allclose(np.log(y), y_tran)
    assert_allclose(y, regr.transformer_.inverse_transform(y_tran))
    assert_equal(y.shape, y_pred.shape)
with pytest, we can just use a bare assert y.shape == y_pred.shape, which I find much more legible.
    assert_allclose(regr.regressor_.coef_, lr.coef_)


    def test_transform_target_regressor_1d_transformer_multioutput():
It is hard to see how similar or different the code is here from the previous test. Perhaps use a loop, a check function, or pytest.mark.parametrize.
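One way the near-duplicate tests could be folded together with parametrize; the data and assertion here are placeholders to show the shape of the idea, not the PR's real checks.

```python
import numpy as np
import pytest


# a single test body exercised once per target shape
@pytest.mark.parametrize("y", [np.arange(6.0),                   # 1-d target
                               np.arange(6.0).reshape(-1, 2)])   # multioutput
def test_roundtrip_shape(y):
    # convert to 2-d the way the meta-estimator does, then "restore"
    y_2d = y.reshape(-1, 1) if y.ndim == 1 else y
    restored = y_2d[:, 0] if y.ndim == 1 else y_2d
    assert restored.shape == y.shape
```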
    assert_allclose(regr.regressor_.coef_, lr.coef_)


    def test_transform_target_regressor_2d_transformer_multioutput():
same here that it's hard to see how similar or different the tests are.
    # check that the target ``y`` passed to the transformer will always be a
    # numpy array
    X, y = friedman
    tt = TransformedTargetRegressor(transformer=DummyTransformer(),
Can you please check similarly that the predictor receives X as a list? Thanks.
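A minimal checking regressor in the spirit of that request; the class name is made up for illustration. It asserts that `X` reaches `fit` and `predict` as a plain list rather than having been converted to an array by the meta-estimator.

```python
import numpy as np


class ListCheckingRegressor:
    """Toy regressor asserting X is forwarded untouched as a list."""

    def fit(self, X, y, sample_weight=None):
        assert isinstance(X, list)
        self.n_features_ = len(X[0])
        return self

    def predict(self, X):
        assert isinstance(X, list)
        return np.zeros(len(X))


reg = ListCheckingRegressor().fit([[0.0], [1.0], [2.0]], np.arange(3.0))
pred = reg.predict([[3.0], [4.0]])
```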
Done
The real-world example is not very convincing... though I'm not sure there's a better one with the data we have...
It's your +1 in the title, I think, or at least it's not mine...
I think this is also just waiting on where to put it...
The +1 is from Andy during the scikit-learn sprint :D
I think it's fine where it is ;)
We can always move it before the release; I think delaying features for module-naming bike-shedding will get us in trouble...
I am ok with having it in preprocessing in the interest of moving forward.
Let's do it! Thanks, @glemaitre.
One less hack for my class! This is moving forward quite nicely lol. (PowerTransformer was another.) Can we do KNN imputation, missing-value features and ColumnTransformer next? Oh, and balanced random forests (though actually imblearn has it now :))? I think then I'm good... just need to implement a decent time series library in Python, or something...
Add a new meta-regressor which transforms y for training.
    - Added :class:`multioutput.RegressorChain` for multi-target
      regression. :issue:`9257` by :user:`Kumar Ashutosh <thechargedneutron>`.

    - Added the :class:`preprocessing.TransformedTargetRegressor` which transforms
For some reason this had disappeared from what's new and I've just reinserted it :\
Continuation of #8988
TODO: