[MRG + 2] Classifier and regressor tags #4418
Conversation
@mblondel for using ranking metrics on regressors, that is only possible with the ground truth being binary, right? Is that your use case?
Force-pushed 306d788 to bc10d8f.
For NDCG (not yet in scikit-learn), any real-valued ground truth is fine.
Should be good now.
To what extent does this break backward compatibility with pickled models?
If you unpickle an instance of a class that has since had a new class-level attribute added, the unpickled object will still see the class attribute, so pickled models keep working.
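A minimal sketch of why this works (the Estimator class here is made up for illustration): pickle stores only the instance's __dict__, and attribute lookup on the unpickled object falls back to the class, so old pickles pick up a tag added later at class level.

```python
import pickle

# hypothetical minimal estimator, for illustration only
class Estimator:
    pass

# pickle an instance created before the tag existed
blob = pickle.dumps(Estimator())

# a later version of the library adds a class-level tag
Estimator._estimator_type = "classifier"

# the unpickled object falls back to the class attribute
old = pickle.loads(blob)
print(old._estimator_type)  # -> classifier
```

Note that the tag never ends up in the instance's __dict__; it is resolved through the class at lookup time.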
LGTM. Since it's a core design issue, it would be nice to have other opinions. Random pings @larsmans @ogrisel @agramfort @jnothman. Thanks for tackling this @amueller!
Thanks for the review :) I agree we should have more opinions, but I think it is a pretty clear-cut improvement.
Argh, I need to write some docs though |
sklearn/base.py
Outdated
I think that we should make this variable private: people not implementing an estimator should not be looking at it / modifying it. Only people coding a certain class should be touching it. This is pretty much the definition of a class-level private variable.
But the point of this tag system is to let third-party developers implement classes without inheriting from our base classes. Private variables can be changed. On the other hand, a third-party developer doesn't want his code to break because we changed variable names. We need to make this tag part of our API and commit to it.
Agreed, but it's going to make tab completion ugly, and it's going to
worry/confuse our users.
In a sense, this is the same thing as an __add__ function, which is a private variable. So in Python land, typing-related class information is usually coded with private attributes, I would say.
I think the __ in __add__ is just a way to emphasize that this is a special method rather than a private one. If the goal is to not mess w/ tab completion I am +1 with the _ prefix, but the variable should really be documented as part of our public API (i.e. we won't change it).
On 03/24/2015 08:49 PM, Mathieu Blondel wrote:
+1
I documented it in the dev docs.
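To make the duck-typing point concrete, here is a sketch (ThirdPartyClassifier is a made-up name) of an estimator that carries the tag without inheriting from any scikit-learn base class, and a tag-based check along the lines discussed above:

```python
# made-up third-party estimator: no scikit-learn base class needed
class ThirdPartyClassifier:
    _estimator_type = "classifier"

    def fit(self, X, y):
        return self

    def predict(self, X):
        return [0 for _ in X]

def is_classifier(estimator):
    # duck-typed: inspect the documented tag instead of isinstance
    return getattr(estimator, "_estimator_type", None) == "classifier"

print(is_classifier(ThirdPartyClassifier()))  # True
print(is_classifier(object()))                # False
```

Since the check only reads the tag, the third-party class never needs to import anything from scikit-learn.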
I added an explanation of the
@landscape-bot doesn't like me accessing private attributes ^^
I'm not sure where @mblondel's comment went. I agree the point is to make it accessible to third-party developers without inheritance. But I'm not sure why this means it can't be private.
doc/developers/index.rst
Outdated
Still +1 for merge on my side.
Could we enforce that all estimators should have an _estimator_type tag?

    def is_classifier(estimator):
        """Returns True if the given estimator is a classifier."""
        if not hasattr(estimator, "_estimator_type"):
            raise ValueError("The given estimator instance does not "
                             "have an _estimator_type tag.")
        return estimator._estimator_type.lower() == "classifier"
This would not work with user-defined estimators currently, but it would be helpful in framing a generic estimator test framework as wished for in #3810.
But then this should be in the test framework, not the library code. So people who want to be strict can run their tests, but people who don't care can still run their sloppy but sklearn-compatible code.
That makes sense... Thanks for the comment :)
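A sketch of that split (the helper names here are illustrative, not scikit-learn API): the library-side check stays lenient, while a separate helper that a test framework could call enforces the tag strictly.

```python
def is_classifier(estimator):
    # lenient library-side check: an untagged estimator is
    # simply not a classifier, no error is raised
    return getattr(estimator, "_estimator_type", None) == "classifier"

def check_has_estimator_type(estimator):
    # strict check, meant for a test framework such as the
    # one wished for in #3810
    if not hasattr(estimator, "_estimator_type"):
        raise ValueError(
            "%r does not define an _estimator_type tag" % estimator)

class Sloppy:  # untagged but otherwise sklearn-compatible
    def fit(self, X, y):
        return self

print(is_classifier(Sloppy()))  # False: sloppy code still runs
try:
    check_has_estimator_type(Sloppy())
except ValueError as e:
    print("strict tests would fail:", e)
```

Sloppy code keeps working against the lenient check, while anyone opting into the strict test suite gets an immediate, explicit failure.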
There is also staged_decision_function.
(Ping @pprett )
I was wondering about that, but you are right, it should be removed.
Shall we merge?
I'll rebase (master changed). Maybe @ogrisel @GaelVaroquaux or @agramfort (that's what you get for commenting on issues) want to have a look.
Force-pushed c9a9f34 to d89c215.
sklearn/svm/base.py
Outdated
X : array-like, shape (n_samples, n_features)
can you go over the docstrings to make sure we use this convention here too?
Besides docstring nitpicks, LGTM.
Fixed the docstrings.
ping @ogrisel @GaelVaroquaux maybe check if anyone at the sprint opposes, otherwise merge?
doc/developers/index.rst
Outdated
You should add 'clusterer' here, as I see it is used in the code.
I defined it, but didn't use it. I can add it to the docs or remove it from the mixin, I wasn't sure.
Once the docs are updated to add clusterer, +1 to merge.
Done. Thanks for the reviews :) merging.
Merged #4418.
Hurray! Thanks
This improves is_classifier by making it not depend on inheritance and introduces is_regressor. It also fixes #2588 (using ranking scorers on regressors).
Todo:
- document _estimator_type in the dev docs
- remove decision_function from all regressors