Skip to content

[MRG + 2] Classifier and regressor tags#4418

Merged
amueller merged 3 commits intoscikit-learn:masterfrom
amueller:classifier_regressor_tags
Apr 2, 2015
Merged

[MRG + 2] Classifier and regressor tags#4418
amueller merged 3 commits intoscikit-learn:masterfrom
amueller:classifier_regressor_tags

Conversation

@amueller
Copy link
Copy Markdown
Member

This improved is_classifier by making it not depend on inheritance and introduces is_regressor.
It also fixes #2588, using ranking scorers on regressors.
Todo:

  • Describe the estimator_type in the dev docs
  • remove decision_function from all regressors.
  • use tag to decide whether to call predict or decision_function in multi_class.

@amueller
Copy link
Copy Markdown
Member Author

@mblondel for using ranking metrics on regressors, that is only possible with ground-truth being binary, right? Is that your use case?

@landscape-bot
Copy link
Copy Markdown

Code Health
Repository health increased by 0.00% when pulling 306d788 on amueller:classifier_regressor_tags into bc5acea on scikit-learn:master.

@amueller amueller force-pushed the classifier_regressor_tags branch from 306d788 to bc10d8f Compare March 19, 2015 21:01
@landscape-bot
Copy link
Copy Markdown

Code Health
Repository health increased by 0.00% when pulling bc10d8f on amueller:classifier_regressor_tags into bc5acea on scikit-learn:master.

@landscape-bot
Copy link
Copy Markdown

Code Health
Repository health increased by 0.00% when pulling a39d237 on amueller:classifier_regressor_tags into bc5acea on scikit-learn:master.

@mblondel
Copy link
Copy Markdown
Member

For ndcg (not yet in scikit learn) any real is fine.
On Mar 20, 2015 5:50 AM, "Andreas Mueller" notifications@github.com wrote:

@mblondel https://github.com/mblondel for using ranking metrics on
regressors, that is only possible with ground-truth being binary, right? Is
that your use case?


Reply to this email directly or view it on GitHub
#4418 (comment)
.

@landscape-bot
Copy link
Copy Markdown

Code Health
Repository health increased by 0.00% when pulling ef712e2 on amueller:classifier_regressor_tags into bc5acea on scikit-learn:master.

@amueller
Copy link
Copy Markdown
Member Author

Should be good now.

@amueller amueller changed the title [WIP] Classifier and regressor tags [MRG] Classifier and regressor tags Mar 20, 2015
@landscape-bot
Copy link
Copy Markdown

Code Health
Repository health increased by 0.00% when pulling 6e50c77 on amueller:classifier_regressor_tags into bc5acea on scikit-learn:master.

@arjoly
Copy link
Copy Markdown
Member

arjoly commented Mar 21, 2015

In which extend does it break backward compatibility with pickled model?

@amueller
Copy link
Copy Markdown
Member Author

If you unpickle a class that has a new property added, the unpickled object will have the class attribute. So is_classifier will work on "old" objects.

@mblondel
Copy link
Copy Markdown
Member

LGTM. Since it's a core design issue, it would be nice to have other opinions. random pings @larsmans @ogrisel @agramfort @jnothman

Thanks for tackling this @amueller!

@amueller amueller changed the title [MRG] Classifier and regressor tags [MRG + 1] Classifier and regressor tags Mar 24, 2015
@amueller
Copy link
Copy Markdown
Member Author

Thanks for the review :) I agree we should have more opinions, but I think it is a pretty clear-cut improvement.

@amueller
Copy link
Copy Markdown
Member Author

Argh, I need to write some docs though

sklearn/base.py Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that we should make this variable private: people not implementing an estimator should not be looking at it / modifying it. Only people coding a certain class should be touching it. This is pretty much the definition of an class-level private variable.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the point of this tag system is to let third-party developers implement classes without inheriting from our base classes. Private variables can be changed. On the other hand, a third-party developer doesn't want his code to break because we changed variable names. We need to make this tag part of our API and commit to it.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the point of this tag system is to let third-party developers implement
classes without inheriting from our base classes. Private variables can be
changed.

On the other hand, a third-party developer doesn't want his code to
break because we changed variable names. We need to make this tag part
of our API and commit to it.

Agreed, but it's going to make tab completion ugly, and it's going to
worry/confuse our users.

In a sense, this is the same thing as an add function, which is a
private variable. So in Python land, typing-related class information are
usually coded why private attributes, I would say.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the __ in add is just a way to emphasize that this is a special
method rather than a private one. If the goal is to not mess w/ tab
completion I am +1 with the _ prefix but the variable should really be
documented as part of our public API (i.e. we won't change it).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On 03/24/2015 08:49 PM, Mathieu Blondel wrote:

I think the __ in add is just a way to emphasize that this is a
special method rather than a private one. If the goal is to not mess
w/ tab completion I am +1 with the _ prefix but the variable should
really be documented as part of our public API (i.e. we won't change it).
+1
I documented it in the dev docs.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amueller
Copy link
Copy Markdown
Member Author

I added an explanation of the estimator_type attribute to the "Roll your own estimator" docs because this is when people need to know about it.
Is there a documentation of is_classifier somewhere? Should it also go to the developer docs? If so, where?

@landscape-bot
Copy link
Copy Markdown

Code Health
Repository health increased by 0.00% when pulling 06f3b75 on amueller:classifier_regressor_tags into bc5acea on scikit-learn:master.

@landscape-bot
Copy link
Copy Markdown

Code Health
Repository health decreased by 0.01% when pulling 4db5b73 on amueller:classifier_regressor_tags into bc5acea on scikit-learn:master.

@coveralls
Copy link
Copy Markdown

Coverage Status

Coverage increased (+0.03%) to 95.11% when pulling 4db5b73 on amueller:classifier_regressor_tags into bc5acea on scikit-learn:master.

@amueller
Copy link
Copy Markdown
Member Author

@landscape-bot doesn't like me accessing private attributes ^^

@amueller
Copy link
Copy Markdown
Member Author

I'm not sure where @mblondel's comment went. I agree the point is to make it accessible to third-party developers without inheritance. But I'm not sure why this means it can't be private.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

distinction

@mblondel
Copy link
Copy Markdown
Member

Still +1 for merge on my side.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we enforce that all estimators should have a _estimator_type tag?

def is_classifier(estimator):
    """Returns True if the given estimator is a classifier."""
    if not hasattr(estimator, "_estimator_type"):
        raise ValueError("The given estimator instance does not have a _estimator_type tag.")

    return estimator._estimator_type.lower() == "classifier"

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would not work with user defined estimators currently, but this would be helpful in framing a generic estimator test framework as wished for in #3810

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But then this should be in the test framework, not the code. So people that want to be strict can run their tests, but people that don't care can still run their sloppy but sklearn compatible code.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense... Thanks for the comment :)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@landscape-bot
Copy link
Copy Markdown

Code Health
Repository health decreased by 0.01% when pulling 2041386 on amueller:classifier_regressor_tags into bc5acea on scikit-learn:master.

@coveralls
Copy link
Copy Markdown

Coverage Status

Coverage increased (+0.03%) to 95.11% when pulling 2041386 on amueller:classifier_regressor_tags into bc5acea on scikit-learn:master.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is also staged_decision_function.

(Ping @pprett )

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering about that, but you are right, it should be removed.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@landscape-bot
Copy link
Copy Markdown

Code Health
Repository health decreased by 0.01% when pulling c9a9f34 on amueller:classifier_regressor_tags into bc5acea on scikit-learn:master.

@mblondel
Copy link
Copy Markdown
Member

Shall we merge?

@amueller
Copy link
Copy Markdown
Member Author

I'll rebase (master changed). Maybe @ogrisel @GaelVaroquaux or @agramfort (that's what you get for commenting on issues) want to have a look.

@amueller amueller force-pushed the classifier_regressor_tags branch from c9a9f34 to d89c215 Compare April 1, 2015 00:00
@landscape-bot
Copy link
Copy Markdown

Code Health
Repository health decreased by 0.01% when pulling d89c215 on amueller:classifier_regressor_tags into e2dfd23 on scikit-learn:master.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

X : array-like, shape (n_samples, n_features)

can you go over the docstrings to make sure we use this convention here too?

@agramfort
Copy link
Copy Markdown
Member

besides docstring nitpicks LGTM

@landscape-bot
Copy link
Copy Markdown

Code Health
Repository health decreased by 0.01% when pulling 188fb11 on amueller:classifier_regressor_tags into e2dfd23 on scikit-learn:master.

@amueller
Copy link
Copy Markdown
Member Author

amueller commented Apr 1, 2015

fixed the docstrings.

@amueller amueller changed the title [MRG + 1] Classifier and regressor tags [MRG + 2] Classifier and regressor tags Apr 1, 2015
@amueller
Copy link
Copy Markdown
Member Author

amueller commented Apr 2, 2015

ping @ogrisel @GaelVaroquaux maybe check if anyone at the sprint opposes, otherwise merge?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should add 'clusterer' here, as I see it is used in the code.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I defined it, but didn't use it. I can add it to the docs or remove it from the mixin, I wasn't sure.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@GaelVaroquaux
Copy link
Copy Markdown
Member

Once the docs are updated to add clusterer, +1 to merge.

@amueller
Copy link
Copy Markdown
Member Author

amueller commented Apr 2, 2015

Done. Thanks for the reviews :) merging.

amueller added a commit that referenced this pull request Apr 2, 2015
[MRG + 2] Classifier and regressor tags
@amueller amueller merged commit aae305c into scikit-learn:master Apr 2, 2015
@landscape-bot
Copy link
Copy Markdown

Code Health
Repository health decreased by 0.01% when pulling acb21bb on amueller:classifier_regressor_tags into e2dfd23 on scikit-learn:master.

@amueller amueller deleted the classifier_regressor_tags branch April 2, 2015 15:21
@GaelVaroquaux
Copy link
Copy Markdown
Member

GaelVaroquaux commented Apr 2, 2015 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

make_scorer needs_threshold makes wrong assumptions

10 participants