
[MRG] faster, flatter precision_recall_fscore_support#2278

Merged
ogrisel merged 8 commits into scikit-learn:master from jnothman:prf_rewrite3
Sep 11, 2013

Conversation

@jnothman
Member

  • Rewritten for efficiency and neatness. Note that handling the multilabel sequence-of-sequences format by binarizing is in response to benchmarks: under the simple benchmarks at https://gist.github.com/jnothman/5734967, this new version is about 1.3x faster than master for multiclass and label-indicator input, and about 2x faster for sequences of sequences.
  • Also fixed some warning handling in tests.
  • Also reverted the recent reimplementation of the Matthews correlation coefficient, using LabelEncoder to handle non-int labels instead.
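
The revert-plus-LabelEncoder approach can be sketched roughly as follows. This is a sketch, not the PR's actual code: here `np.unique(..., return_inverse=True)` plays the role of LabelEncoder, mapping arbitrary (e.g. string) labels to integer codes before handing them to `np.corrcoef`.

```python
import numpy as np

def matthews_corrcoef_sketch(y_true, y_pred):
    """Sketch of MCC over arbitrary label types (illustrative only)."""
    # Encode labels as integers, mirroring LabelEncoder's fit/transform.
    labels, encoded = np.unique(np.concatenate([y_true, y_pred]),
                                return_inverse=True)
    n = len(y_true)
    y_true_enc, y_pred_enc = encoded[:n], encoded[n:]
    # corrcoef divides by the standard deviations; when either vector is
    # constant this yields nan, which is mapped to the conventional 0.0.
    with np.errstate(divide='ignore', invalid='ignore'):
        mcc = np.corrcoef(y_true_enc, y_pred_enc)[0, 1]
    return 0.0 if np.isnan(mcc) else float(mcc)
```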

@arjoly
Member

arjoly commented Aug 1, 2013

Thanks @jnothman :-)

Can you rebase on top of master?

@jnothman
Member Author

jnothman commented Aug 1, 2013

Sure -- the only conflict was the module's lengthy author list and imports.

Member

Apparently, I left "example-based" instead of "sample-based".

Member Author

Indeed, and I've missed it multiple times since!

Member

thanks !

@arjoly
Member

arjoly commented Aug 1, 2013

Why have you removed the improvements of the mcc metric?

@arjoly
Member

arjoly commented Aug 1, 2013

Awesome refactoring!!!

@jnothman
Member Author

jnothman commented Aug 1, 2013

Why have you removed the improvements of the mcc metric?

Uhh... apparently a git-user fail. I think I rebased an old version of my branch. I might need to go searching for the right one.

@jnothman
Member Author

jnothman commented Aug 1, 2013

Thank you Travis for recording my previous branch head (b81a68f6296753c52f45b0ebbc0b8b466f84b871).

@jnothman
Member Author

jnothman commented Aug 1, 2013

Re-rebased with your comments (except for moving the elif case) addressed.

Member Author

The ignore_warnings decorator I introduce in #2334 would be better applied here, but I don't really have the time to fix up either PR atm.

Member

Fine !

@arjoly
Member

arjoly commented Aug 1, 2013

Travis is not happy :-/

@jnothman
Member Author

jnothman commented Aug 1, 2013

Travis is not happy :-/

Then nor am I. It's actually a worry, because the error is in matthews_corrcoef, unchanged since https://travis-ci.org/scikit-learn/scikit-learn/builds/9551448, which suggests a nasty interaction with the changes since in prf, e.g. use of np.errstate. Any ideas?

@arjoly
Member

arjoly commented Aug 1, 2013

I am able to reproduce the error on my laptop.
By using np.errstate, the error is solved:

    with np.errstate(divide='ignore', invalid='ignore'):
        mcc = np.corrcoef(y_true, y_pred)[0, 1]

Strange, the use of np.seterr and np.geterr seems to have some side effects.

@jnothman
Member Author

jnothman commented Aug 1, 2013

Is that correct behaviour, or is the warning meant to happen on nan?

@arjoly
Member

arjoly commented Aug 1, 2013

Given this test

    def test_matthews_corrcoef_nan():
        with warnings.catch_warnings():
            warnings.simplefilter("always")
            assert_equal(matthews_corrcoef([0], [1]), 0.0)
            warnings.simplefilter("error")
            assert_equal(matthews_corrcoef([0, 0], [0, 1]), 0.0)

I would say that we should raise a warning.

So in the end, this should be,

    with np.errstate(divide='warn', invalid='warn'):
        mcc = np.corrcoef(y_true, y_pred)[0, 1]

Member

You might want to indicate here that 'modifier' and 'average' are strings used for the warning. It is hard to guess from the name of the function and the argument names.

Member Author

I've done so.

@jnothman
Member Author

jnothman commented Aug 1, 2013

Apparently Travis doesn't like that suggestion either. The plot thickens. I'm open to suggestions.

@arjoly
Member

arjoly commented Aug 1, 2013

Strange, I thought this would work. Should we ignore the warning and raise our own when the denominator is 0?

@jnothman
Member Author

jnothman commented Aug 1, 2013

I think you misinterpreted the test code: it means the test should fail if
a warning is produced. I'm not sure why this error appears here and not in
an earlier build, but I think this commit should fix it.


@arjoly
Member

arjoly commented Aug 1, 2013

LGTM 👍, thanks a lot this is a lot better than the previous version!

@arjoly
Member

arjoly commented Aug 13, 2013

A last review?

@arjoly
Member

arjoly commented Aug 21, 2013

(Ping some people randomly: @ogrisel, @glouppe, @amueller)

Member

Could you please add an inline comment to explain when the invalid errstate could be triggered and why it's safe to ignore it in the specific case of matthews_corrcoef computation?

Member Author

I don't know anything about it. This simply reverts the code to an implementation before it was adapted to handle string input, with a LabelEncoder thrown on top.

Member

Alright then.

@ogrisel
Member

ogrisel commented Aug 21, 2013

Apart from my warning class comment, the code looks simple enough and I trust the test suite and your bench results. +1 for merging.

Member

Actually I don't understand how this format string could possibly work. Could you please add a test that catches this specific warning and check the actual message content?

Member Author

Okay. The tricky bit is the {{}}, which format expands to {}, to be substituted with "due to" or "in labels with" or "in samples with".

You're telling me I shouldn't look for jobs in NLG? :p
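
Concretely, the two-stage trick can be illustrated like this (the message text here is made up for illustration, not the PR's exact wording):

```python
# A doubled brace '{{}}' survives the first .format() call as a literal
# '{}', leaving a placeholder for a second, later substitution.
template = '{metric} is ill-defined and being set to 0.0 {{}} no true samples.'
stage_one = template.format(metric='Recall')
# stage_one now contains a plain '{}' where '{{}}' used to be.
message = stage_one.format('in labels with')
```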

Member

Alright I got it after writing the comment and re-reading the end of the function. Still an explicit test that checks the message would be very welcomed.

Member

This comment about adding a check for the message content has not been addressed, or has it?

Member

Sorry it has been in 0ea0aedef72306260097abff6ed786dedae1794b. I wonder how I missed it...

@jnothman
Member Author

Rebased on master, and addressed @arjoly's comment on label_binarize and @ogrisel's comments on the warning class and testing its message.

@ogrisel
Member

ogrisel commented Sep 10, 2013

The travis failure looks real:

======================================================================
ERROR: sklearn.metrics.tests.test_metrics.test_prf_warnings
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/virtualenv/python2.7_with_system_site_packages/local/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/travis/build/scikit-learn/scikit-learn/sklearn/metrics/tests/test_metrics.py", line 1786, in test_prf_warnings
    assert_equal(str(record.pop().message),
IndexError: pop from empty list
----------------------------------------------------------------------

@ogrisel
Member

ogrisel commented Sep 10, 2013

I cannot reproduce the test failure on my box either. The assert_warns statements from the previous test do not fail though. Maybe assert_warns could be extended to add an optional expected message option.
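
Such an extension might look like the following. This is a hypothetical helper (name and signature invented here), not code from the PR:

```python
import warnings

def assert_warns_message(warning_class, message, func, *args, **kwargs):
    # Like assert_warns, but additionally checks that some warning of the
    # given class contains the expected message substring.
    with warnings.catch_warnings(record=True) as records:
        warnings.simplefilter('always')
        result = func(*args, **kwargs)
    matches = [r for r in records if issubclass(r.category, warning_class)]
    if not matches:
        raise AssertionError('%s not raised' % warning_class.__name__)
    if not any(message in str(r.message) for r in matches):
        raise AssertionError('no %s matched %r'
                             % (warning_class.__name__, message))
    return result
```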

Member

Could we use a numpy or Python warning instead?

Member

Which one? It's better to have specific warning classes to allow the user to silence specific warning categories.

Member

Depending on the situation, we could use either the divide-by-zero warning or the invalid-value warning.

Member Author

Certainly, we could inherit from one of those.

Member

+1 for keeping a specific class, optionally inheriting from the one that makes most sense.

Member Author

I'm happy to consider inheriting from RuntimeWarning rather than UserWarning, since it is intended for "dubious runtime behaviour". Yet we use UserWarnings everywhere else in sklearn, despite the docstring "Base class for warnings generated by user code."

The numpy warnings aren't relevant, IMO. This isn't some arbitrary numerical issue, it's about implementation details of metrics where the input isn't within the metric's natural domain, but the function must return some value, and there are multiple options as to what value.

So, what do you suggest?

Member

I don't have any strong opinion. I want to avoid adding a new warning class if one of the standard library does the job.

I'm happy to consider inheriting from RuntimeWarning rather than UserWarning, since it is intended for "dubious runtime behaviour". Yet we use UserWarnings everywhere else in sklearn, despite the docstring "Base class for warnings generated by user code."

RuntimeWarning looks appropriate (Base class for warnings about dubious runtime behavior.).

Member Author

I want to avoid adding a new warning class if one of the standard library does the job.

In terms of warnings, I understand "does the job" as: where a user might want to silence a particular warning or turn it into an exception, it should be easy, unambiguous and version-invariant.

This means it should have a specific class to distinguish it from other warnings in the module if that seems appropriate. @ogrisel suggested it might be in this case. Backwards-compatibility would have me extend from UserWarning.

Member

Alright. I think we can keep UserWarning for now. If we ever devise an official inheritance tree for sklearn warnings we might want to revise that choice later.
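
The agreed-upon approach can be sketched as follows; the class name here is illustrative, not necessarily what the PR used:

```python
import warnings

# A dedicated warning class inheriting from UserWarning, so users can
# silence or escalate this specific category without affecting other
# UserWarnings.
class UndefinedMetricWarning(UserWarning):
    pass

# Users can then filter just this category:
warnings.filterwarnings('ignore', category=UndefinedMetricWarning)
```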

@jnothman
Member Author

The travis failure must be yet another unexpected warning interaction. I'll try to sleuth it out.

@ogrisel
Member

ogrisel commented Sep 10, 2013

The travis failure must be yet another unexpected warning interaction. I'll try to sleuth it out.

Maybe duplicated warnings inside the same warning.catch block are collected only once under certain circumstances?

@jnothman
Member Author

No, that's not the issue. The issue is that the registry of past warnings that makes the 'once' filter work is buggy, though I've lost the Python bug tracker id for now.

I've traced the present problem down to the test_score_objects module since this causes the failure:

    nosetests sklearn/metrics/tests/test_score_objects.py sklearn.metrics.tests.test_metrics:test_prf_warnings
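
A common workaround for this kind of stale-registry interaction can be sketched like this (an illustrative helper, not necessarily what was committed here):

```python
import sys

def clean_warning_registry():
    # Clear every imported module's __warningregistry__ so that warnings
    # already seen under a "once"/"default" filter are raised again,
    # rather than being silently suppressed by stale registry state.
    for module in list(sys.modules.values()):
        registry = getattr(module, '__warningregistry__', None)
        if registry is not None:
            registry.clear()
```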

@jnothman
Member Author

Found that, and another one in the classification_report doctest. Let's see if Travis prefers this.

@ogrisel
Member

ogrisel commented Sep 10, 2013

Great, well done @jnothman!

@ogrisel
Member

ogrisel commented Sep 10, 2013

Once @arjoly's comment is addressed, +1 for merging.

Member

Can you use the format syntax?

Member Author

if I must!

Member

It's a nitpick. Do as you wish.

@arjoly
Member

arjoly commented Sep 11, 2013

LGTM +1 !

ogrisel added a commit that referenced this pull request Sep 11, 2013
[MRG] faster, flatter precision_recall_fscore_support
@ogrisel ogrisel merged commit 513b0f4 into scikit-learn:master Sep 11, 2013
@ogrisel
Member

ogrisel commented Sep 11, 2013

Merged! Thanks @jnothman

@jnothman
Member Author

Good to have this out the door! Thanks @arjoly and @ogrisel!

@jnothman jnothman deleted the prf_rewrite3 branch September 11, 2013 09:23
@arjoly
Member

arjoly commented Sep 11, 2013

Thanks @jnothman !!! 👍 🍻
