Add averaging option to AMI and NMI #11124
GaelVaroquaux merged 15 commits into scikit-learn:master from
Conversation
Leave current behavior unchanged
amueller
left a comment
Needs tests, otherwise great!
| information) and 1 (perfect correlation). In this function, mutual
| information is normalized by ``sqrt(H(labels_true) * H(labels_pred))``.
| information is normalized by some generalized mean of ``H(labels_true)``
| and ``H(labels_pred))``.
maybe either mention the default here or say "as defined by average_method".
| How to compute the normalizer in the denominator. Possible options
| are 'min', 'sqrt', 'sum', and 'max'.
| If None, 'sqrt' will be used, matching the behavior of
| `v_measure_score`.
Unsure what ..versionadded is.
http://www.sphinx-doc.org/en/stable/markup/para.html#directive-versionadded
do git grep versionadded to see how we use it.
Ah, so which version would that be?
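For reference, ``versionadded`` is a Sphinx directive that records the release in which a feature first appeared; this PR lands in 0.20. In a docstring it looks something like this (placement and surrounding text illustrative):

```rst
average_method : string or None, optional (default: None)
    How to compute the normalizer in the denominator.

    .. versionadded:: 0.20
```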
Test failures are flake8. You should run flake8 in your editor.

Oh, this is related; I just saw #8645.
| """ | ||
| if average_method is None: | ||
| warnings.warn("The behavior of AMI will change in a future version. " | ||
| "To match the behavior of 'v_measure_score', AMI will " |
There's a separate warning in the NMI function; perhaps I should mention both functions in each warning? V-measure is the hardest to change since it's computed differently, and we have to change 2 of the 3.
Sorry, I thought we changed only one. Isn't V-measure using sqrt?
Surprisingly enough, no. Check test_v_measure_and_mutual_information.
Ok, I was going by #10308 (comment) and memory. But so after this NMI and v_measure_score will be identical, right? Are they identical in all cases? Then we should either add it as an alias or possibly deprecate and just mention in the docs.
They are identical in all cases—the formulas are equivalent. Let's not alias due to homogeneity_completeness_v_measure. If you're computing all three, the last is simple to draw from the first two; otherwise, the computation is wasteful. (Then again, to my knowledge V-measure never caught on. People picked up too quickly on the fact that it's just repackaging a measure proposed >20 years beforehand. Do we even need to keep it?)
We should at least make it very clear in the doc. We can consider deprecating it, maybe open an issue for discussion. Can you add a note about the identity in the docstring of v-measure and NMI and the user guide of v-measure and nmi and homogeneity_completeness_v_measure?
Added in latest commit.
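The identity discussed above is easy to verify from first principles. A minimal sketch (pure standard library, not scikit-learn's implementation; it uses the facts that homogeneity = MI/H(A) and completeness = MI/H(B)):

```python
from collections import Counter
from math import log

def entropy(labels):
    """Shannon entropy (nats) of a label assignment."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())

def mutual_info(a, b):
    """Mutual information between two label assignments."""
    n = len(a)
    pa, pb = Counter(a), Counter(b)
    # p(x, y) * log( p(x, y) / (p(x) p(y)) ), summed over the joint counts
    return sum((c / n) * log(n * c / (pa[x] * pb[y]))
               for (x, y), c in Counter(zip(a, b)).items())

labels_a = [0, 0, 1, 1, 2, 2]
labels_b = [0, 0, 1, 2, 2, 2]

mi = mutual_info(labels_a, labels_b)
h_a, h_b = entropy(labels_a), entropy(labels_b)

# NMI with the arithmetic mean as the normalizer
nmi = mi / ((h_a + h_b) / 2)

# V-measure: harmonic mean of homogeneity and completeness,
# with homogeneity = MI / H(A) and completeness = MI / H(B)
homogeneity, completeness = mi / h_a, mi / h_b
v = 2 * homogeneity * completeness / (homogeneity + completeness)

assert abs(nmi - v) < 1e-12  # identical, as claimed
```

Algebraically, the harmonic mean of MI/H(A) and MI/H(B) collapses to 2 MI / (H(A) + H(B)), which is exactly NMI with the arithmetic-mean normalizer.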
| h_true, h_pred = entropy(labels_true), entropy(labels_pred)
| ami = (mi - emi) / (max(h_true, h_pred) - emi)
| normalizer = _generalized_average(h_true, h_pred, average_method)
| print(normalizer)
* Correct the NMI and AMI descriptions in docs
* Update docstrings due to averaging changes
  - V-measure
  - Homogeneity
  - Completeness
  - NMI
  - AMI
Looks ready to squash and merge; the fix is implemented and the tests passed. @amueller?
| calculated using a similar form to that of the adjusted Rand index:
|
| .. math:: \text{AMI} = \frac{\text{MI} - E[\text{MI}]}{\max(H(U), H(V)) - E[\text{MI}]}
| .. math:: \text{AMI} = \frac{\text{MI} - E[\text{MI}]}{\text{mean}(H(U), H(V)) - E[\text{MI}]}
The fact that the mean is configurable and varies in the literature should be discussed here, perhaps with some notes on when one is more appropriate than another.
| c, d = 12, 12
| means = [_generalized_average(c, d, method) for method in methods]
| assert_equal(means[0], means[1])
| assert_equal(means[1], means[2])
We no longer use nosetests. Use bare assert instead of such helper functions in all new code.
Please add an entry to the change log at
* Update v0.20.rst
* Update test_supervised.py
* Update clustering.rst
Mission accomplished :)
jnothman
left a comment
I think this is a bit confusing. The normalising constant is always the max of some elementwise mean of U and V.
sqrt and sum don't make sense as names of means: call them geometric and arithmetic
Right; I wanted to stick to the notation started by Vinh et al. and used elsewhere, but the names are awful. I can switch them.
You can provide a mapping from reasonable names to Vinh in the docs. Thanks
Checking again: is this ready to pull?
doc/whats_new/v0.20.rst
Outdated
| - Added control over the normalizer in
|   :func:`metrics.normalized_mutual_information_score` and
|   :func:`metrics.adjusted_mutual_information_score` via the ``average_method``
|   parameter. In a future version, the default normalizer for each will become
Let's call this future version 0.22, and put this changing default value into the API Changes Summary.
The changing default value should be explained in the API Changes Summary for 0.22, right? I don't see that file yet.
There should be a section on deprecations for 0.20 in the API Changes Summary.
What should the text be here? Maybe "In metrics.normalized_mutual_information_score and
metrics.adjusted_mutual_information_score, warn that average_method
will have a new default value. In version 0.22, the default normalizer for each
will become the arithmetic mean of the entropies of each clustering."
Say "In ... added the parameter average_method ... The default averaging method will change from (whatever it is in each) to arithmetic in 0.22``. I think?
| if average_method == "min":
|     return max(min(U, V), 1e-10)
| elif average_method == "geometric":
|     return max(np.sqrt(U * V), 1e-10)  # Avoids zero-division error
... but you'll land up with near-infinite measures anyway?? Maybe we're better off warning the user.
perhaps use eps defined as np.finfo('float64').eps instead of this arbitrary value.
It's also awkward having this comment here and not the first time this eps is used.
I think the normalizer = max(normalizer, eps) should happen in NMI, actually, not here. It's not directly applicable in AMI which should check for normalizer - emi == 0.
I think in the case that the denominator is 0, we should be issuing at least a warning, whether or not we then return a finite (backwards compatible in NMI) or infinite value.
| """ | ||
| if average_method is None: | ||
| warnings.warn("The behavior of NMI will change in a future version. " |
doc/modules/clustering.rst
Outdated
| .. math:: \text{AMI} = \frac{\text{MI} - E[\text{MI}]}{\text{mean}(H(U), H(V)) - E[\text{MI}]}
|
| For normalized mutual information and adjusted mutual information, the normalizing
| value is typically some mean of the entropies of each clustering. Various means exist,
Note that it is controlled by average_method in our implementation.
I'm not sure if I like "some mean" because in probability theory there is only one mean. Maybe "some aggregate"? Also @jnothman's comment is not addressed, right?
Right; it's some generalized mean. The comment slipped through the cracks; I'll address it with the next update.
Actually, it didn't slip through. It's addressed at the end of that paragraph.
| if average_method is None:
|     warnings.warn("The behavior of NMI will change in a future version. "
|                   "To match the behavior of 'v_measure_score', NMI will "
|                   "use arithmetic mean by default."
State "average_method='arithmetic' by default"
| elif average_method == "arithmetic":
|     return max(np.mean([U, V]), 1e-10)
| elif average_method == "max":
|     return max(U, V)
This could still be 0 in theory
| average_method : string or None, optional (default: None)
|     How to compute the normalizer in the denominator. Possible options
|     are 'min', 'geometric', 'arithmetic', and 'max'.
|     If None, 'max' will be used. This is likely to change in a future
Don't say "likely to". Say it will change to arithmetic in 0.22.
| average_method : string or None, optional (default: None)
|     How to compute the normalizer in the denominator. Possible options
|     are 'min', 'geometric', 'arithmetic', and 'max'.
|     If None, 'geometric' will be used. This is likely to change in a
jnothman
left a comment
Yes, I think this looks good now.
Ah, then pull away! ;)
We require two approvals before merge (sorry!). Hopefully @amueller can give this another glance.
Might be better to have someone else review. @qinhanmin2014 knows metrics, so maybe...
Bump @qinhanmin2014

Bump @qinhanmin2014 @jnothman
This might get some attention with the sprints next week, but probably not from Hanmin.

Apologies for the delay and thanks @aryamccarthy for your great work.
amueller
left a comment
There's probably lots of deprecation warnings in the tests now. Can you please either catch them or explicitly pass the new parameter? (do we have a standard procedure for this btw? Is it documented?)
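Both options look roughly like this; the metric below is a hypothetical stand-in for the scikit-learn functions, whose changed-default warnings are FutureWarnings:

```python
import warnings

def scorer(labels_a, labels_b, average_method=None):
    # Hypothetical stand-in for a metric whose default is changing.
    if average_method is None:
        warnings.warn("The default value of average_method will change "
                      "to 'arithmetic' in version 0.22.", FutureWarning)
        average_method = "max"
    return average_method

# Option 1: pass the new parameter explicitly, so no warning is raised.
with warnings.catch_warnings():
    warnings.simplefilter("error")  # turn any stray warning into an error
    assert scorer([0, 1], [1, 0], average_method="arithmetic") == "arithmetic"

# Option 2: catch and ignore the FutureWarning in the test.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", FutureWarning)
    assert scorer([0, 1], [1, 0]) == "max"
```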
doc/modules/clustering.rst
Outdated
| value is typically some mean of the entropies of each clustering. Various means exist,
| and no firm rules exist for preferring one over the others. The decision is largely
| a field-by-field basis; for instance, in community detection, the arithmetic mean is
| most common. Yang et al. (2016) found that each normalizing method provided
link to reference for citation?
doc/modules/clustering.rst
Outdated
| "qualitatively similar behaviours". In our implementation, this is | ||
| controlled by the ``average_method`` parameter. | ||
|
|
||
| Vinh et al. (2010) named variants of NMI and AMI by their averaging method. Their |
Link to reference for citation?
doc/modules/clustering.rst
Outdated
|
| The V-measure is actually equivalent to the mutual information (NMI)
| discussed above normalized by the sum of the label entropies [B2011]_.
| discussed above normalized by the arithmetic mean of the label
maybe say "when normalized" or "with the aggregation function being the arithmetic mean"?
doc/whats_new/v0.20.rst
Outdated
| - Partial AUC is available via ``max_fpr`` parameter in
|   :func:`metrics.roc_auc_score`. :issue:`3273` by
|   :user:`Alexander Niederbühl <Alexander-N>`.
| - Added control over the normalizer in
| for :func:`metrics.roc_auc_score`. Moreover using ``reorder=True`` can hide bugs
| due to floating point error in the input.
| :issue:`9851` by :user:`Hanmin Qin <qinhanmin2014>`.
| - In :func:`metrics.normalized_mutual_information_score` and
| normalized_mutual_info_score,
| adjusted_mutual_info_score,
| ]
| means = {"min", "geometric", "arithmetic", "max"}
feel kinda weird about calling these means.
I can switch to generalized_means if you prefer, but it has to be clear that this is a specific class of aggregations. Product, for instance, wouldn't work. Let me know and I'll ship all changes in one PR update.
Did you check that you're catching all deprecation warnings?
With this?

    with warnings.catch_warnings():
        warnings.filterwarnings("ignore", category=PendingDeprecationWarning)

Also someone pushed in a way that introduces merge conflicts in the whats-new file, so tests aren't passing.
Can you merge master to fix the conflict? And I think we're using right now.
Fixed the conflict.
For my future knowledge, what's the command to do that?
I merged master into it, fixed the merge conflict, and pushed into your branch. FYI, you shouldn't send PRs from your master branch; you should ideally create a feature branch.
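For future reference, the usual sequence is roughly this (assuming the main repository is configured as the `upstream` remote; remote names vary by setup):

```shell
git fetch upstream              # get the latest upstream history
git merge upstream/master      # merge it into your current branch
# ...resolve conflicts, `git add` the files, `git commit`...
git push origin HEAD           # update the PR branch
```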
Can you please also add a test that there are deprecation warnings? Also see the updated docs at http://scikit-learn.org/dev/developers/contributing.html#change-the-default-value-of-a-parameter
I've written (but not committed) that. My concern is catching all of the FutureWarnings. I'd have to infect the entire
LGTM +1 to merge
LGTM. Merging.
Thanks Arya!
Reference Issues/PRs
See #10308; this is a first step toward eventually deprecating one behavior and making their behavior consistent.
What does this implement/fix?
Background: The measures AMI, NMI, and V-measure are intimately related. Each is a normalized version of mutual information, and AMI incorporates adjustment for chance.