Add averaging option to AMI and NMI #11124
GaelVaroquaux merged 15 commits into scikit-learn:master from
Conversation
Leave current behavior unchanged
amueller
left a comment
Needs tests, otherwise great!
| information) and 1 (perfect correlation). In this function, mutual
| information is normalized by ``sqrt(H(labels_true) * H(labels_pred))``.
| information is normalized by some generalized mean of ``H(labels_true)``
| and ``H(labels_pred))``.
maybe either mention the default here or say "as defined by average_method".
| How to compute the normalizer in the denominator. Possible options
| are 'min', 'sqrt', 'sum', and 'max'.
| If None, 'sqrt' will be used, matching the behavior of
| `v_measure_score`.
Unsure what ..versionadded is.
http://www.sphinx-doc.org/en/stable/markup/para.html#directive-versionadded
do git grep versionadded to see how we use it.
Ah, so which version would that be?
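For reference, ``versionadded`` is a Sphinx directive that records the release in which a feature first appeared; this PR lands in 0.20. In a docstring it looks something like this (placement and surrounding text illustrative):

```rst
average_method : string or None, optional (default: None)
    How to compute the normalizer in the denominator.

    .. versionadded:: 0.20
```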
Test failures are flake8. You should run flake8 in your editor.

Oh, this is related; I just saw #8645.
| """ | ||
| if average_method is None: | ||
| warnings.warn("The behavior of AMI will change in a future version. " | ||
| "To match the behavior of 'v_measure_score', AMI will " |
There's a separate warning in the NMI function; perhaps I should mention both functions in each warning? V-measure is the hardest to change since it's computed differently, and we have to change 2 of the 3.
Sorry, I thought we changed only one. Isn't V-measure using sqrt?
Surprisingly enough, no. Check test_v_measure_and_mutual_information.
Ok, I was going by #10308 (comment) and memory. But so after this NMI and v_measure_score will be identical, right? Are they identical in all cases? Then we should either add it as an alias or possibly deprecate and just mention in the docs.
They are identical in all cases—the formulas are equivalent. Let's not alias due to homogeneity_completeness_v_measure. If you're computing all three, the last is simple to draw from the first two; otherwise, the computation is wasteful. (Then again, to my knowledge V-measure never caught on. People picked up too quickly on the fact that it's just repackaging a measure proposed >20 years beforehand. Do we even need to keep it?)
We should at least make it very clear in the doc. We can consider deprecating it, maybe open an issue for discussion. Can you add a note about the identity in the docstring of v-measure and NMI and the user guide of v-measure and nmi and homogeneity_completeness_v_measure?
Added in latest commit.
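The identity discussed above is easy to verify from first principles. A minimal sketch (pure standard library, not scikit-learn's implementation; it uses the facts that homogeneity = MI/H(A) and completeness = MI/H(B)):

```python
from collections import Counter
from math import log

def entropy(labels):
    """Shannon entropy (nats) of a label assignment."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())

def mutual_info(a, b):
    """Mutual information between two label assignments."""
    n = len(a)
    pa, pb = Counter(a), Counter(b)
    # p(x, y) * log( p(x, y) / (p(x) p(y)) ), summed over the joint counts
    return sum((c / n) * log(n * c / (pa[x] * pb[y]))
               for (x, y), c in Counter(zip(a, b)).items())

labels_a = [0, 0, 1, 1, 2, 2]
labels_b = [0, 0, 1, 2, 2, 2]

mi = mutual_info(labels_a, labels_b)
h_a, h_b = entropy(labels_a), entropy(labels_b)

# NMI with the arithmetic mean as the normalizer
nmi = mi / ((h_a + h_b) / 2)

# V-measure: harmonic mean of homogeneity and completeness,
# with homogeneity = MI / H(A) and completeness = MI / H(B)
homogeneity, completeness = mi / h_a, mi / h_b
v = 2 * homogeneity * completeness / (homogeneity + completeness)

assert abs(nmi - v) < 1e-12  # identical, as claimed
```

Algebraically, the harmonic mean of MI/H(A) and MI/H(B) collapses to 2 MI / (H(A) + H(B)), which is exactly NMI with the arithmetic-mean normalizer.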
| h_true, h_pred = entropy(labels_true), entropy(labels_pred)
| ami = (mi - emi) / (max(h_true, h_pred) - emi)
| normalizer = _generalized_average(h_true, h_pred, average_method)
| print(normalizer)
* Correct the NMI and AMI descriptions in docs
* Update docstrings due to averaging changes
  - V-measure
  - Homogeneity
  - Completeness
  - NMI
  - AMI
Looks ready to squash and merge; the fix is implemented and the tests passed. @amueller?
| calculated using a similar form to that of the adjusted Rand index:
|
| .. math:: \text{AMI} = \frac{\text{MI} - E[\text{MI}]}{\max(H(U), H(V)) - E[\text{MI}]}
| .. math:: \text{AMI} = \frac{\text{MI} - E[\text{MI}]}{\text{mean}(H(U), H(V)) - E[\text{MI}]}
The fact that the mean is configurable and varies in the literature should be discussed here, perhaps with some notes on when one is more appropriate than another.
| c, d = 12, 12
| means = [_generalized_average(c, d, method) for method in methods]
| assert_equal(means[0], means[1])
| assert_equal(means[1], means[2])
We no longer use nosetests. Use bare assert instead of such helper functions in all new code.
Please add an entry to the change log at
* Update v0.20.rst
* Update test_supervised.py
* Update clustering.rst
Mission accomplished :)
jnothman
left a comment
I think this is a bit confusing. The normalising constant is always the max of some elementwise mean of U and V.
sqrt and sum don't make sense as names of means: call them geometric and arithmetic
Right; I wanted to stick to the notation started by Vinh et al. and used elsewhere, but the names are awful. I can switch them.
You can provide a mapping from reasonable names to Vinh in the docs. Thanks
Checking again: is this ready to pull?
doc/whats_new/v0.20.rst
Outdated
| - Added control over the normalizer in
|   :func:`metrics.normalized_mutual_information_score` and
|   :func:`metrics.adjusted_mutual_information_score` via the ``average_method``
|   parameter. In a future version, the default normalizer for each will become
Let's call this future version 0.22, and put this changing default value into the API Changes Summary.
The changing default value should be explained in the API Changes Summary for 0.22, right? I don't see that file yet.
There should be a section on deprecations for 0.20 in the API Changes Summary.
What should the text be here? Maybe "In metrics.normalized_mutual_information_score and
metrics.adjusted_mutual_information_score, warn that average_method
will have a new default value. In version 0.22, the default normalizer for each
will become the arithmetic mean of the entropies of each clustering."
Say "In ... added the parameter average_method ... The default averaging method will change from (whatever it is in each) to arithmetic in 0.22``. I think?
| if average_method == "min":
|     return max(min(U, V), 1e-10)
| elif average_method == "geometric":
|     return max(np.sqrt(U * V), 1e-10)  # Avoids zero-division error
... but you'll land up with near-infinite measures anyway?? Maybe we're better off warning the user.
perhaps use eps defined as np.finfo('float64').eps instead of this arbitrary value.
It's also awkward having this comment here and not the first time this eps is used.
I think the normalizer = max(normalizer, eps) should happen in NMI, actually, not here. It's not directly applicable in AMI which should check for normalizer - emi == 0.
I think in the case that the denominator is 0, we should be issuing at least a warning, whether or not we then return a finite (backwards compatible in NMI) or infinite value.
| """ | ||
| if average_method is None: | ||
| warnings.warn("The behavior of NMI will change in a future version. " |
doc/modules/clustering.rst
Outdated
| .. math:: \text{AMI} = \frac{\text{MI} - E[\text{MI}]}{\text{mean}(H(U), H(V)) - E[\text{MI}]}
|
| For normalized mutual information and adjusted mutual information, the normalizing
| value is typically some mean of the entropies of each clustering. Various means exist,
Note that it is controlled by average_method in our implementation.
I'm not sure if I like "some mean" because in probability theory there is only one mean. Maybe "some aggregate"? Also @jnothman's comment is not addressed, right?
Right; it's some generalized mean. The comment slipped through the cracks; I'll address it with the next update.
Actually, it didn't slip through. It's addressed at the end of that paragraph.
| if average_method is None:
|     warnings.warn("The behavior of NMI will change in a future version. "
|                   "To match the behavior of 'v_measure_score', NMI will "
|                   "use arithmetic mean by default."
State "average_method='arithmetic' by default"
| elif average_method == "arithmetic":
|     return max(np.mean([U, V]), 1e-10)
| elif average_method == "max":
|     return max(U, V)
This could still be 0 in theory
| average_method : string or None, optional (default: None)
|     How to compute the normalizer in the denominator. Possible options
|     are 'min', 'geometric', 'arithmetic', and 'max'.
|     If None, 'max' will be used. This is likely to change in a future
Don't say "likely to". Say it will change to arithmetic in 0.22.
| average_method : string or None, optional (default: None)
|     How to compute the normalizer in the denominator. Possible options
|     are 'min', 'geometric', 'arithmetic', and 'max'.
|     If None, 'geometric' will be used. This is likely to change in a
jnothman
left a comment
Yes, I think this looks good now.
Ah, then pull away! ;)
We require two approvals before merge (sorry!). Hopefully @amueller can give this another glance.
Might be better to have someone else review. @qinhanmin2014 knows metrics, so maybe...
Bump @qinhanmin2014

Bump @qinhanmin2014 @jnothman
This might get some attention with the sprints next week, but probably not from Hanmin.

Apologies for the delay and thanks @aryamccarthy for your great work.
amueller
left a comment
There's probably lots of deprecation warnings in the tests now. Can you please either catch them or explicitly pass the new parameter? (do we have a standard procedure for this btw? Is it documented?)
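Both options look roughly like this; the metric below is a hypothetical stand-in for the scikit-learn functions, whose changed-default warnings are FutureWarnings:

```python
import warnings

def scorer(labels_a, labels_b, average_method=None):
    # Hypothetical stand-in for a metric whose default is changing.
    if average_method is None:
        warnings.warn("The default value of average_method will change "
                      "to 'arithmetic' in version 0.22.", FutureWarning)
        average_method = "max"
    return average_method

# Option 1: pass the new parameter explicitly, so no warning is raised.
with warnings.catch_warnings():
    warnings.simplefilter("error")  # turn any stray warning into an error
    assert scorer([0, 1], [1, 0], average_method="arithmetic") == "arithmetic"

# Option 2: catch and ignore the FutureWarning in the test.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", FutureWarning)
    assert scorer([0, 1], [1, 0]) == "max"
```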
doc/modules/clustering.rst
Outdated
| value is typically some mean of the entropies of each clustering. Various means exist,
| and no firm rules exist for preferring one over the others. The decision is largely
| a field-by-field basis; for instance, in community detection, the arithmetic mean is
| most common. Yang et al. (2016) found that each normalizing method provided
link to reference for citation?
doc/modules/clustering.rst
Outdated
| "qualitatively similar behaviours". In our implementation, this is | ||
| controlled by the ``average_method`` parameter. | ||
|
|
||
| Vinh et al. (2010) named variants of NMI and AMI by their averaging method. Their |
Link to reference for citation?
doc/modules/clustering.rst
Outdated
|
| The V-measure is actually equivalent to the mutual information (NMI)
| discussed above normalized by the sum of the label entropies [B2011]_.
| discussed above normalized by the arithmetic mean of the label
maybe say "when normalized" or "with the aggregation function being the arithmetic mean"?
doc/whats_new/v0.20.rst
Outdated
| - Partial AUC is available via ``max_fpr`` parameter in
|   :func:`metrics.roc_auc_score`. :issue:`3273` by
|   :user:`Alexander Niederbühl <Alexander-N>`.
| - Added control over the normalizer in
| for :func:`metrics.roc_auc_score`. Moreover using ``reorder=True`` can hide bugs
| due to floating point error in the input.
| :issue:`9851` by :user:`Hanmin Qin <qinhanmin2014>`.
| - In :func:`metrics.normalized_mutual_information_score` and
| normalized_mutual_info_score,
| adjusted_mutual_info_score,
| ]
| means = {"min", "geometric", "arithmetic", "max"}
feel kinda weird about calling these means.
I can switch to generalized_means if you prefer, but it has to be clear that this is a specific class of aggregations. Product, for instance, wouldn't work. Let me know and I'll ship all changes in one PR update.
Did you check that you're catching all deprecation warnings?
With this?

    with warnings.catch_warnings():
        warnings.filterwarnings("ignore", category=PendingDeprecationWarning)

Also someone pushed in a way that introduces merge conflicts in the whats-new file, so tests aren't passing.
Can you merge master to fix the conflict? And I think we're using right now.
Fixed the conflict.
For my future knowledge, what's the command to do that?
I merged master into it, fixed the merge conflict, and pushed into your branch. FYI, you shouldn't send PRs from your master branch; you should ideally create a feature branch.
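For future reference, the usual sequence is roughly this (assuming the main repository is configured as the `upstream` remote; remote names vary by setup):

```shell
git fetch upstream              # get the latest upstream history
git merge upstream/master      # merge it into your current branch
# ...resolve conflicts, `git add` the files, `git commit`...
git push origin HEAD           # update the PR branch
```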
Can you please also add a test that there are deprecation warnings? Also see the updated docs at http://scikit-learn.org/dev/developers/contributing.html#change-the-default-value-of-a-parameter
I've written (but not committed) that. My concern is catching all of the FutureWarnings. I'd have to infect the entire
LGTM +1 to merge
LGTM. Merging.
Thanks Arya!
Reference Issues/PRs
See #10308; this is a first step toward eventually deprecating one behavior and making their behavior consistent.
What does this implement/fix?
Background: The measures AMI, NMI, and V-measure are intimately related. Each is a normalized version of mutual information, and AMI incorporates adjustment for chance.