Skip to content

Remove GroupMetricResult#287

Closed
riedgar-ms wants to merge 8 commits intomasterfrom
riedgar-ms/remove-group-metric-result
Closed

Remove GroupMetricResult#287
riedgar-ms wants to merge 8 commits intomasterfrom
riedgar-ms/remove-group-metric-result

Conversation

@riedgar-ms
Copy link
Copy Markdown
Member

Sample PR to remove the GroupMetricResult type and replace it with a Bunch. This does not yet fix the notebooks

Signed-off-by: Richard Edgar <riedgar@microsoft.com>
Signed-off-by: Richard Edgar <riedgar@microsoft.com>
Signed-off-by: Richard Edgar <riedgar@microsoft.com>
Signed-off-by: Richard Edgar <riedgar@microsoft.com>
Copy link
Copy Markdown
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Glad to see this tested :)

result.overall = metric_function(y_a, y_p, **kwargs)

groups = np.unique(group_membership)
result.by_group = dict()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bunch is a dict, and it seems the only thing we're putting in it here is another dict inside by_group. Doesn't it then make sense to put those results directly into result instead of under by_group?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be a problem if someone decides to have an 'overall' entry in their group_membership vector. Not sure why someone would, but that's either something else to check, or a charming pitfall for some user.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But we can check for that, can't we?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes - as I said it's something else to check. I just think that it would be a bit odd to restrict the keys a user could use in their data. Especially as they most likely won't notice that bit of the documentation, but will instead end up with a sudden exception.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a same thought as @adrinjalali . I think we should try to make the result shallow. Let's say that the two groups are "male" and "female". Then we could use keys like:
"group_male", "group_female", "overall", "min", "max", "range", "range_ratio"

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated this again. Please take a look at test_metrics_engine.py and let me know if you like the new syntax. I would like to have this agreed before going through and changing all of the other tests.

@MiroDudik @adrinjalali

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was discussing this with @rihorn2 at lunch, and I like this even less. Going to this level of change will require changes to the dashboard too. I do not see the benefits.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it'd be nice if people who are not there in your office could see the result of your discussions in a better way :)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The result of the discussion: what is the benefit of doing this ('this' being any of this change)?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the hope was to better support the use cases outlined in: https://github.com/fairlearn/fairlearn/issues/257#issuecomment-580811906
and also remove frictions that come about when you introduce new data types:
https://github.com/fairlearn/fairlearn/issues/257#issuecomment-581414697

Signed-off-by: Richard Edgar <riedgar@microsoft.com>
Not all required test changes in place yet

Signed-off-by: Richard Edgar <riedgar@microsoft.com>
Signed-off-by: Richard Edgar <riedgar@microsoft.com>
Signed-off-by: Richard Edgar <riedgar@microsoft.com>
Copy link
Copy Markdown
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @riedgar-ms

Comment on lines +8 to +15
_OVERALL = 'overall'
_BY_GROUP_FORMAT = 'group_{0}'
_MIN = 'min'
_MAX = 'max'
_RANGE = 'range'
_RANGE_RATIO = 'range_ratio'
_ARGMIN = 'argmin'
_ARGMAX = 'argmax'
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally find this less readable than having those literals directly in the code. Especially when they're used pretty much only once.

Comment on lines +97 to +98
result[_RANGE] = result_range
result[_RANGE_RATIO] = range_ratio
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how often are these two used? If not often, the user could compute them from the other values.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would actually be happy to see all of these extra fields eliminated. Since there's no constraint on changing the other members, they could easily be inconsistent. That's why I changed GroupMetricResult to computing them dynamically.

# Compute all the statistics, taking care not to stomp on our dictionary
try:
minimum = min(result.values())
argmin = [k for k, v in result.items() if v == minimum]
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use numpy.argmin and numpy.argmax instead?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe those return the keys of a dict, but indices into an array?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, sorry, here min(result, key=result.get) would work I believe.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That appears to return only one of the minima.

result[_RANGE] = result_range
result[_RANGE_RATIO] = range_ratio
except ValueError:
# Nothing to do if the result type is not amenable to 'min' etc.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when does this happen?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the metric is something like a confusion matrix, for which min() is not defined

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess in that case, it should also be tested?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test_matrix_metric test perhaps?

assert exception_context.value.args[0] == expected


class TestConsistencyCheck:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curious, why are the tests in classes?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't remember offhand why these ones are.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

they may be remnants of nose or some other testing tool. pytest doesn't need them.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've always been pytest, but haven't been consistent in whether we use test classes or not. It might be down to the fact that the _metrics_engine module had two routines, so I had a class for each of them. That would be a remnant from before we reorganised the namespaces.

Comment on lines +203 to +209
assert result['overall'] == -16
assert result['group_0'] == -10
assert result['group_1'] == -6
assert result['min'] == -10
assert result['max'] == -6
assert result['range'] == 4
assert np.isnan(result['range_ratio'])
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the result is a Bunch, you could still access them as result.min etc.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided against using that.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, for some of these you could keep the old key name, to keep it more backward compatible.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided against using that.

care to elaborate why?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking it over, I wasn't keen on a dependency on undocumented behaviour.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But we talked about it and the conclusion was to document it. That's why I've created scikit-learn/scikit-learn#16404

Also, if it's supposed to not be used, you could use a dict instead of a Bunch.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am using a dict.

@riedgar-ms
Copy link
Copy Markdown
Member Author

Per discussions on #257 this is not needed right now

@riedgar-ms riedgar-ms closed this Feb 11, 2020
@riedgar-ms riedgar-ms deleted the riedgar-ms/remove-group-metric-result branch March 19, 2020 16:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants