Conversation
Signed-off-by: Richard Edgar <riedgar@microsoft.com>
adrinjalali left a comment:

Glad to see this tested :)
fairlearn/metrics/_metrics_engine.py (outdated diff)
```python
result.overall = metric_function(y_a, y_p, **kwargs)

groups = np.unique(group_membership)
result.by_group = dict()
```
Bunch is a dict, and it seems the only thing we're putting in it here is another dict inside by_group. Doesn't it then make sense to put those results directly into result instead of under by_group?
That would be a problem if someone decides to have an 'overall' entry in their group_membership vector. Not sure why someone would, but that's either something else to check, or a charming pitfall for some user.
But we can check for that, can't we?
Yes - as I said it's something else to check. I just think that it would be a bit odd to restrict the keys a user could use in their data. Especially as they most likely won't notice that bit of the documentation, but will instead end up with a sudden exception.
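To make the pitfall concrete, here is a sketch of the collision being discussed (all names and values hypothetical):

```python
# If per-group results were stored directly in the top-level dict,
# a group whose label happens to be 'overall' would silently
# overwrite the overall metric value.
result = {'overall': 0.80}               # the overall metric
by_group = {'overall': 0.75, 'b': 0.85}  # unlucky group labels

result.update(by_group)
print(result['overall'])  # 0.75 - the true overall value has been lost
```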
I had the same thought as @adrinjalali. I think we should try to make the result shallow. Let's say that the two groups are "male" and "female". Then we could use keys like:
"group_male", "group_female", "overall", "min", "max", "range", "range_ratio"
I've updated this again. Please take a look at test_metrics_engine.py and let me know if you like the new syntax. I would like to have this agreed before going through and changing all of the other tests.
I was discussing this with @rihorn2 at lunch, and I like this even less. Going to this level of change will require changes to the dashboard too. I do not see the benefits.
it'd be nice if people who are not there in your office could see the result of your discussions in a better way :)
The result of the discussion: what is the benefit of doing this ('this' being any of this change)?
the hope was to better support the use cases outlined in: https://github.com/fairlearn/fairlearn/issues/257#issuecomment-580811906
and also remove the friction that comes about when you introduce new data types:
https://github.com/fairlearn/fairlearn/issues/257#issuecomment-581414697
Not all required test changes in place yet. Signed-off-by: Richard Edgar <riedgar@microsoft.com>
```python
_OVERALL = 'overall'
_BY_GROUP_FORMAT = 'group_{0}'
_MIN = 'min'
_MAX = 'max'
_RANGE = 'range'
_RANGE_RATIO = 'range_ratio'
_ARGMIN = 'argmin'
_ARGMAX = 'argmax'
```
I personally find this less readable than having those literals directly in the code. Especially when they're used pretty much only once.
```python
result[_RANGE] = result_range
result[_RANGE_RATIO] = range_ratio
```
how often are these two used? If not often, the user could compute them from the other values.
I would actually be happy to see all of these extra fields eliminated. Since there's no constraint on changing the other members, they could easily become inconsistent. That's why I changed GroupMetricResult to compute them dynamically.
```python
# Compute all the statistics, taking care not to stomp on our dictionary
try:
    minimum = min(result.values())
    argmin = [k for k, v in result.items() if v == minimum]
```
use numpy.argmin and numpy.argmax instead?
I don't believe those return the keys of a dict, but indices into an array?
right, sorry, here `min(result, key=result.get)` would work I believe.
That appears to return only one of the minima.
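A quick sketch of the difference (dictionary values hypothetical): with tied minima, `min(result, key=result.get)` picks a single key, whereas the comprehension in the PR collects all of them.

```python
# Two groups tied at the minimum
result = {'group_a': 0.25, 'group_b': 0.25, 'group_c': 0.75}

# min() with a key function returns just one of the tied keys
single = min(result, key=result.get)

# The comprehension from the PR returns every key attaining the minimum
minimum = min(result.values())
all_minima = [k for k, v in result.items() if v == minimum]

print(single)      # 'group_a' (first tied key in iteration order)
print(all_minima)  # ['group_a', 'group_b']
```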
```python
    result[_RANGE] = result_range
    result[_RANGE_RATIO] = range_ratio
except ValueError:
    # Nothing to do if the result type is not amenable to 'min' etc.
```
When the metric is something like a confusion matrix, for which min() is not defined
I guess in that case, it should also be tested?
The test_matrix_metric test perhaps?
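For reference, a sketch of why `min()` fails on confusion matrices: comparing two NumPy arrays with `<` produces an elementwise boolean array, and coercing that to a single bool raises ValueError (the matrices here are made up):

```python
import numpy as np

# Hypothetical per-group "metric" values that are confusion matrices
by_group = {
    'group_a': np.array([[5, 1], [2, 7]]),
    'group_b': np.array([[4, 2], [3, 6]]),
}

try:
    min(by_group.values())  # '<' on arrays is elementwise, so bool() is ambiguous
except ValueError:
    print("min() is not defined for these values")
```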
```python
assert exception_context.value.args[0] == expected


class TestConsistencyCheck:
```
curious, why are the tests in classes?
Can't remember offhand why these ones are.
they may be remnants of nose or some other testing tool. pytest doesn't need them.
We've always been pytest, but haven't been consistent in whether we use test classes or not. It might be down to the fact that the _metrics_engine module had two routines, so I had a class for each of them. That would be a remnant from before we reorganised the namespaces.
```python
assert result['overall'] == -16
assert result['group_0'] == -10
assert result['group_1'] == -6
assert result['min'] == -10
assert result['max'] == -6
assert result['range'] == 4
assert np.isnan(result['range_ratio'])
```
if the result is a Bunch, you could still access them as result.min etc.
I decided against using that.
also, for some of these you could keep the old key name, to keep it more backward compatible.
> I decided against using that.

care to elaborate why?
Thinking it over, I wasn't keen on a dependency on undocumented behaviour.
But we talked about it and the conclusion was to document it. That's why I've created scikit-learn/scikit-learn#16404
Also, if it's supposed to not be used, you could use a dict instead of a Bunch.
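For readers unfamiliar with it, sklearn's Bunch is essentially a dict whose keys are also reachable as attributes, which is the behaviour under discussion. A minimal stand-in sketch (the real class is `sklearn.utils.Bunch`):

```python
# Minimal stand-in for sklearn.utils.Bunch: a dict that also
# exposes its keys as attributes
class Bunch(dict):
    def __getattr__(self, key):
        try:
            return self[key]
        except KeyError:
            raise AttributeError(key)

result = Bunch(overall=-16)
result['min'] = -10

print(result['overall'])  # -16, dict-style access
print(result.min)         # -10, attribute-style access
```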
Per discussions on #257 this is not needed right now.
Sample PR to remove the GroupMetricResult type and replace it with a Bunch. This does not yet fix the notebooks.