[MRG + 1] Clarified error msg in plot_partial_dependence#7673
[MRG + 1] Clarified error msg in plot_partial_dependence#7673NelleV merged 3 commits intoscikit-learn:masterfrom
Conversation
| except IndexError: | ||
| raise ValueError('features[i] must be in [0, n_features) ' | ||
| 'but was %d' % i) | ||
| raise ValueError('features[i] must be in [0, len(feature_names)={0} ) ' |
There was a problem hiding this comment.
seems like there's an extra space after {0}
| raise ValueError('features[i] must be in [0, n_features) ' | ||
| 'but was %d' % i) | ||
| raise ValueError('features[i] must be in [0, len(feature_names)={0} ) ' | ||
| 'but was {1}'.format(len(feature_names), i)) |
There was a problem hiding this comment.
I find it a bit odd that you specify that "features[i] must be in a range", but then show the user the value of just "i" in the error message.
There was a problem hiding this comment.
I can't show features[i] in this case, since i is out of range of the array. However, i is the value from features[i] that caused the error because of the looping above.
It was a bit of a tricky issue here; perhaps the error message needs further revision. Just to make sure I understand correctly, features is an array of indices for the features whose names correspond to the elements of feature_names at features[i]. My goal here is to work out a way to show a message such that a user who hasn't fully read the documentation would understand it. Along those lines, showing the value for len(feature_names) is also a bit confusing, since both seem like they should contain at most the n_features elements (with the further restriction that features[i] is also bound to [0, n_features)). If a user didn't specify a name for every feature, then it seems this value would be wrong.
There was a problem hiding this comment.
ah I see. perhaps @tetraptych would have an opinion about this, since he had to go through the process of debugging it?
There was a problem hiding this comment.
There is a bit of history here that's missing, because I picked this up from an abandoned PR. The error message that I have here is, more or less, the same as that proposed in the previous PR, #7609 .
I think the original error message is (partially) correct, we just need to add some more description to it about what's going on here. It seems to me that there are 2 conditions that features[i] has to satisfy:
- 0 <=
features[i]<n_featuresbecause you can't enumerate more features than exist - 'features[i]' <
len(feature_names)becausefeatures[i]contains indices tofeature_names
There was a problem hiding this comment.
thanks for linking that to me. yeah, I agree that the error message is mostly there. It just seemed extremely ambiguous to me when I first read it (which made me question its value to a user), but it's obvious in hindsight after I've seen the code in question.
There was a problem hiding this comment.
It's slightly confusing that the i in features[i] is not i as far as I can tell.
There was a problem hiding this comment.
Why not features contains {} which is too large for feature_names? or something like that?
Maybe "All entries of features must be <len(feature_names)={}, got {}".format(len(feature_names), features[i])"
There was a problem hiding this comment.
Unless I'm missing something, after the features is converted by the prior code, the looping below causes i to be a value from the original features array that was passed in (if features containing integer values).
for fxs in features:
l = []
# explicit loop so "i" is bound for exception below
for i in fxs:
l.append(feature_names[i])
names.append(l)
except IndexError:
...
From a clarity standpoint, part of the issue may come from reassigning the features parameter that was passed in to tmp_features. After that, features[i] in the code takes a different meaning than what a user thinks of when reading features[i] in an error message.
There was a problem hiding this comment.
That's why my error message doesn't contain features[i] ;)
There was a problem hiding this comment.
Ahh sorry about that... I had a forum update issue and didn't see your last message : D. I'll push out another update later today.
|
Can you please fix the documentation for |
|
@amueller I updated the error message with your suggestion, but left off I updated the docs to cover all of the ways to specify Is there a place under the doc folder that should be updated as well? The partial dependence section in ensemble.rst didn't go into parameters for the specific plotting function, and I haven't found mention of |
|
Sorry it should have been |
|
Thanks, lgtm! |
|
Thanks for the patch! |
…n#7673) * Clarified error msg in plot_partial_dependence * Changed err msg for feature[i] out of range. Updated docs. * Error message shows invalid value.
…n#7673) * Clarified error msg in plot_partial_dependence * Changed err msg for feature[i] out of range. Updated docs. * Error message shows invalid value.
…n#7673) * Clarified error msg in plot_partial_dependence * Changed err msg for feature[i] out of range. Updated docs. * Error message shows invalid value.
…n#7673) * Clarified error msg in plot_partial_dependence * Changed err msg for feature[i] out of range. Updated docs. * Error message shows invalid value.
…n#7673) * Clarified error msg in plot_partial_dependence * Changed err msg for feature[i] out of range. Updated docs. * Error message shows invalid value.
…n#7673) * Clarified error msg in plot_partial_dependence * Changed err msg for feature[i] out of range. Updated docs. * Error message shows invalid value.
Reference Issue
#7600
What does this implement/fix? Explain your changes.
Clarified error message in plot_partial_dependence when an
index greater than the number of features is specified in
featuresAny other comments?
This PR picks up on a previously abandoned PR, #7609 .