[MRG] MAINT Use set litterals when possible#12667
[MRG] MAINT Use set litterals when possible#12667qinhanmin2014 merged 8 commits intoscikit-learn:masterfrom
Conversation
sklearn/tests/test_multiclass.py
Outdated
| assert_equal(set(clf.classes_), classes) | ||
| y_pred = clf.predict(np.array([[0, 0, 4]]))[0] | ||
| assert_equal(set(y_pred), set("eggs")) | ||
| assert_equal(set(y_pred), {"eggs"}) |
There was a problem hiding this comment.
I have no idea how that passed -- it shouldn't have,
>>> set("eggs")
{'g', 'e', 's'}
>>> set(['eggs'])
{'eggs'}Curious to see this fix will fail..
There was a problem hiding this comment.
Ok, the test was passing because we were creating a set(<string>) (and creating {'g', 'e', 's'}) on each side.
qinhanmin2014
left a comment
There was a problem hiding this comment.
+1 since it's faster, though I don't think it's more readable.
sklearn/tests/test_multiclass.py
Outdated
| assert_equal(set(clf.classes_), classes) | ||
| y_pred = clf.predict(np.array([[0, 0, 4]]))[0] | ||
| assert_equal(set(y_pred), set("eggs")) | ||
| assert_equal(set([y_pred]), {"eggs"}) |
There was a problem hiding this comment.
What's happening here? Why two different solutions for two sets? Maybe {y_pred}?
jnothman
left a comment
There was a problem hiding this comment.
Since the dict syntax is much more common, I tend to find set literals only easy to read if they are either very short ({1,2,3}) or if there is another visual hint that : is absent, such as putting in newlines between long set items or before for in comprehensions. But that's a personal taste.
I don't think the speed-up is sufficient justification alone
|
Yeah, this one is a bit controversial. See related discussion at scipy scipy/scipy#9531 Not really sure why I started this -- stumbled on an issue about it at numpy while looking for something else... OK, will revert set comprehensions that don't include a new line. |
|
I'll vote +0 (maybe -1) to modify part of the our repo. Seems that readability is indeed a problem (not only for some python beginners like me), so maybe close this one? |
|
Well, {"binary", "multiclass"}is still the right way to define sets as opposed to, set(["binary", "multiclass"])Also, that's the mathematical notation for sets https://en.wikipedia.org/wiki/Set_(mathematics)#Describing_sets . Just the fact that dicts are more widespread doesn't change that. Using a notation close to the math notation in scientific code is good IMO. Set comprehension is just a generalization of that notation, again close to math notations https://en.wikipedia.org/wiki/Set_notation#Metaphor_in_denoting_sets For the set comprehension, I kind of agree that it's less readable, but again it's just due to the fact that sets are little as compared to dicts. Dict is a generalization of sets to two elements in terms of notation, not the other way around. Will revert set comprehension. |
I don't think I understand that statement. For small sets I find set literals more readable and was surprised to find the |
|
Removed set comprehension, and left only the most simple cases. CI is green (apart for the failing test on master). |
sklearn/tests/test_multiclass.py
Outdated
| y_pred = clf.predict(np.array([[0, 0, 4]]))[0] | ||
| assert_equal(set(y_pred), set("eggs")) | ||
| y_pred = clf.predict(np.array([[0, 0, 4]])) | ||
| assert_equal(set(y_pred), {"eggs"}) |
There was a problem hiding this comment.
@qinhanmin2014 So set(a) will iterate on a and create a set with items of a.
In the original code we iterated over the first element of y_pred, which was a string, and so the result was set("eggs") == {'g', 'e', 's'}. The same thing was done on the right. It worked but I'm pretty sure that was not intentional.
Here, I changed the code a bit following your comment. On the left we iterate on y_pred that only has one element "egg" and on the right, we create the same set with one element.
|
Merged master in to fix conflicts. CI should be green. This should be quite easy to review. |
qinhanmin2014
left a comment
There was a problem hiding this comment.
I'll vote +1. It's slightly faster and won't hurt readability too much I think.
jnothman
left a comment
There was a problem hiding this comment.
Shrug. I find these harder to parse than the word set, but I'm okay with the change.
|
+2, merging |
This reverts commit 2ee7ede.
This reverts commit 2ee7ede.
Set literals (and set comprehension) were added in Python 2.7. They are a bit more readable and faster than alternative methods for
setcreation. This uses those when possible.