[MRG] MAINT Use set litterals when possible by rth · Pull Request #12667 · scikit-learn/scikit-learn

rth · 2018-11-24T20:42:21Z

Set literals (and set comprehension) were added in Python 2.7. They are a bit more readable and faster than alternative methods for set creation. This uses those when possible.
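
As a quick illustration of the spellings being swapped, here is a minimal sketch using made-up label strings (not code taken from the PR). It only shows that the literal and comprehension forms build the same objects as the set(...) calls they replace; it makes no claim about timing.

```python
# Hypothetical labels, chosen only for illustration.
from_list = set(["binary", "multiclass"])   # older spelling
literal = {"binary", "multiclass"}          # set literal

from_generator = set(label.upper() for label in literal)  # older spelling
comprehension = {label.upper() for label in literal}      # set comprehension

# There is no literal for the empty set: {} is an empty dict.
empty_set = set()
empty_braces = {}

print(from_list == literal)
print(from_generator == comprehension)
print(type(empty_braces).__name__)
```

The empty case is the one place where set() cannot be replaced by braces.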

rth · 2018-11-24T20:44:35Z

sklearn/tests/test_multiclass.py

        assert_equal(set(clf.classes_), classes)
        y_pred = clf.predict(np.array([[0, 0, 4]]))[0]
-        assert_equal(set(y_pred), set("eggs"))
+        assert_equal(set(y_pred), {"eggs"})


I have no idea how that passed -- it shouldn't have,

>>> set("eggs")
{'g', 'e', 's'}
>>> set(['eggs'])
{'eggs'}

Curious to see whether this fix will fail.

Ok, the test was passing because we were creating a set(<string>) (and creating {'g', 'e', 's'}) on each side.
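
The behaviour described above can be reproduced in a few lines. This is a sketch with a plain string standing in for the single prediction (an assumed value); it does not call the scikit-learn classifier.

```python
# Stand-in for clf.predict(...)[0]: a single predicted label (assumed value).
y_pred = "eggs"

# set() iterates over its argument, so a string is split into characters.
left = set(y_pred)
right = set("eggs")

print(left == right)      # both sides are the same set of characters
print(len(left))          # number of distinct characters in "eggs"
print(left == {"eggs"})   # a one-element set holding the whole label differs
```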

qinhanmin2014

+1 since it's faster, though I don't think it's more readable.

qinhanmin2014 · 2018-11-26T03:45:26Z

sklearn/tests/test_multiclass.py

        assert_equal(set(clf.classes_), classes)
        y_pred = clf.predict(np.array([[0, 0, 4]]))[0]
-        assert_equal(set(y_pred), set("eggs"))
+        assert_equal(set([y_pred]), {"eggs"})


What's happening here? Why two different solutions for two sets? Maybe {y_pred}?

jnothman

Since the dict syntax is much more common, I tend to find set literals only easy to read if they are either very short ({1,2,3}) or if there is another visual hint that : is absent, such as putting in newlines between long set items or before for in comprehensions. But that's a personal taste.
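
One way to picture the visual hints mentioned here, as a sketch with hypothetical names rather than code from the repository:

```python
# Very short literal: easy to recognise as a set at a glance.
short = {1, 2, 3}

# Longer (hypothetical) items, one per line, so the missing ":" stands out.
target_types = {
    "binary",
    "multiclass",
    "multilabel-indicator",
}

# A comprehension with a newline before "for".
squares = {
    n * n
    for n in range(5)
}

print(sorted(squares))
```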

I don't think the speed-up alone is sufficient justification.

rth · 2018-11-27T08:50:42Z

Yeah, this one is a bit controversial. See related discussion at scipy scipy/scipy#9531

Not really sure why I started this -- stumbled on an issue about it at numpy while looking for something else...

OK, will revert set comprehensions that don't include a new line.

qinhanmin2014 · 2018-11-27T10:13:12Z

I'll vote +0 (maybe -1) on modifying part of our repo. It seems that readability is indeed a problem (not only for Python beginners like me), so maybe close this one?

rth · 2018-11-27T10:29:52Z

Well,

{"binary", "multiclass"}

is still the right way to define sets as opposed to,

set(["binary", "multiclass"])

Also, that's the mathematical notation for sets https://en.wikipedia.org/wiki/Set_(mathematics)#Describing_sets . Just the fact that dicts are more widespread doesn't change that. Using a notation close to the math notation in scientific code is good IMO. Set comprehension is just a generalization of that notation, again close to math notations https://en.wikipedia.org/wiki/Set_notation#Metaphor_in_denoting_sets

For the set comprehension, I kind of agree that it's less readable, but again that's just because sets are used so little compared to dicts. In terms of notation, a dict is a generalization of a set to two elements, not the other way around.

Will revert set comprehension.

amueller · 2018-11-27T18:14:51Z

is still the right way to define sets as opposed to,

I don't think I understand that statement. For small sets I find set literals more readable and was surprised to find the set([...]) syntax in the code

rth · 2018-11-27T20:25:59Z

Removed set comprehension, and left only the simplest cases. CI is green (apart from the failing test on master).

rth · 2018-11-27T20:30:03Z

sklearn/tests/test_multiclass.py

-        y_pred = clf.predict(np.array([[0, 0, 4]]))[0]
-        assert_equal(set(y_pred), set("eggs"))
+        y_pred = clf.predict(np.array([[0, 0, 4]]))
+        assert_equal(set(y_pred), {"eggs"})


@qinhanmin2014 So set(a) will iterate over a and create a set from the items of a.

In the original code we iterated over the first element of y_pred, which was a string, and so the result was set("eggs") == {'g', 'e', 's'}. The same thing was done on the right. It worked but I'm pretty sure that was not intentional.

Here, I changed the code a bit following your comment. On the left we iterate over y_pred, which has only one element, "eggs", and on the right we create the same one-element set.
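
To make the difference concrete, here is a sketch that compares the old and new assertions. Plain Python lists stand in for the array returned by clf.predict (an assumption; the real test uses a NumPy array), the helper names are hypothetical, and "geese" is a made-up wrong prediction.

```python
def old_check(predictions):
    # Original form: index the first prediction, then compare character sets.
    return set(predictions[0]) == set("eggs")


def new_check(predictions):
    # Revised form: compare the set of predicted labels with a set literal.
    return set(predictions) == {"eggs"}


correct = ["eggs"]
wrong = ["geese"]  # hypothetical wrong label built from the same letters

print(old_check(correct), new_check(correct))
print(old_check(wrong), new_check(wrong))
```

Under these assumptions, the character-level comparison also accepts the wrong label, which is why the revised assertion is the stricter of the two.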

rth · 2019-01-03T14:31:19Z

Merged master in to fix conflicts. CI should be green.

This should be quite easy to review.

qinhanmin2014

I'll vote +1. It's slightly faster and won't hurt readability too much I think.

jnothman

Shrug. I find these harder to parse than the word set, but I'm okay with the change.

qinhanmin2014 · 2019-01-06T11:21:32Z

+2, merging

Use set litterals when possible

396b242

Fix failing test

a01da61

eamanu approved these changes Nov 25, 2018

rth changed the title ~~MAINT Use set litterals when possible~~ [MRG] MAINT Use set litterals when possible Nov 25, 2018

qinhanmin2014 approved these changes Nov 26, 2018

jnothman reviewed Nov 26, 2018

rth added 2 commits November 27, 2018 20:24

Revert set comrehension

1c82cff

More reverts

1b83e07

rth added 2 commits January 3, 2019 15:45

Merge branch 'master' into set-litteral

802a1df

Merge branch 'master' into set-litteral

10a9bbc

qinhanmin2014 approved these changes Jan 3, 2019

Hanmin's comment

9d99bd5

qinhanmin2014 reviewed Jan 3, 2019

More review comments

8cc5586

qinhanmin2014 approved these changes Jan 3, 2019

jnothman approved these changes Jan 6, 2019

qinhanmin2014 merged commit 684d8a2 into scikit-learn:master Jan 6, 2019

rth deleted the set-litteral branch January 6, 2019 17:23

adrinjalali pushed a commit to adrinjalali/scikit-learn that referenced this pull request Jan 7, 2019

MAINT Use set litterals when possible (scikit-learn#12667)

1264247

xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019

MAINT Use set litterals when possible (scikit-learn#12667)

2ee7ede

xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019

Revert "MAINT Use set litterals when possible (scikit-learn#12667)"

4b4a266

This reverts commit 2ee7ede.

xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019

Revert "MAINT Use set litterals when possible (scikit-learn#12667)"

9c2f15e

This reverts commit 2ee7ede.

koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019

MAINT Use set litterals when possible (scikit-learn#12667)

80ae38f
